---
date: "2019-09-06T01:35:00-03:00"
title: "Repository indexer"
slug: "repo-indexer"
sidebar_position: 45
toc: false
draft: false
aliases:
  - /en-us/repo-indexer
menu:
  sidebar:
    parent: "administration"
    name: "Repository indexer"
    sidebar_position: 45
    identifier: "repo-indexer"
---
# Repository indexer
## Builtin repository code search without indexer
Users can run repository-level code searches without setting up a repository indexer.
The builtin code search is based on the `git grep` command, which is fast and efficient for small repositories.
For better code search support, set up the repository indexer.
## Setting up the repository indexer
To let Gitea search through the files of your repositories, enable the repository indexer in your [`app.ini`](administration/config-cheat-sheet.md):
```ini
[indexer]
; ...
REPO_INDEXER_ENABLED = true
REPO_INDEXER_PATH = indexers/repos.bleve
MAX_FILE_SIZE = 1048576
REPO_INDEXER_INCLUDE =
REPO_INDEXER_EXCLUDE = resources/bin/**
```
Please bear in mind that indexing the contents can consume a lot of system resources, especially when the index is created for the first time or globally updated (e.g. after upgrading Gitea).
### Choosing the files for indexing by size
The `MAX_FILE_SIZE` option will make the indexer skip all files larger than the specified value.
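As a minimal sketch, assuming the value is a size in bytes (which the sample value of 1048576, i.e. 1 MiB, suggests), a hypothetical setting that skips every file larger than 512 KiB could look like this:

```ini
[indexer]
REPO_INDEXER_ENABLED = true
; Hypothetical limit chosen for illustration: 512 * 1024 = 524288 bytes (512 KiB)
MAX_FILE_SIZE = 524288
```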
### Choosing the files for indexing by path
Gitea applies glob pattern matching from the [`gobwas/glob` library](https://github.com/gobwas/glob) to choose which files will be included in the index.
Limiting the list of files prevents the index from becoming polluted with derived or irrelevant files (e.g. lss, sym, map), so the search results are more relevant. It can also help reduce the index size.

`REPO_INDEXER_EXCLUDE_VENDORED` (default: true) excludes vendored files from the index.

`REPO_INDEXER_INCLUDE` (default: empty) is a comma-separated list of glob patterns to **include** in the index. An empty list means "_include all files_".

`REPO_INDEXER_EXCLUDE` (default: empty) is a comma-separated list of glob patterns to **exclude** from the index. Files that match this list will not be indexed. `REPO_INDEXER_EXCLUDE` takes precedence over `REPO_INDEXER_INCLUDE`.

Pattern matching works as follows:
- To match all files with a `.txt` extension no matter what directory, use `**.txt`.
- To match all files with a `.txt` extension _only at the root level of the repository_, use `*.txt`.
- To match all files inside `resources/bin` and below, use `resources/bin/**`.
- To match all files _immediately inside_ `resources/bin`, use `resources/bin/*`.
- To match all files named `Makefile`, use `**Makefile`.
- Matching a directory has no effect; the pattern `resources/bin` will not include/exclude files inside that directory; `resources/bin/**` will.
- All files and patterns are normalized to lower case, so `**Makefile`, `**makefile` and `**MAKEFILE` are equivalent.
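
The rules above can be combined in a single configuration. The following is only an illustrative sketch: the choice of file types and directories is an assumption made for the example (`**.md` is formed by analogy with `**.txt`), not a recommended setup.

```ini
[indexer]
REPO_INDEXER_ENABLED = true
; Skip vendored files (this is already the default)
REPO_INDEXER_EXCLUDE_VENDORED = true
; Index only .txt and .md files, in any directory
REPO_INDEXER_INCLUDE = **.txt,**.md
; Never index anything inside resources/bin or below it;
; this wins over REPO_INDEXER_INCLUDE for files matching both lists
REPO_INDEXER_EXCLUDE = resources/bin/**
```

With these settings, a file such as `resources/bin/notes.txt` would match both lists and would therefore be left out of the index, because the exclude list takes precedence.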