Commit Graph

199 Commits

Author SHA1 Message Date
Bruno Sofiato f64fbd9b74
Updated tokenizer to better matching when search for code snippets (#32261)
This PR improves the accuracy of Gitea's code search. 

Currently, Gitea does not consider statements such as
`onsole.log("hello")` as hits when the user searches for `log`. The
culprit is how both ES and Bleve are tokenizing the file contents (in
both cases, `console.log` is a whole token).

In ES' case, we changed the tokenizer to
[simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.).
In such a case, tokens are words formed by digits and letters. In
Bleve's case, it employs a
[letter](https://blevesearch.com/docs/Tokenizers/) tokenizer.

Resolves #32220

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
2024-11-06 20:51:20 +00:00
wxiaoguang 5e6523aa57
Update go dependencies (#32389) 2024-10-31 12:05:54 +00:00
Bruno Sofiato 900ac62251
Allow code search by filename (#32210)
This is a large and complex PR, so let me explain in detail its changes.

First, I had to create new index mappings for Bleve and ElasticSerach as
the current ones do not support search by filename. This requires Gitea
to recreate the code search indexes (I do not know if this is a breaking
change, but I feel it deserves a heads-up).

I've used [this
approach](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-pathhierarchy-tokenizer.html)
to model the filename index. It allows us to efficiently search for both
the full path and the name of a file. Bleve, however, does not support
this out-of-box, so I had to code a brand new [token
filter](https://blevesearch.com/docs/Token-Filters/) to generate the
search terms.

I also did an overhaul in the `indexer_test.go` file. It now asserts the
order of the expected results (this is important since matches based on
the name of a file are more relevant than those based on its content).
I've added new test scenarios that deal with searching by filename. They
use a new repo included in the Gitea fixture.

The screenshot below depicts how Gitea shows the search results. It
shows results based on content in the same way as the current version
does. In matches based on the filename, the first seven lines of the
file contents are shown (BTW, this is how GitHub does it).


![image](https://github.com/user-attachments/assets/9d938d86-1a8d-4f89-8644-1921a473e858)

Resolves #32096

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
2024-10-11 23:35:04 +00:00
Bruno Sofiato d266d190bd
Fixed race condition when deleting documents by repoId in ElasticSearch (#32185)
Resolves #32184

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
2024-10-03 16:03:36 +00:00
Bruno Sofiato 99d0510cb6
Change the code search to sort results by relevance (#32134)
Resolves #32129

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
2024-09-28 20:13:55 +00:00
Lunny Xiao 286ede47ad
Fix index too many file names bug (#31903)
Try to fix #31884
Fix #28584
2024-09-01 05:57:31 +00:00
Lunny Xiao c03baab678
Refactor the usage of batch catfile (#31754)
When opening a repository, it will call `ensureValidRepository` and also
`CatFileBatch`. But sometimes these will not be used until repository
closed. So it's a waste of CPU to invoke 3 times git command for every
open repository.

This PR removed all of these from `OpenRepository` but only kept
checking whether the folder exists. When a batch is necessary, the
necessary functions will be invoked.
2024-08-20 17:04:57 +00:00
Kemal Zebari c0b5a843ba
Properly filter issue list given no assignees filter (#31522)
Quick fix #31520. This issue is related to #31337.
2024-07-23 18:36:32 +00:00
Carsten Klein 3571b7e3dd
Allow searching issues by ID (#31479)
When you are entering a number in the issue search, you likely want the
issue with the given ID (code internal concept: issue index).
As such, when a number is detected, the issue with the corresponding ID
will now be added to the results.

Fixes #4479

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
2024-07-17 00:49:05 +02:00
Lunny Xiao e4abaff7ff
Fix bug filtering issues which have no project (#31337)
Fix #31327
This is a quick patch to fix the bug.
Some parameters are using 0, some are using -1. I think it needs a
refactor to keep consistent. But that will be another PR.
2024-06-14 02:31:07 +00:00
Lunny Xiao 98751108b1
Rename project board -> column to make the UI less confusing (#30170)
This PR split the `Board` into two parts. One is the struct has been
renamed to `Column` and the second we have a `Template Type`.

But to make it easier to review, this PR will not change the database
schemas, they are just renames. The database schema changes could be in
future PRs.

---------

Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: yp05327 <576951401@qq.com>
2024-05-27 08:59:54 +00:00
wxiaoguang 6f7cd94a02
Fix bleve fuzziness (#30799)
Fix #30797
Fix #30317
2024-05-01 15:32:52 +03:00
silverwind 610802df85
Fix tautological conditions (#30735)
As discovered by https://github.com/go-gitea/gitea/pull/30729.

---------

Co-authored-by: Giteabot <teabot@gitea.io>
2024-04-30 14:34:40 +02:00
Chongyi Zheng e80466f734
Resolve lint for unused parameter and unnecessary type arguments (#30750)
Resolve all cases for `unused parameter` and `unnecessary type
arguments`

Related: #30729

---------

Co-authored-by: Giteabot <teabot@gitea.io>
2024-04-29 08:47:56 +00:00
Kemal Zebari 9b7af4340c
Perform Newest sort type correctly when sorting issues (#30644)
Should resolve #30642.

Before this commit, we were treating an empty `?sort=` query parameter
as the correct sorting type (which is to sort issues in descending order
by their created UNIX time). But when we perform `sort=latest`, we did
not include this as a type so we would sort by the most recently updated
when reaching the `default` switch statement block.

This commit fixes this by considering the empty string, "latest", and
just any other string that is not mentioned in the switch statement as
sorting by newest.
2024-04-23 15:10:01 +08:00
silverwind 74f0c84fa4
Enable more `revive` linter rules (#30608)
Noteable additions:

- `redefines-builtin-id` forbid variable names that shadow go builtins
- `empty-lines` remove unnecessary empty lines that `gofumpt` does not
remove for some reason
- `superfluous-else` eliminate more superfluous `else` branches

Rules are also sorted alphabetically and I cleaned up various parts of
`.golangci.yml`.
2024-04-22 11:48:42 +00:00
wxiaoguang ca5c895efb
Render embedded code preview by permlink in markdown (#30234)
The permlink in markdown will be rendered as a code preview block, like GitHub

Co-authored-by: silverwind <me@silverwind.io>
2024-04-02 17:48:27 +00:00
Lunny Xiao 3f26fe2fa2
Use db.ListOptions directly instead of Paginator interface to make it easier to use and fix performance of /pulls and /issues (#29990)
This PR uses `db.ListOptions` instead of `Paginor` to make the code
simpler.
And it also fixed the performance problem when viewing /pulls or
/issues. Before the counting in fact will also do the search.

---------

Co-authored-by: Jason Song <i@wolfogre.com>
Co-authored-by: silverwind <me@silverwind.io>
2024-03-24 18:51:08 +00:00
wxiaoguang 4734d43e14
Support repo code search without setting up an indexer (#29998)
By using git's ability, end users (especially small instance users) do
not need to enable the indexer, they could also benefit from the code
searching feature.

Fix #29996


![image](https://github.com/go-gitea/gitea/assets/2114189/11b7e458-88a4-480d-b4d7-72ee59406dd1)


![image](https://github.com/go-gitea/gitea/assets/2114189/0fe777d5-c95c-4288-a818-0427680805b6)

---------

Co-authored-by: silverwind <me@silverwind.io>
2024-03-24 17:05:00 +01:00
6543 b9c57fb78e
Determine fuzziness of bleve indexer by keyword length (#29706)
also bleve did match on fuzzy search and the other way around. this also fix that bug.
2024-03-23 16:45:13 +01:00
Lunny Xiao f8ab9dafb7
Use db.ListOptionsAll instead of db.ListOptions{ListAll: true} (#29995) 2024-03-22 13:53:52 +01:00
6543 c6e5ec51bd
Meilisearch double quote on "match" query (#29740)
make `nonFuzzyWorkaround` unessesary

cc @Kerollmops
2024-03-16 13:19:41 +00:00
6543 1262ff6734
Refactor code_indexer to use an SearchOptions struct for PerformSearch (#29724)
similar to how it's already done for the issue_indexer


---
*Sponsored by Kithara Software GmbH*
2024-03-16 10:32:45 +00:00
6543 7fd0a5b276
Refactor to use optional.Option for issue index search option (#29739)
Signed-off-by: 6543 <6543@obermui.de>
2024-03-13 08:25:53 +00:00
Lunny Xiao 3c6fc25a77
Use repo object format name instead of detecting from git repository (#29702)
It's unnecessary to detect the repository object format from git
repository. Just use the repository's object format name.
2024-03-10 22:30:36 +01:00
6543 7fdc048153
Patch in exact search for meilisearch (#29671)
meilisearch does not have an search option to contorl fuzzynes per query
right now:
 - https://github.com/meilisearch/meilisearch/issues/1192
 - https://github.com/orgs/meilisearch/discussions/377
 - https://github.com/meilisearch/meilisearch/discussions/1096

so we have to create a workaround by post-filter the search result in
gitea until this is addressed.

For future works I added an option in backend only atm, to enable
fuzzynes for issue indexer too.
And also refactored the code so the fuzzy option is equal in logic to
code indexer


---
*Sponsored by Kithara Software GmbH*
2024-03-09 01:39:27 +00:00
yp05327 a2b0fb1a64
Fix wrong line number in code search result (#29260)
Fix #29136

Before: The result is a table and all line numbers are all in one row.

After: Use a separate table column for the line numbers.

---------

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
2024-03-06 07:24:43 +00:00
6543 a3f05d0d98
remove util.OptionalBool and related functions (#29513)
and migrate affected code

_last refactoring bits to replace **util.OptionalBool** with
**optional.Option[bool]**_
2024-03-02 16:42:31 +01:00
6543 f6656181e4
migrate some more "OptionalBool" to "Option[bool]" (#29479)
just some refactoring bits towards replacing **util.OptionalBool** with
**optional.Option[bool]**

---------

Co-authored-by: KN4CK3R <admin@oldschoolhack.me>
2024-02-29 18:52:49 +00:00
KN4CK3R ad0a34b492
Add `io.Closer` guidelines (#29387)
Co-authored-by: Yarden Shoham <git@yardenshoham.com>
2024-02-25 13:05:23 +00:00
Zettat123 c42083a339
Allow non-admin users to delete review requests (#29057)
Fix #14459

The following users can add/remove review requests of a PR
- the poster of the PR
- the owner or collaborators of the repository
- members with read permission on the pull requests unit
2024-02-24 12:38:43 +00:00
dark-angel 5c0fc90872
fix: Elasticsearch: Request Entity Too Large #28117 (#29062)
Fix for gitea putting everything into one request without batching and
sending it to Elasticsearch for indexing as issued in #28117

This issue occured in large repositories while Gitea tries to 
index the code using ElasticSearch.

I've applied necessary changes that takes batch length from below config
(app.ini)
```
[queue.code_indexer]
BATCH_LENGTH=<length_int>
```
and batches all requests to Elasticsearch in chunks as configured in the
above config
2024-02-07 08:57:16 +00:00
Lunny Xiao 5f82ead13c
Simplify how git repositories are opened (#28937)
## Purpose
This is a refactor toward building an abstraction over managing git
repositories.
Afterwards, it does not matter anymore if they are stored on the local
disk or somewhere remote.

## What this PR changes
We used `git.OpenRepository` everywhere previously.
Now, we should split them into two distinct functions:

Firstly, there are temporary repositories which do not change:

```go
git.OpenRepository(ctx, diskPath)
```

Gitea managed repositories having a record in the database in the
`repository` table are moved into the new package `gitrepo`:

```go
gitrepo.OpenRepository(ctx, repo_model.Repo)
```

Why is `repo_model.Repository` the second parameter instead of file
path?
Because then we can easily adapt our repository storage strategy.
The repositories can be stored locally, however, they could just as well
be stored on a remote server.

## Further changes in other PRs
- A Git Command wrapper on package `gitrepo` could be created. i.e.
`NewCommand(ctx, repo_model.Repository, commands...)`. `git.RunOpts{Dir:
repo.RepoPath()}`, the directory should be empty before invoking this
method and it can be filled in the function only. #28940
- Remove the `RepoPath()`/`WikiPath()` functions to reduce the
possibility of mistakes.

---------

Co-authored-by: delvh <dev.lh@web.de>
2024-01-27 21:09:51 +01:00
silverwind 60e4a98ab0
Preserve BOM in web editor (#28935)
The `ToUTF8*` functions were stripping BOM, while BOM is actually valid
in UTF8, so the stripping must be optional depending on use case. This
does:

- Add a options struct to all `ToUTF8*` functions, that by default will
strip BOM to preserve existing behaviour
- Remove `ToUTF8` function, it was dead code
- Rename `ToUTF8WithErr` to `ToUTF8`
- Preserve BOM in Monaco Editor
- Remove a unnecessary newline in the textarea value. Browsers did
ignore it, it seems but it's better not to rely on this behaviour.

Fixes: https://github.com/go-gitea/gitea/issues/28743
Related: https://github.com/go-gitea/gitea/issues/6716 which seems to
have once introduced a mechanism that strips and re-adds the BOM, but
from what I can tell, this mechanism was removed at some point after
that PR.
2024-01-27 18:02:51 +00:00
Lunny Xiao c4cdebacfe
Fix sort bug on repository issues list (#28897)
Fix #28896
2024-01-23 09:17:42 +08:00
wxiaoguang 20929edc99
Add option to disable ambiguous unicode characters detection (#28454)
* Close #24483
* Close #28123
* Close #23682
* Close #23149

(maybe more)
2023-12-17 14:38:54 +00:00
Adam Majer cbf923e87b
Abstract hash function usage (#28138)
Refactor Hash interfaces and centralize hash function. This will allow
easier introduction of different hash function later on.

This forms the "no-op" part of the SHA256 enablement patch.
2023-12-13 21:02:00 +00:00
Jason Song beb71f5ef6
Include public repos in doer's dashboard for issue search (#28304)
It will fix #28268 .

<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">

<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">

## ⚠️ BREAKING ⚠️

But need to give up some features:

<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">

However, such abandonment may fix #28055 .

## Backgroud

When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.

<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">

It's a confusing design. Think about this: What does "In your
repositories" mean when the user switches to an org? Repos belong to the
user or the org?

Whatever, it has been broken by #26012 and its following PRs. After the
PR, it searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, so it causes #28268.

## How to fix it

It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.

But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"

<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">

Before #26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of #26012).

And to keep this, #26012 implement it in an inefficient way, just count
the issues by repo one by one, so it cannot work when `AllPublic` is
true because it's almost impossible to do this for all public repos.


1bfcdeef4c/modules/indexer/issues/indexer.go (L318-L338)

## Give up unnecessary features

We may can resovle `TODO: use "group by" of the indexer engines to
implement it`, I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".

And the real question is, does it worth it? Why should we need to know
the counts grouped by repos?

Let me show you my search dashboard on gitea.com.

<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">

I never think the long repo list helps anything.

And if we agree to abandon it, things will be much easier. That is this
PR.

## TODO

I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.

<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">

The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.

Or please block this PR and help to complete it.

Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
2023-12-07 13:26:18 +08:00
Brecht Van Lommel a7de14e493
Meilisearch: require all query terms to be matched (#28293)
Previously only the first term had to be matched. That default
Meilisearch behavior makes sense for e.g. some kind of autocomplete to
find and select a single result. But for filtering issues it means you
can't narrow down results by adding more terms.

This is also more consistent with other indexers and GitHub.

---

Reference:
https://www.meilisearch.com/docs/reference/api/search#matching-strategy
2023-11-29 23:00:59 +08:00
Nanguan Lin 1eae2aadae
Fix issue not showing on default board and add test (#27720)
See https://github.com/go-gitea/gitea/pull/27718#issuecomment-1773743014
. Add a test to ensure its behavior.
Why this test uses `ProjectBoardID=0`? Because in `SearchOptions`,
`ProjectBoardID=0` means what it is. But in `IssueOptions`,
`ProjectBoardID=0` means there is no condition, and
`ProjectBoardID=db.NoConditionID` means the board ID = 0.
It's really confusing. Probably it's better to separate the db search
engine and the other issue search code. It's really two different
systems. As far as I can see, `IssueOptions` is not necessary for most
of the code, which has very simple issue search conditions.
2023-10-25 11:51:49 +00:00
Nanguan Lin eb1478791f
Clean some functions about project issue (#27705)
1. remove unused function `MoveIssueAcrossProjectBoards`
2. extract the project board condition into a function
3. use db.NoCondition instead of -1. (BTW, the usage of db.NoCondition
is too confusing. Is there any way to avoid that?)
4. remove the unnecessary comment since the ctx refactor is completed.
5. Change `b.ID != 0` to `b.ID > 0`. It's more intuitive but I think
they're the same since board ID can't be negative.
2023-10-20 14:01:25 +02:00
Jason Song 1be49fdda6
Improve retrying index issues (#27554)
Fix #27540
2023-10-15 18:56:57 +00:00
Nanguan Lin dc04044716
Replace assert.Fail with assert.FailNow (#27578)
assert.Fail() will continue to execute the code while assert.FailNow()
not. I thought those uses of assert.Fail() should exit immediately.
PS: perhaps it's a good idea to use
[require](https://pkg.go.dev/github.com/stretchr/testify/require)
somewhere because the assert package's default behavior does not exit
when an error occurs, which makes it difficult to find the root error
reason.
2023-10-11 11:02:24 +00:00
JakobDev ebe803e514
Penultimate round of `db.DefaultContext` refactor (#27414)
Part of #27065

---------

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-10-11 04:24:07 +00:00
Lunny Xiao 673cf6af76
make writing main test easier (#27270)
This PR removed `unittest.MainTest` the second parameter
`TestOptions.GiteaRoot`. Now it detects the root directory by current
working directory.

---------

Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
2023-09-28 01:38:53 +00:00
Nanguan Lin 2f8e1604f8
Fix review request number and add more tests (#27104)
fix #27019 
## testfixture yml
1. add issue20(a pr issue) in repo 23, org 17
2. add user15 to team 9
3. add four reviews about issue20
## test case
add two tests that are described with code comments
the code before pr #26784 failed the first test
<img width="479" alt="image"
src="https://github.com/go-gitea/gitea/assets/70063547/1d9b5787-11b4-4c4d-931f-6a9869547f35">
current code failed the second test(as mentioned in #27019)
<img width="484" alt="image"
src="https://github.com/go-gitea/gitea/assets/70063547/05608055-7587-43d1-bae1-92c688270819">
Any advice is appreciated.

---------

Co-authored-by: CaiCandong <50507092+CaiCandong@users.noreply.github.com>
Co-authored-by: Giteabot <teabot@gitea.io>
2023-09-21 13:59:50 +02:00
JakobDev f91dbbba98
Next round of `db.DefaultContext` refactor (#27089)
Part of #27065
2023-09-16 14:39:12 +00:00
Nanguan Lin 7cdbe65a2c
Add tests for db indexer in indexer_test.go (#27087)
As described in the title.
Some points: 
1. Why need those tests?
Because `buildIssueOverview` is not well tested, there are several
continuous bugs in the issue overview webpage.
2. Why in indexer_test.go?
It's hard to put those tests in `./modules/indexer/issue/db/db_test.go`
because those tests need 'real' data in db mocked by fixtures instead of
random data in `./modules/indexer/issue/internal/tests`. When using
'real' data(`unittest.PrepareTestDatabase`), `InitIssueIndexer` and the
package `init()` function of `indexer` are required to init indexer.
3. Why only db?
The other three indexer engines are well tested by random data and it's
okay to also test them with 'real' data in db mocked by fixtures. Any
follow-up PR is welcome.
4. Those tests are really basic, any more complicated tests are welcome.
5. I think it's also necessary to add tests in `TestAPISearchIssues`
in`api_test_issue.go` and `TestIssues` in `home_test.go`
2023-09-16 11:15:21 +08:00
Nanguan Lin 0de09d3afc
Remove the useless function `GetUserIssueStats` and move relevant tests to `indexer_test.go` (#27067)
Since the issue indexer has been refactored, the issue overview webpage
is built by the `buildIssueOverview` function and underlying
`indexer.Search` function and `GetIssueStats` instead of
`GetUserIssueStats`. So the function is no longer used.
I moved the relevant tests to `indexer_test.go` and since the search
option changed from `IssueOptions` to `SearchOptions`, most of the tests
are useless now.
We need more tests about the db indexer because those tests are highly
connected with the issue overview webpage and now this page has several
bugs.
Any advice about those test cases is appreciated.

---------

Co-authored-by: CaiCandong <50507092+CaiCandong@users.noreply.github.com>
2023-09-14 12:35:53 -04:00
Nanguan Lin cda97a7253
Update status and code index after changing the default branch (#27018)
Fix #26723 
Add `ChangeDefaultBranch` to the `notifier` interface and implement it
in `indexerNotifier`. So when changing the default branch,
`indexerNotifier` sends a message to the `indexer queue` to update the
index.

---------

Co-authored-by: techknowlogick <matti@mdranta.net>
2023-09-13 04:43:31 +00:00