# Elasticsearch Integration

A background thread syncs the cache with Elastic, rather than syncing during the crawl. This way the crawl is not
slowed down and the webserver can start serving clients sooner. A sync with Elastic may take hours, so it is better to
run it as a background task.

There are two types of syncs: new and refresh. The "new" sync adds files that are in the cache but not yet in Elastic
and deletes files that are in Elastic but no longer in the cache. The "refresh" sync is a full sync and pushes every
file to Elastic.
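
The difference between the two sync types can be sketched as plain set operations. This is a minimal illustration, not the project's actual code: the function names and the use of file paths as keys are assumptions made for the example.

```python
def plan_new_sync(cache_paths, elastic_paths):
    """Return (to_add, to_delete) for a "new" sync.

    Hypothetical helper: both arguments are iterables of file paths.
    """
    cache = set(cache_paths)
    elastic = set(elastic_paths)
    to_add = sorted(cache - elastic)     # in the cache, not yet in Elastic
    to_delete = sorted(elastic - cache)  # in Elastic, no longer in the cache
    return to_add, to_delete


def plan_refresh_sync(cache_paths):
    """Return every cached path; a "refresh" sync pushes all of them."""
    return sorted(set(cache_paths))


cache = ["/a.txt", "/b.txt", "/c.txt"]
elastic = ["/b.txt", "/old.txt"]
print(plan_new_sync(cache, elastic))  # (['/a.txt', '/c.txt'], ['/old.txt'])
print(plan_refresh_sync(cache))       # ['/a.txt', '/b.txt', '/c.txt']
```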

The intervals of the "new" and "refresh" syncs are controlled by `elasticsearch_sync_interval` and
`elasticsearch_full_sync_interval`, respectively. By default, only one sync job can run at a time, but setting
`elasticsearch_allow_concurrent_syncs` to `true` allows both to run at once.

On startup, a "new" sync is run. To run a "refresh" sync on startup, set `elasticsearch_full_sync_on_start` to `true`.
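
How these settings might interact can be sketched as follows. The field names mirror the config keys above, but everything else is an assumption made for illustration: the default values, the interval unit (seconds), the choice to refuse a second sync rather than queue it, and the idea that a startup "refresh" replaces the startup "new" sync.

```python
import threading
from dataclasses import dataclass


@dataclass
class SyncSettings:
    # Names match the config keys; defaults and units are assumed.
    elasticsearch_sync_interval: int = 1800
    elasticsearch_full_sync_interval: int = 86400
    elasticsearch_allow_concurrent_syncs: bool = False
    elasticsearch_full_sync_on_start: bool = False


def startup_sync_type(settings):
    """Pick the sync to run on startup (assumed to be one or the other)."""
    return "refresh" if settings.elasticsearch_full_sync_on_start else "new"


class SyncGate:
    """Hypothetical guard: one sync at a time unless concurrency is allowed."""

    def __init__(self, allow_concurrent):
        self._allow_concurrent = allow_concurrent
        self._lock = threading.Lock()

    def try_start(self):
        if self._allow_concurrent:
            return True  # nothing to guard
        # Non-blocking acquire fails while another sync holds the lock.
        return self._lock.acquire(blocking=False)

    def finish(self):
        if not self._allow_concurrent:
            self._lock.release()


settings = SyncSettings()
gate = SyncGate(settings.elasticsearch_allow_concurrent_syncs)
print(startup_sync_type(settings))  # new
print(gate.try_start())             # True: the first sync starts
print(gate.try_start())             # False: refused while the first is running
gate.finish()
```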

Why don't we store the cache in Elasticsearch? Because querying Elastic is not as fast as fetching things from RAM.
### Searching

We do an Elastic [simple query string search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html).
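
As a rough sketch of what such a request body looks like, the snippet below builds a `simple_query_string` query as a Python dict. The field names (`name`, `path`), the result size, and the `and` default operator are assumptions for the example, not the project's actual mapping or settings.

```python
import json


def build_search_body(user_query, fields=("name", "path"), size=25):
    """Build an Elasticsearch search body using simple_query_string.

    The fields and size defaults are hypothetical placeholders.
    """
    return {
        "size": size,
        "query": {
            "simple_query_string": {
                "query": user_query,
                "fields": list(fields),
                "default_operator": "and",
            }
        },
    }


# The body would be sent to the index's _search endpoint.
print(json.dumps(build_search_body('"annual report" -draft'), indent=2))
```

In the simple query string syntax, `+` means AND, `|` means OR, `-` negates a term, quotes mark a phrase, and a trailing `*` is a prefix match. Invalid syntax is ignored rather than raising an error, which makes this query type a reasonable fit for user-facing search boxes.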