# Elasticsearch Integration

A background thread syncs the cache with Elasticsearch instead of syncing during the crawl. This keeps the crawl from
being slowed down and lets the webserver start serving clients sooner. A sync with Elasticsearch can take hours, so it
is better to run it as a background task.

There are two types of syncs: new and refresh. The "new" sync adds files that are not yet in Elasticsearch and deletes
files that are in Elasticsearch but no longer in the cache. The "refresh" sync is a full sync and pushes every file to
Elasticsearch. The intervals of these syncs are controlled by `elasticsearch_sync_interval` and
`elasticsearch_full_sync_interval`. By default, only one sync job can run at a time, but setting
`elasticsearch_allow_concurrent_syncs` to `true` allows both to run at once.
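
The "new" sync described above amounts to a set difference between the cache and the index. The following is a minimal sketch of that idea, not the project's actual code: `planNewSync` and `syncPlan` are hypothetical names, and it assumes both sides can be listed as plain path strings.

```go
package main

import "sort"

// syncPlan lists the paths a "new" sync would add to and delete from the index.
// Hypothetical type, used only for this illustration.
type syncPlan struct {
	Add    []string // in the cache but not yet indexed
	Delete []string // indexed but no longer in the cache
}

// planNewSync compares the cached paths with the indexed paths and
// returns what would have to change to bring the index in line.
func planNewSync(cached, indexed []string) syncPlan {
	inCache := make(map[string]struct{}, len(cached))
	for _, p := range cached {
		inCache[p] = struct{}{}
	}
	inIndex := make(map[string]struct{}, len(indexed))
	for _, p := range indexed {
		inIndex[p] = struct{}{}
	}

	var plan syncPlan
	for p := range inCache {
		if _, ok := inIndex[p]; !ok {
			plan.Add = append(plan.Add, p)
		}
	}
	for p := range inIndex {
		if _, ok := inCache[p]; !ok {
			plan.Delete = append(plan.Delete, p)
		}
	}

	// Map iteration order is random, so sort for stable output.
	sort.Strings(plan.Add)
	sort.Strings(plan.Delete)
	return plan
}
```

A "refresh" sync would skip the comparison and push every cached path regardless of what is already indexed.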

On startup, a "new" sync is run. To run a "refresh" sync on startup as well, set `elasticsearch_full_sync_on_start` to `true`.
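
One way to picture the one-job-at-a-time default and the `elasticsearch_allow_concurrent_syncs` override is a small gate around each sync job. This is a sketch under assumptions: `syncGate` and `tryRun` are hypothetical names, and it assumes a job that finds another one running is skipped rather than queued, which the text above does not specify. `sync.Mutex.TryLock` requires Go 1.18 or later.

```go
package main

import "sync"

// syncGate decides whether a sync job may start.
// Hypothetical type, used only for this illustration.
type syncGate struct {
	allowConcurrent bool // mirrors elasticsearch_allow_concurrent_syncs
	mu              sync.Mutex
}

// tryRun runs job and reports true, unless concurrency is disallowed
// and another job currently holds the gate, in which case it reports false.
func (g *syncGate) tryRun(job func()) bool {
	if g.allowConcurrent {
		job()
		return true
	}
	if !g.mu.TryLock() {
		return false // another sync is still running; skip this one
	}
	defer g.mu.Unlock()
	job()
	return true
}
```

In this sketch, the "new" and "refresh" schedulers would share one `syncGate`, so a long "refresh" would cause an overlapping "new" sync to be skipped unless concurrency is enabled.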

Why don't we store the cache in Elasticsearch? Because querying Elasticsearch is not as fast as fetching things from RAM.

### Searching

Searches use the Elasticsearch [simple query string query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html).
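
For illustration, a request body for such a query could be built as below. This is a sketch, not the server's actual code: `buildSearchBody` is a hypothetical helper, and the field name passed in (`name` in the usage note) is an assumption about the index mapping. `query`, `fields`, and `default_operator` are parameters documented for `simple_query_string`.

```go
package main

import "encoding/json"

// simpleQueryString mirrors a subset of the documented
// simple_query_string parameters.
type simpleQueryString struct {
	Query           string   `json:"query"`
	Fields          []string `json:"fields,omitempty"`
	DefaultOperator string   `json:"default_operator,omitempty"`
}

// buildSearchBody wraps the user's query in a search request body.
// Hypothetical helper; the choice of "and" as the operator is an assumption.
func buildSearchBody(userQuery string, fields []string) ([]byte, error) {
	body := map[string]any{
		"query": map[string]any{
			"simple_query_string": simpleQueryString{
				Query:           userQuery,
				Fields:          fields,
				DefaultOperator: "and",
			},
		},
	}
	return json.Marshal(body)
}
```

Calling `buildSearchBody("report +2023", []string{"name"})` produces a JSON body that can be sent to the index's `_search` endpoint. The simple query string syntax tolerates malformed input instead of returning an error, which suits queries typed by users.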