Elasticsearch Integration
A background thread syncs the cache with Elastic instead of syncing during the crawl. This keeps the crawl from being slowed down and lets the webserver start serving clients sooner. A sync with Elastic may take hours, so it is better run as a background task.
There are two types of syncs: new and refresh. The "new" sync adds new files not already in Elastic and deletes files that are in Elastic but no longer in the cache. The "refresh" sync is a full sync and pushes every file to Elastic.
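The difference between the two sync types can be sketched with plain set operations. This is only an illustration, not the project's actual code: the function names plan_new_sync and plan_refresh_sync are hypothetical, and files are assumed to be identified by their path strings.

```python
def plan_new_sync(cache_paths, elastic_paths):
    """Work out what a "new" sync would have to do.

    Both arguments are iterables of file paths (an assumption made for
    this sketch). Returns (to_add, to_delete) as sorted lists.
    """
    cache = set(cache_paths)
    indexed = set(elastic_paths)
    to_add = sorted(cache - indexed)     # in the cache, not yet in Elastic
    to_delete = sorted(indexed - cache)  # in Elastic, no longer in the cache
    return to_add, to_delete


def plan_refresh_sync(cache_paths):
    """A "refresh" sync pushes every cached file, whether indexed or not."""
    return sorted(set(cache_paths))
```

For example, with a cache holding /a and /b and an index holding /b and /c, the "new" sync would add /a and delete /c, while the "refresh" sync would push both /a and /b.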
The intervals of these syncs are controlled by elasticsearch_sync_interval and elasticsearch_full_sync_interval.
By default, only one sync job can run at a time, but setting elasticsearch_allow_concurrent_syncs to true allows both to run at once.
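One way to picture the one-job-at-a-time rule is a lock shared by both sync types. The sketch below is an assumption about how this could be done, not a description of the real implementation: make_sync_locks and run_sync are hypothetical names, and the boolean argument stands in for the elasticsearch_allow_concurrent_syncs setting.

```python
import threading


def make_sync_locks(allow_concurrent_syncs):
    """Return one lock per sync type.

    When concurrent syncs are disabled, both types share a single lock,
    so a "new" sync and a "refresh" sync can never overlap.
    """
    if allow_concurrent_syncs:
        return {"new": threading.Lock(), "refresh": threading.Lock()}
    shared = threading.Lock()
    return {"new": shared, "refresh": shared}


def run_sync(kind, locks, job):
    """Run job() while holding the lock for the given sync type."""
    with locks[kind]:
        return job()
```

With separate locks, each sync type still cannot overlap with itself, but a "new" and a "refresh" sync may run side by side.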
On startup, a "new" sync is run. You can run a "refresh" sync on startup by setting elasticsearch_full_sync_on_start to true.
Why don't we store the cache in Elasticsearch? Because querying Elastic is slower than fetching things from RAM.
Searching
Searches are run as an Elastic simple query string (simple_query_string) query.
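As a rough illustration, a simple query string search body in the Elasticsearch query DSL looks like the dictionary built below. The helper name build_search_body, the size default, and the choice of "and" as the default operator are assumptions made for this sketch; the fields and options the project actually uses are not stated here.

```python
def build_search_body(user_query, size=50):
    """Build a request body for an Elasticsearch simple_query_string search.

    simple_query_string tolerates malformed user input rather than raising
    a parse error, which makes it a reasonable fit for a search box.
    """
    return {
        "size": size,  # maximum number of hits to return (assumed default)
        "query": {
            "simple_query_string": {
                "query": user_query,
                # Require every term to match unless the user writes "|".
                "default_operator": "and",
            }
        },
    }
```

The resulting dictionary can be serialized as JSON and sent to an index's _search endpoint.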