Elasticsearch Integration

A background thread syncs the cache with Elastic instead of syncing during the crawl. This keeps the crawl from being slowed down and lets the webserver start serving clients sooner. A sync with Elastic can take hours, so it is better run as a background task.

There are two types of syncs: new and refresh. The "new" sync adds new files not already in Elastic and deletes files that are in Elastic but no longer in the cache. The "refresh" sync is a full sync and pushes every file to Elastic.
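The difference between the two sync types can be sketched with set operations. This is a minimal illustration, not the project's actual code: the function names and the use of file paths as identifiers are assumptions made for the example.

```python
def plan_new_sync(cache_paths, elastic_paths):
    """Plan a "new" sync: return (to_add, to_delete).

    to_add    -> files in the cache that Elastic does not have yet
    to_delete -> files in Elastic that are no longer in the cache
    """
    cache = set(cache_paths)
    indexed = set(elastic_paths)
    to_add = sorted(cache - indexed)
    to_delete = sorted(indexed - cache)
    return to_add, to_delete


def plan_refresh_sync(cache_paths):
    """Plan a "refresh" sync: every cached file is pushed again."""
    return sorted(set(cache_paths))


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cache = ["/a", "/b", "/c"]
    elastic = ["/b", "/d"]
    print(plan_new_sync(cache, elastic))
    print(plan_refresh_sync(cache))
```

A "new" sync only touches the difference between the two sides, which is why it is the cheaper of the two.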

The intervals of these syncs are controlled by elasticsearch_sync_interval and elasticsearch_full_sync_interval. By default, only one sync job can run at a time, but setting elasticsearch_allow_concurrent_syncs to true allows both to run at once.
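One way to picture the interval and concurrency settings is a shared lock that both periodic jobs pass through. The sketch below is hypothetical: SyncGate and run_periodically are names invented for this example, and it assumes that a blocked sync waits for the running one to finish rather than being skipped, which the text above does not specify.

```python
import threading
from contextlib import nullcontext


class SyncGate:
    """Serializes sync jobs unless concurrent syncs are allowed."""

    def __init__(self, allow_concurrent_syncs=False):
        # A single shared lock when only one sync may run at a time;
        # no lock at all when both syncs may overlap.
        self._lock = None if allow_concurrent_syncs else threading.Lock()

    def run(self, job):
        guard = self._lock if self._lock is not None else nullcontext()
        with guard:
            return job()


def run_periodically(interval_seconds, gate, job, stop_event):
    """Run job through the gate every interval_seconds until stop_event is set."""
    # Event.wait returns False on timeout and True once the event is set.
    while not stop_event.wait(interval_seconds):
        gate.run(job)
```

In this sketch, each sync type would get its own thread calling run_periodically with its own interval, while both share one SyncGate built from the elasticsearch_allow_concurrent_syncs value.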

On startup, a "new" sync is run. To run a "refresh" sync on startup, set elasticsearch_full_sync_on_start to true.

Why don't we store the cache in Elasticsearch? Because fetching from Elastic is not as fast as fetching from RAM.

Searching

Searches use Elasticsearch's simple_query_string query.
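For reference, a simple_query_string request body has roughly the shape below. This is a sketch only: the field names "name" and "path", the result size, and the AND default operator are assumptions for illustration and are not taken from this project's index mapping or code.

```python
import json


def build_search_body(user_query, fields=("name", "path"), size=25):
    """Build an Elasticsearch request body for a simple_query_string search."""
    return {
        "size": size,
        "query": {
            "simple_query_string": {
                "query": user_query,
                # Hypothetical field names; replace with the real mapping.
                "fields": list(fields),
                # Elasticsearch defaults to OR; AND is an assumption here.
                "default_operator": "AND",
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body('report -draft "final version"'), indent=2))
```

The simple query string syntax tolerates malformed input instead of returning an error, which makes it a reasonable fit for queries typed directly by users.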
