TODO: add a "last modified" to "sort" in https://chub-archive.evulid.cc/api/file/list?path=/chub.ai/characters&page=1&limit=50&sort=folders

TODO: add an admin endpoint to fetch the last n modified files. Maybe store files update time in elasticsearch?

TODO: fix the 3 loading placeholders

TODO: https://github.com/victorspringer/http-cache

TODO: fix encoding on https://chub-archive.evulid.cc/api/file/download?path=/other/takeout/part1.md

TODO: fix /api/file/download when an item is in the cache but does not exist on the disk

crazy-file-server

A heavy-duty web file browser for CRAZY files.

The whole schtick of this program is that it caches the directory and file structures so that the server doesn't have to re-read the disk on every request. By doing the processing upfront when the server starts, along with background scans to keep the cache fresh, we can keep requests snappy and responsive.
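
Very roughly, the scheme looks like the Go sketch below. The Entry and Cache types, the 30-minute rescan interval, and the /srv/data path are illustrative placeholders, not CrazyFS's actual internals; the point is that the tree is walked once at startup, requests are answered from memory, and a background rescan keeps that copy fresh.

```go
package main

import (
	"io/fs"
	"log"
	"path/filepath"
	"sync"
	"time"
)

// Entry is a hypothetical cached view of one file or directory.
type Entry struct {
	Path    string
	IsDir   bool
	Size    int64
	ModTime time.Time
}

// Cache holds the directory tree in memory so requests never hit the disk.
type Cache struct {
	mu      sync.RWMutex
	entries map[string]Entry
}

// Scan walks root and replaces the cache contents in one pass.
func (c *Cache) Scan(root string) error {
	fresh := make(map[string]Entry)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil // skip unreadable entries instead of aborting the scan
		}
		info, err := d.Info()
		if err != nil {
			return nil
		}
		fresh[path] = Entry{Path: path, IsDir: d.IsDir(), Size: info.Size(), ModTime: info.ModTime()}
		return nil
	})
	c.mu.Lock()
	c.entries = fresh
	c.mu.Unlock()
	return err
}

// Get answers a request purely from memory.
func (c *Cache) Get(path string) (Entry, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[path]
	return e, ok
}

func main() {
	cache := &Cache{entries: map[string]Entry{}}
	if err := cache.Scan("/srv/data"); err != nil { // initial, upfront scan
		log.Println("initial scan:", err)
	}
	go func() { // background rescans keep the cache fresh
		for range time.Tick(30 * time.Minute) {
			if err := cache.Scan("/srv/data"); err != nil {
				log.Println("rescan:", err)
			}
		}
	}()
	select {} // block forever; a real server would serve HTTP here
}
```

Requests only ever take the read lock, so the disk is touched solely by the scanner.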

I needed to serve a very large dataset full of small files publicly over the internet as an easy-to-browse website. The existing solutions were subpar, and I found myself writing confusing OpenResty scripts and complex CDN caching rules to keep things responsive and server load low. I gave up and decided to create my own solution.

You will likely need to store your data on an SSD for this. With an SSD, my server was able to crawl over 6 million files stored in a very complicated directory tree in just 5 minutes.
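
That kind of throughput comes from crawling many directories concurrently rather than one at a time, which is also why random-read speed (an SSD) and core count matter. Purely as an illustration, and not CrazyFS's actual crawler, a bounded-concurrency crawl might be sketched like this:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"sync"
	"sync/atomic"
)

// crawl lists dir and recurses into subdirectories, each on its own goroutine,
// with sem bounding how many directory reads run at once.
func crawl(dir string, sem chan struct{}, wg *sync.WaitGroup, count *atomic.Int64) {
	defer wg.Done()
	sem <- struct{}{}               // acquire a worker slot
	entries, err := os.ReadDir(dir) // the disk-heavy step, one directory at a time
	<-sem                           // release the slot before recursing
	if err != nil {
		return
	}
	for _, e := range entries {
		count.Add(1)
		if e.IsDir() {
			wg.Add(1)
			go crawl(filepath.Join(dir, e.Name()), sem, wg, count)
		}
	}
}

func main() {
	var (
		wg    sync.WaitGroup
		count atomic.Int64
		sem   = make(chan struct{}, runtime.NumCPU()*4) // a few readers per core
	)
	wg.Add(1)
	go crawl("/srv/data", sem, &wg, &count) // placeholder root path
	wg.Wait()
	fmt.Println("entries seen:", count.Load())
}
```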

Features

  • Automated cache management
    • Optionally fill the cache on server start, or as requests come in.
    • Watch for filesystem changes, or re-scan on an interval.
  • File browsing API.
  • Download API.
  • Restrict certain files and directories from the download API to prevent users from downloading your entire 100GB+ dataset (a minimal sketch of such a check follows this list).
  • Frontend-agnostic design. You can have it serve a simple web interface or just act as a JSON API and serve files.
  • Simple resources. The resources for the frontend aren't compiled into the binary, which allows you to modify or even replace them.
  • Basic searching.
  • Elasticsearch integration (to do).
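
The download restriction mentioned above boils down to normalizing the requested path and refusing restricted prefixes before serving anything. A hypothetical Go sketch (the restricted list, handler, root path, and port are invented for illustration; only the /api/file/download route comes from the TODOs above):

```go
package main

import (
	"log"
	"net/http"
	"path/filepath"
	"strings"
)

// Example restricted prefixes, relative to the served root. These values are
// made up for illustration, not CrazyFS's real configuration.
var restricted = []string{"private", "backups"}

func isRestricted(rel string) bool {
	for _, r := range restricted {
		if rel == r || strings.HasPrefix(rel, r+"/") {
			return true
		}
	}
	return false
}

// downloadHandler is an illustrative stand-in for a download endpoint:
// normalize the requested path, reject traversal and restricted prefixes,
// otherwise serve the file from the root directory.
func downloadHandler(root string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		raw := strings.TrimPrefix(r.URL.Query().Get("path"), "/")
		rel := filepath.ToSlash(filepath.Clean(raw))
		if strings.HasPrefix(rel, "..") || isRestricted(rel) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		http.ServeFile(w, r, filepath.Join(root, filepath.FromSlash(rel)))
	}
}

func main() {
	http.HandleFunc("/api/file/download", downloadHandler("/srv/data"))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```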

Install

  1. Install Go.
  2. Download the binary or do cd src && go mod tidy && go build.

Use

  1. Edit config.yml. It's well commented.
  2. Run ./crazyfs --config /path/to/config.yml. You can use -d for debug mode to see what it's doing.

By default, it looks for your config in the same directory as the executable: ./config.yml or ./config.yaml.

If you're using the initial cache and have tons of files to scan, you'll need at least 5GB of RAM and will have to wait 10 or so minutes for it to traverse the directory structure. CrazyFS is heavily threaded, so you'll want at least an 8-core machine.

The search endpoint searches through the cached files. If they aren't cached, they won't be found. Enable pre-cache at startup to cache everything.
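
Concretely, search only ever scans the in-memory entries. As a hypothetical continuation of the Cache sketch shown earlier (it assumes that file also imports "strings"), the lookup could be as simple as:

```go
// Search is a hypothetical method on the Cache type sketched earlier: it
// matches the query against cached paths only, so anything the scanner
// never reached can never appear in the results.
func (c *Cache) Search(query string, limit int) []Entry {
	c.mu.RLock()
	defer c.mu.RUnlock()
	var results []Entry
	q := strings.ToLower(query)
	for _, e := range c.entries {
		if strings.Contains(strings.ToLower(e.Path), q) {
			results = append(results, e)
			if len(results) == limit {
				break
			}
		}
	}
	return results
}
```

Anything that was never scanned is simply absent from the cache, which is why pre-caching at startup determines what search can see.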