Merge dev to master

Cyberes 2023-12-08 22:25:59 -07:00
parent 627f4d2069
commit 4b9c1ba91a
56 changed files with 2716 additions and 1511 deletions

Doc/Elasticsearch.md Normal file

@@ -0,0 +1,20 @@
# Elasticsearch Integration

A background thread syncs the cache with Elastic rather than syncing during the crawl. This is done so that the crawl
is not slowed down and the webserver can start serving clients sooner. It may take hours to sync with Elastic, so it is
better to run it as a background task.

There are two types of syncs: new and refresh. The "new" sync adds new files not already in Elastic and deletes files
that are in Elastic but no longer in the cache. The "refresh" sync is a full sync that pushes every file to Elastic.

The intervals of these syncs are controlled by `elasticsearch_sync_interval` and `elasticsearch_full_sync_interval`.
By default, only one sync job can run at a time, but setting `elasticsearch_allow_concurrent_syncs` to `true` allows
both to run at once.

On startup, a "new" sync is run. You can run a "refresh" sync at startup by setting `elasticsearch_full_sync_on_start`
to `true`.

Why don't we store the cache in Elasticsearch itself? Because Elastic is not as fast as fetching things from RAM.
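A sketch of how these sync settings might sit in `config.yml`. The key spellings come from this page; the values and
the `elasticsearch_enable` key are illustrative assumptions, not shipped defaults:

```yaml
elasticsearch_enable: true                   # assumed key; the Go code checks cfg.ElasticsearchEnable
elasticsearch_sync_interval: 1800            # time between "new" syncs (unit assumed to be seconds)
elasticsearch_full_sync_interval: 86400      # time between "refresh" syncs (unit assumed to be seconds)
elasticsearch_allow_concurrent_syncs: false  # let a "new" and a "refresh" sync overlap
elasticsearch_full_sync_on_start: false      # run a "refresh" sync at startup
```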
### Searching
We do an Elastic [simple query string search](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html).
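For reference, a minimal `simple_query_string` request body. Only the query shape is taken from the Elastic docs
linked above; the indexed field names below are illustrative assumptions:

```json
{
  "query": {
    "simple_query_string": {
      "query": "report -draft",
      "fields": ["name", "path"]
    }
  }
}
```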


@@ -1,38 +1,51 @@
+TODO: add a "last modified" to "sort"
+in <https://chub-archive.evulid.cc/api/file/list?path=/chub.ai/characters&page=1&limit=50&sort=folders>
+
+TODO: add an admin endpoint to fetch the last n modified files. Maybe store files update time in elasticsearch?
+
+TODO: fix the 3 loading placeholders
+
+TODO: <https://github.com/victorspringer/http-cache>
+
+TODO: fix encoding on https://chub-archive.evulid.cc/api/file/download?path=/other/takeout/part1.md
+
+TODO: fix /api/file/download when an item is in the cache but does not exist on the disk
+
 # crazy-file-server
 
-_A heavy-duty web file browser for CRAZY files._
+*A heavy-duty web file browser for CRAZY files.*
 
-The whole schtick of this program is that it caches the directory and file structures so that the server doesn't have to
-re-read the disk on every request. By doing the processing upfront when the server starts along with some background
-scans to keep the cache fresh we can keep requests snappy and responsive.
+The whole schtick of this program is that it caches the directory and file structures so that the server doesn't have to re-read the disk on every request. By doing the processing upfront when the server starts along with some background scans to keep the cache fresh we can keep requests snappy and responsive. You will likely need to store your data on an SSD for this. With an SSD, my server was able to crawl over 6 million
+files stored in a very complicated directory tree in just 5 minutes.
 
-I needed to serve a very large dataset full of small files publicly over the internet in an easy to browse website. The
-existing solutions were subpar and I found myself having to create confusing Openresty scripts and complex CDN caching
-to keep things responsive and server load low. I gave up and decided to create my own solution.
+I needed to serve a very large dataset full of small files publicly over the internet in an easy to browse website. My data was mounted over NFS so I had to take into account network delays. The existing solutions were subpar and I found myself having to create confusing Openresty scripts and complex CDN caching to keep things responsive and server load low. I gave up and decided to create my own solution.
 
-## Features
+**Features**
 
 - Automated cache management
 - Optionally fill the cache on server start, or as requests come in.
 - Watch for changes or scan interval.
 - File browsing API.
 - Download API.
-- Restrict certain files and directories from the download API to prevent users from downloading your entire 100GB+ dataset.
+- Restrict certain files and directories from the download API to prevent users from downloading your entire 100GB+
+  dataset.
 - Frontend-agnostic design. You can have it serve a simple web interface or just act as a JSON API and serve files.
-- Simple resources. The resources for the frontend aren't compiled into the binary which allows you to modify or even replace it.
+- Simple resources. The resources for the frontend aren't compiled into the binary which allows you to modify or even
+  replace it.
- Basic searching.
- Elasticsearch integration (to do).
 
 ## Install
 
 1. Install Go.
 2. Download the binary or do `cd src && go mod tidy && go build`.
 
 ## Use
 
 1. Edit `config.yml`. It's well commented.
@@ -40,6 +53,9 @@ I needed to serve a very large dataset full of small files publicly over the int
 By default, it looks for your config in the same directory as the executable: `./config.yml` or `./config.yaml`.
 
-If you're using initial cache and have tons of files to scan you'll need at least 5GB of RAM and will have to wait 10 or so minutes for it to traverse the directory structure. CrazyFS is heavily threaded so you'll want at least an 8-core machine.
+If you're using initial cache and have tons of files to scan you'll need at least 5GB of RAM and will have to wait 10 or
+so minutes for it to traverse the directory structure. CrazyFS is heavily threaded so you'll want at least an 8-core
+machine.
 
-The search endpoint searches through the cached files. If they aren't cached, they won't be found. Enable pre-cache at startup to cache everything.
+The search endpoint searches through the cached files. If they aren't cached, they won't be found. Enable pre-cache at
+startup to cache everything.

src/CacheItem/Item.go Normal file

@@ -0,0 +1,94 @@
package CacheItem
import (
"crazyfs/config"
"crazyfs/file"
"os"
"path/filepath"
"strings"
"time"
)
func NewItem(fullPath string, info os.FileInfo) *Item {
if !strings.HasPrefix(fullPath, config.RootDir) {
		// Sanity check: NewItem must always be given an absolute path under RootDir
log.Fatalf("NewItem was not passed an absolute path. The path must start with the RootDir: %s", fullPath)
}
if config.CachePrintNew {
log.Debugf("CACHE - new: %s", fullPath)
}
pathExists, _ := file.PathExists(fullPath)
if !pathExists {
if info.Mode()&os.ModeSymlink > 0 {
// Ignore symlinks
return nil
} else {
log.Warnf("NewItem - Path does not exist: %s", fullPath)
return nil
}
}
var mimeType string
var ext string
var err error
if !info.IsDir() {
var mimePath string
if config.FollowSymlinks && info.Mode()&os.ModeSymlink > 0 {
mimePath, _ = filepath.EvalSymlinks(fullPath)
} else {
mimePath = fullPath
}
if config.CrawlerParseMIME {
_, mimeType, ext, err = file.GetMimeType(mimePath, true, &info)
} else {
_, mimeType, ext, err = file.GetMimeType(mimePath, false, &info)
}
if os.IsNotExist(err) {
log.Warnf("Path does not exist: %s", fullPath)
return nil
} else if err != nil {
log.Warnf("Error detecting MIME type: %v", err)
}
}
// Create pointers for mimeType and ext
var mimeTypePtr, extPtr *string
if mimeType != "" {
mimeTypePtr = &mimeType
}
if ext != "" {
extPtr = &ext
}
return &Item{
Path: file.StripRootDir(fullPath),
Name: info.Name(),
Size: info.Size(),
Extension: extPtr,
Modified: info.ModTime().UTC().Format(time.RFC3339Nano),
Mode: uint32(info.Mode().Perm()),
IsDir: info.IsDir(),
IsSymlink: info.Mode()&os.ModeSymlink != 0,
Cached: time.Now().UnixNano() / int64(time.Millisecond), // Set the created time to now in milliseconds
Children: make([]string, 0),
Type: mimeTypePtr,
}
}
type Item struct {
Path string `json:"path"`
Name string `json:"name"`
Size int64 `json:"size"`
Extension *string `json:"extension"`
Modified string `json:"modified"`
Mode uint32 `json:"mode"`
IsDir bool `json:"isDir"`
IsSymlink bool `json:"isSymlink"`
Type *string `json:"type"`
Children []string `json:"children"`
Content string `json:"content,omitempty"`
Cached int64 `json:"cached"`
}
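A minimal usage sketch (not part of the commit), assuming `fullPath` is absolute and under `config.RootDir` and
`sharedCache` is the same `lru.Cache[string, *CacheItem.Item]` used throughout this commit:

info, err := os.Lstat(fullPath) // Lstat so a symlink is reported as a symlink, not followed
if err == nil {
    if item := CacheItem.NewItem(fullPath, info); item != nil { // nil means a symlink or a vanished path
        sharedCache.Add(item.Path, item) // keyed by the RootDir-relative path
    }
}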

src/CacheItem/init.go Normal file

@@ -0,0 +1,12 @@
package CacheItem
import (
"crazyfs/logging"
"github.com/sirupsen/logrus"
)
var log *logrus.Logger
func init() {
log = logging.GetLogger()
}


@@ -0,0 +1,93 @@
package ResponseItem
import (
"crazyfs/CacheItem"
"crazyfs/cache/DirectoryCrawler"
"crazyfs/config"
"crazyfs/logging"
lru "github.com/hashicorp/golang-lru/v2"
"github.com/sirupsen/logrus"
"path/filepath"
)
var log *logrus.Logger
func init() {
log = logging.GetLogger()
}
type ResponseItem struct {
Path string `json:"path"`
Name string `json:"name"`
Size int64 `json:"size"`
Extension *string `json:"extension"`
Modified string `json:"modified"`
Mode uint32 `json:"mode"`
IsDir bool `json:"isDir"`
IsSymlink bool `json:"isSymlink"`
Type *string `json:"type"`
Children []*CacheItem.Item `json:"children"`
Content string `json:"content,omitempty"`
Cached int64 `json:"cached"`
}
func NewResponseItem(cacheItem *CacheItem.Item, sharedCache *lru.Cache[string, *CacheItem.Item]) *ResponseItem {
item := &ResponseItem{
Path: cacheItem.Path,
Name: cacheItem.Name,
Size: cacheItem.Size,
Extension: cacheItem.Extension,
Modified: cacheItem.Modified,
Mode: cacheItem.Mode,
IsDir: cacheItem.IsDir,
IsSymlink: cacheItem.IsSymlink,
Cached: cacheItem.Cached,
Children: make([]*CacheItem.Item, len(cacheItem.Children)),
Type: cacheItem.Type,
}
// Grab the children from the cache and add them to this new item
if len(cacheItem.Children) > 0 { // avoid a null entry for the children key in the JSON
var children []*CacheItem.Item
		for _, child := range cacheItem.Children {
			childItem, found := sharedCache.Get(child)
			// Do a quick crawl since the path could have been modified since the last crawl.
			// This can also be triggered if we encounter a broken symlink. We don't check for broken symlinks when
			// scanning because that would be an extra os.Lstat() call in processPath().
			if !found {
				log.Debugf("CRAWLER - %s not in cache, crawling", child)
				dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
				crawledItem, err := dc.CrawlNoRecursion(filepath.Join(config.RootDir, child))
				if err != nil {
					log.Errorf("NewResponseItem - CrawlNoRecursion - %s", err)
					continue // skip this child
				}
				if crawledItem == nil {
					log.Debugf("NewResponseItem - CrawlNoRecursion - not found %s - likely broken symlink", child)
					continue
				}
				// Use the result of the crawl; otherwise childItem would still be nil below and the copy would panic.
				childItem = crawledItem
			}
copiedChildItem := &CacheItem.Item{
Path: childItem.Path,
Name: childItem.Name,
Size: childItem.Size,
Extension: childItem.Extension,
Modified: childItem.Modified,
Mode: childItem.Mode,
IsDir: childItem.IsDir,
IsSymlink: childItem.IsSymlink,
Cached: childItem.Cached,
Children: nil,
Type: childItem.Type,
}
children = append(children, copiedChildItem)
}
item.Children = children
}
return item
}
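The per-child copy above matters because `sharedCache.Get` hands back shared pointers. A sketch (not part of the
commit) of the hazard it avoids:

cached, _ := sharedCache.Get("/some/dir")
cached.Children = nil // WRONG: this would erase the children inside the cache entry itself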


@@ -1,40 +1,39 @@
 package api
 
 import (
+	"crazyfs/CacheItem"
+	"crazyfs/api/helpers"
+	"crazyfs/cache/DirectoryCrawler"
 	"crazyfs/config"
-	"crazyfs/data"
+	"crazyfs/elastic"
 	"encoding/json"
 	lru "github.com/hashicorp/golang-lru/v2"
 	"net/http"
 )
 
-func AdminCacheInfo(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
+func AdminCacheInfo(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
 	auth := r.URL.Query().Get("auth")
 	if auth == "" || auth != cfg.HttpAdminKey {
-		w.Header().Set("Content-Type", "application/json")
-		w.WriteHeader(http.StatusForbidden)
-		json.NewEncoder(w).Encode(map[string]interface{}{
-			"code":  403,
-			"error": "access denied",
-		})
+		helpers.Return403Msg("access denied", w)
 		return
 	}
 
 	cacheLen := sharedCache.Len()
-	keys := r.URL.Query().Get("keys")
-	var cacheKeys []string
-	if keys != "" {
-		cacheKeys = sharedCache.Keys()
-	} else {
-		cacheKeys = []string{}
-	}
 	response := map[string]interface{}{
-		"cache_size": cacheLen,
-		"cache_keys": cacheKeys,
-		"cache_max":  cfg.CacheSize,
+		"cache_size":           cacheLen,
+		"cache_max":            cfg.CacheSize,
+		"crawls_running":       DirectoryCrawler.GetGlobalActiveCrawls(),
+		"active_workers":       DirectoryCrawler.ActiveWorkers,
+		"busy_workers":         DirectoryCrawler.ActiveWalks,
+		"new_sync_running":     elastic.ElasticRefreshSyncRunning,
+		"refresh_sync_running": elastic.ElasticRefreshSyncRunning,
 	}
 	w.Header().Set("Content-Type", "application/json")
-	json.NewEncoder(w).Encode(response)
+	err := json.NewEncoder(w).Encode(response)
+	if err != nil {
+		log.Errorf("AdminCacheInfo - Failed to serialize JSON: %s", err)
+		return
+	}
 }
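A hedged example of hitting this endpoint; the route path and host are assumptions, while the `auth` query arg comes
from the handler above:

resp, err := http.Get("http://localhost:8080/api/admin/cache/info?auth=" + adminKey) // hypothetical route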


@@ -1,21 +1,17 @@
 package api
 
 import (
+	"crazyfs/CacheItem"
 	"crazyfs/api/helpers"
 	"crazyfs/cache"
 	"crazyfs/config"
-	"crazyfs/data"
-	"crazyfs/logging"
+	"crazyfs/file"
 	"encoding/json"
 	lru "github.com/hashicorp/golang-lru/v2"
 	"net/http"
-	"path/filepath"
-	"strings"
 )
 
-func AdminReCache(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
-	log := logging.GetLogger()
+func AdminReCache(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
 	if r.Method != http.MethodPost {
 		helpers.Return400Msg("this is a POST endpoint", w)
 		return
@@ -31,33 +27,25 @@ func AdminReCache(w http.ResponseWriter, r *http.Request, cfg *config.Config, sh
 	auth := requestBody["auth"]
 	if auth == "" || auth != cfg.HttpAdminKey {
-		w.Header().Set("Content-Type", "application/json")
-		w.WriteHeader(http.StatusForbidden)
-		json.NewEncoder(w).Encode(map[string]interface{}{
-			"code":  403,
-			"error": "access denied",
-		})
+		helpers.Return403Msg("access denied", w)
 		return
 	}
 
 	pathArg := requestBody["path"]
 
 	// Clean the path to prevent directory traversal
-	if strings.Contains(pathArg, "/../") || strings.HasPrefix(pathArg, "../") || strings.HasSuffix(pathArg, "/..") {
-		w.Header().Set("Content-Type", "application/json")
-		w.WriteHeader(http.StatusBadRequest)
-		json.NewEncoder(w).Encode(map[string]interface{}{
-			"code":  http.StatusBadRequest,
-			"error": "invalid file path",
-		})
+	fullPath, errJoin := file.SafeJoin(pathArg)
+	traversalAttack, errTraverse := file.DetectTraversal(pathArg)
+	if traversalAttack || errJoin != nil {
+		log.Errorf("LIST - failed to clean path: %s - error: %s - traversal attack detected: %t - traversal attack detection: %s", pathArg, errJoin, traversalAttack, errTraverse)
+		helpers.Return400Msg("invalid file path", w)
 		return
 	}
-	fullPath := filepath.Join(cfg.RootDir, filepath.Clean("/"+pathArg))
 	//relPath := cache.StripRootDir(fullPath, cfg.RootDir)
 
 	// Check and re-cache the directory
-	cache.Recache(fullPath, cfg, sharedCache)
+	cache.Recache(fullPath, sharedCache)
 
 	response := map[string]interface{}{
 		"message": "Re-cache triggered for directory: " + fullPath,
@@ -65,5 +53,9 @@ func AdminReCache(w http.ResponseWriter, r *http.Request, cfg *config.Config, sh
 	log.Infof("Admin triggered recache for %s", fullPath)
 	w.Header().Set("Content-Type", "application/json")
-	json.NewEncoder(w).Encode(response)
+	err = json.NewEncoder(w).Encode(response)
+	if err != nil {
+		log.Errorf("AdminRecache - Failed to serialize JSON: %s", err)
+		return
+	}
 }
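A sketch of triggering a re-cache (host assumed; the `/api/admin/cache/recache` route is confirmed by the router
change at the bottom of this commit):

body := strings.NewReader(`{"auth": "ADMIN_KEY", "path": "/some/directory"}`)
resp, err := http.Post("http://localhost:8080/api/admin/cache/recache", "application/json", body)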


@@ -1,57 +1,71 @@
 package api
 
 import (
+	"crazyfs/CacheItem"
 	"crazyfs/api/helpers"
-	"crazyfs/cache"
 	"crazyfs/config"
-	"crazyfs/data"
 	"crazyfs/file"
 	"crazyfs/logging"
-	"encoding/json"
 	"fmt"
 	lru "github.com/hashicorp/golang-lru/v2"
 	"net/http"
-	"path/filepath"
 	"strings"
 )
 
-func Download(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
-	if cache.InitialCrawlInProgress && !cfg.HttpAllowDuringInitialCrawl {
+func Download(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
+	if helpers.CheckInitialCrawl() {
 		helpers.HandleRejectDuringInitialCrawl(w)
 		return
 	}
 	log := logging.GetLogger()
-	queryPath := r.URL.Query().Get("path")
-	if queryPath == "" {
+	pathArg := r.URL.Query().Get("path")
+	if pathArg == "" {
 		helpers.Return400Msg("missing path", w)
 		return
 	}
-	paths := strings.Split(queryPath, ",")
+	paths := strings.Split(pathArg, ",")
+	var cleanPaths []string
 	if len(paths) > 1 {
+		for _, path := range paths {
+			cleanPath, errJoin := file.SafeJoin(path)
+			traversalAttack, errTraverse := file.DetectTraversal(path)
+			if traversalAttack || errJoin != nil {
+				log.Errorf("DOWNLOAD - failed to clean path: %s - error: %s - traversal attack detected: %t - traversal attack detection: %s", path, errJoin, traversalAttack, errTraverse)
+				helpers.Return400Msg("invalid file path", w)
+				return
+			}
+			relPath := file.StripRootDir(cleanPath)
+			if helpers.CheckPathRestricted(relPath) {
+				helpers.Return403Msg("not allowed to download this path", w)
+				return
+			}
+			cleanPaths = append(cleanPaths, cleanPath)
+		}
 		// Multiple files, zip them
-		file.ZipHandlerCompressMultiple(paths, w, r, cfg, sharedCache)
+		helpers.ZipHandlerCompressMultiple(cleanPaths, w, r, cfg, sharedCache)
 		return
 	}
 
 	// Single file or directory
-	relPath := cache.StripRootDir(filepath.Join(cfg.RootDir, paths[0]), cfg.RootDir)
-	relPath = strings.TrimSuffix(relPath, "/")
-	fullPath := filepath.Join(cfg.RootDir, relPath)
+	fullPath, errJoin := file.SafeJoin(pathArg)
+	traversalAttack, errTraverse := file.DetectTraversal(pathArg)
+	if traversalAttack || errJoin != nil {
+		log.Errorf("DOWNLOAD - failed to clean path: %s - error: %s - traversal attack detected: %t - traversal attack detection: %s", pathArg, errJoin, traversalAttack, errTraverse)
+		helpers.Return400Msg("invalid file path", w)
+		return
+	}
+	relPath := file.StripRootDir(fullPath)
 
 	// Check if the path is in the restricted download paths
-	for _, restrictedPath := range cfg.RestrictedDownloadPaths {
-		if relPath == restrictedPath {
-			w.Header().Set("Content-Type", "application/json")
-			w.WriteHeader(http.StatusForbidden)
-			json.NewEncoder(w).Encode(map[string]interface{}{
-				"code":  http.StatusForbidden,
-				"error": "not allowed to download this path",
-			})
-			return
-		}
-	}
+	if helpers.CheckPathRestricted(relPath) {
+		helpers.Return403Msg("not allowed to download this path", w)
+		return
+	}
 
 	// Try to get the data from the cache
@@ -76,28 +90,25 @@ func Download(w http.ResponseWriter, r *http.Request, cfg *config.Config, shared
 	var mimeType string
 	var err error
 	if item.Type == nil {
-		fileExists, mimeType, _, err = cache.GetFileMime(fullPath, true)
+		fileExists, mimeType, _, err = file.GetMimeType(fullPath, true, nil)
 		if !fileExists {
 			helpers.Return400Msg("file not found", w)
 		}
 		if err != nil {
 			log.Warnf("Error detecting MIME type: %v", err)
-			w.Header().Set("Content-Type", "application/json")
-			w.WriteHeader(http.StatusInternalServerError)
-			json.NewEncoder(w).Encode(map[string]interface{}{
-				"code":  500,
-				"error": "internal server error",
-			})
+			helpers.Return500Msg(w)
 			return
 		}
-		// GetFileMime() returns an empty string if it was a directory
+		// GetMimeType() returns an empty string if it was a directory
 		if mimeType != "" {
-			// Update the item's MIME in the sharedCache
+			// Update the CacheItem's MIME in the sharedCache
 			item.Type = &mimeType
 			sharedCache.Add(relPath, item)
 		}
 	}
 
+	// https://stackoverflow.com/a/57994289
 	// Only files can have inline disposition, zip archives cannot
 	contentDownload := r.URL.Query().Get("download")
 	var disposition string
@@ -113,6 +124,6 @@ func Download(w http.ResponseWriter, r *http.Request, cfg *config.Config, shared
 	} else {
 		// Stream archive of the directory here
 		w.Header().Set("Content-Disposition", fmt.Sprintf(`attachment; filename="%s.zip"`, item.Name))
-		file.ZipHandlerCompress(fullPath, w, r)
+		helpers.ZipHandlerCompress(fullPath, w, r)
 	}
 }
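Because `path` is split on commas, one request can fetch several entries as a single zip. A sketch (host assumed; the
`/api/file/download` route appears in the TODOs at the top of this commit):

// Two directories streamed back as one zip archive:
resp, err := http.Get("http://localhost:8080/api/file/download?path=/docs,/photos")
// A single file is served inline; the download query arg switches it to an attachment.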


@@ -1,9 +1,10 @@
 package api
 
 import (
+	"crazyfs/CacheItem"
 	"crazyfs/cache"
+	"crazyfs/cache/DirectoryCrawler"
 	"crazyfs/config"
-	"crazyfs/data"
 	"encoding/json"
 	lru "github.com/hashicorp/golang-lru/v2"
 	"net/http"
@@ -11,14 +12,18 @@ import (
 
 // TODO: show the time the initial crawl started
-func HealthCheck(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
+func HealthCheck(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
 	//log := logging.GetLogger()
 	response := map[string]interface{}{}
-	//response["scan_running"] = cache.GetRunningScans() > 0
+	response["scan_running"] = DirectoryCrawler.GetGlobalActiveCrawls() > 0
 	response["initial_scan_running"] = cache.InitialCrawlInProgress
 	w.Header().Set("Content-Type", "application/json")
-	json.NewEncoder(w).Encode(response)
+	err := json.NewEncoder(w).Encode(response)
+	if err != nil {
+		log.Errorf("HEALTH - Failed to serialize JSON: %s", err)
+		return
+	}
 }
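A representative payload (values illustrative); both flags are booleans, so a load balancer or frontend can gate
traffic on them:

{"scan_running": false, "initial_scan_running": true}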

src/api/List.go Normal file

@@ -0,0 +1,185 @@
package api
import (
"crazyfs/CacheItem"
"crazyfs/ResponseItem"
"crazyfs/api/helpers"
"crazyfs/config"
"crazyfs/file"
"encoding/json"
lru "github.com/hashicorp/golang-lru/v2"
"net/http"
"strconv"
)
func ListDir(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
if helpers.CheckInitialCrawl() {
helpers.HandleRejectDuringInitialCrawl(w)
return
}
pathArg := r.URL.Query().Get("path")
if pathArg == "" {
helpers.Return400Msg("path parameter is required", w)
return
}
var err error
sortArg := r.URL.Query().Get("sort")
var folderSorting string
if sortArg == "default" || sortArg == "" {
folderSorting = "default"
} else if sortArg == "folders" {
folderSorting = "folders"
} else {
helpers.Return400Msg("folders arg must be 'default' (to not do any sorting) or 'first' (to sort the folders to the front of the list)", w)
return
}
fullPath, errJoin := file.SafeJoin(pathArg)
traversalAttack, errTraverse := file.DetectTraversal(pathArg)
if traversalAttack || errJoin != nil {
log.Errorf("LIST - failed to clean path: %s - error: %s - traversal attack detected: %t - traversal attack detection: %s", pathArg, errJoin, traversalAttack, errTraverse)
helpers.Return400Msg("invalid file path", w)
return
}
relPath := file.StripRootDir(fullPath)
// Try to get the data from the cache
cacheItem, found := sharedCache.Get(relPath)
if !found {
cacheItem = helpers.HandleFileNotFound(relPath, fullPath, sharedCache, cfg, w)
}
if cacheItem == nil {
return // The errors have already been handled in handleFileNotFound() so we're good to just exit
}
// Create a copy of the cached Item so we don't modify the Item in the cache
item := ResponseItem.NewResponseItem(cacheItem, sharedCache)
// Get the MIME type of the file if the 'mime' argument is present
mime := r.URL.Query().Get("mime")
if mime != "" {
if item.IsDir && !cfg.HttpAllowDirMimeParse {
helpers.Return403Msg("not allowed to analyze the mime of directories", w)
return
} else {
// Only update the mime in the cache if it hasn't been set already.
// TODO: need to make sure that when a re-crawl is triggered, the Type is set back to nil
if item.Type == nil {
fileExists, mimeType, ext, err := file.GetMimeType(fullPath, true, nil)
				if !fileExists {
					helpers.ReturnFake404Msg("file not found", w)
					return // bail out; without this the handler would continue with an empty MIME type
				}
if err != nil {
log.Warnf("Error detecting MIME type: %v", err)
helpers.Return500Msg(w)
return
}
// Update the original cached CacheItem's MIME in the sharedCache
cacheItem.Type = &mimeType
cacheItem.Extension = &ext
sharedCache.Add(relPath, cacheItem) // take the address of CacheItem
}
}
}
response := map[string]interface{}{}
// Pagination
var paginationLimit int
if r.URL.Query().Get("limit") != "" {
if !helpers.IsNonNegativeInt(r.URL.Query().Get("limit")) {
helpers.Return400Msg("limit must be a positive number", w)
return
}
paginationLimit, err = strconv.Atoi(r.URL.Query().Get("limit"))
if err != nil {
log.Errorf("Error parsing limit: %v", err)
helpers.Return400Msg("limit must be a valid integer", w)
return
}
} else {
paginationLimit = 100
}
totalItems := len(item.Children)
totalPages := totalItems / paginationLimit
if totalItems%paginationLimit != 0 {
totalPages++
}
if r.URL.Query().Get("page") != "" {
response["total_pages"] = totalPages
}
if folderSorting == "folders" {
var dirs, files []*CacheItem.Item
for _, child := range item.Children {
if child.IsDir {
dirs = append(dirs, child)
} else {
files = append(files, child)
}
}
item.Children = append(dirs, files...)
}
	// Set the children to an empty array so that the JSON encoder doesn't return it as nil
var paginatedChildren []*CacheItem.Item // this var is either the full CacheItem list or a paginated list depending on the query args
if item.Children != nil {
paginatedChildren = item.Children
} else {
paginatedChildren = make([]*CacheItem.Item, 0)
}
pageParam := r.URL.Query().Get("page")
if pageParam != "" {
page, err := strconv.Atoi(pageParam)
if err != nil || page < 1 || page > totalPages {
			// Don't return an error, just truncate things
page = totalPages
}
start := (page - 1) * paginationLimit
end := start + paginationLimit
if start >= 0 { // avoid segfaults
if start > len(item.Children) {
start = len(item.Children)
}
if end > len(item.Children) {
end = len(item.Children)
}
paginatedChildren = paginatedChildren[start:end]
}
}
// Erase the children of the children so we aren't displaying things recursively
for i := range paginatedChildren {
paginatedChildren[i].Children = nil
}
response["item"] = map[string]interface{}{
"path": item.Path,
"name": item.Name,
"size": item.Size,
"extension": item.Extension,
"modified": item.Modified,
"mode": item.Mode,
"isDir": item.IsDir,
"isSymlink": item.IsSymlink,
"cached": item.Cached,
"children": paginatedChildren,
"type": item.Type,
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/json")
err = json.NewEncoder(w).Encode(response)
if err != nil {
log.Errorf("LIST - Failed to serialize JSON: %s", err)
return
}
}
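Pagination math from the handler above: `total_pages` is `ceil(totalItems / limit)`, so 120 children at `limit=50`
give 3 pages, and an out-of-range `page` is clamped rather than rejected. A request sketch (host assumed; the route
appears in the TODOs at the top of this commit):

resp, err := http.Get("http://localhost:8080/api/file/list?path=/&page=2&limit=50&sort=folders")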


@@ -1,23 +1,22 @@
 package api
 
 import (
-	"bytes"
+	"crazyfs/CacheItem"
 	"crazyfs/api/helpers"
 	"crazyfs/cache"
 	"crazyfs/config"
-	"crazyfs/data"
-	"encoding/gob"
+	"crazyfs/elastic"
 	"encoding/json"
 	lru "github.com/hashicorp/golang-lru/v2"
-	"log"
 	"net/http"
 	"sort"
 	"strconv"
 	"strings"
+	"time"
 )
 
-func SearchFile(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
-	if cache.InitialCrawlInProgress && !cfg.HttpAllowDuringInitialCrawl {
+func SearchFile(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
+	if helpers.CheckInitialCrawl() {
 		helpers.HandleRejectDuringInitialCrawl(w)
 		return
 	}
@@ -28,8 +27,10 @@ func SearchFile(w http.ResponseWriter, r *http.Request, cfg *config.Config, shar
 		return
 	}
 
-	queryString = strings.ToLower(queryString) // convert to lowercase
-	//queryElements := strings.Split(queryString, " ") // split by spaces
+	if !cfg.ElasticsearchEnable {
+		// If we aren't using Elastic, convert the query to lowercase to reduce the complication.
+		queryString = strings.ToLower(queryString)
+	}
 
 	excludeString := r.URL.Query().Get("exclude") // get exclude parameter
 	var excludeElements []string
@@ -40,7 +41,7 @@ func SearchFile(w http.ResponseWriter, r *http.Request, cfg *config.Config, shar
 	limitResultsStr := r.URL.Query().Get("limit")
 	var limitResults int
 	if limitResultsStr != "" {
-		if !helpers.IsPositiveInt(limitResultsStr) {
+		if !helpers.IsNonNegativeInt(limitResultsStr) {
 			helpers.Return400Msg("limit must be positive number", w)
 			return
 		}
@@ -51,60 +52,97 @@ func SearchFile(w http.ResponseWriter, r *http.Request, cfg *config.Config, shar
 	sortArg := r.URL.Query().Get("sort")
 	var folderSorting string
-	if sortArg == "default" || sortArg == "" {
+	switch sortArg {
+	case "default", "":
 		folderSorting = "default"
-	} else if sortArg == "folders" {
+	case "folders":
 		folderSorting = "folders"
-	} else {
+	default:
 		helpers.Return400Msg("folders arg must be 'default' (to not do any sorting) or 'first' (to sort the folders to the front of the list)", w)
 		return
 	}
 
-	results := make([]*data.Item, 0)
-outer:
-	for _, key := range sharedCache.Keys() {
-		cacheItem, found := sharedCache.Get(key)
-		if found {
-			//for _, query := range queryElements {
-			if strings.Contains(strings.ToLower(key), queryString) { // convert key to lowercase
-				// check if key contains any of the exclude elements
-				shouldExclude := false
-				for _, exclude := range excludeElements {
-					if strings.Contains(strings.ToLower(key), exclude) {
-						shouldExclude = true
-						break
-					}
-				}
-				if shouldExclude {
-					continue
-				}
-				// Create a deep copy of the item
-				var buf bytes.Buffer
-				enc := gob.NewEncoder(&buf)
-				dec := gob.NewDecoder(&buf)
-				err := enc.Encode(cacheItem)
-				if err != nil {
-					log.Printf("Error encoding item: %v", err)
-					return
-				}
-				var item data.Item
-				err = dec.Decode(&item)
-				if err != nil {
-					log.Printf("Error decoding item: %v", err)
-					return
-				}
-				if !cfg.ApiSearchShowChildren {
-					item.Children = make([]*data.Item, 0) // erase the children dict
-				}
-				results = append(results, &item)
-				if (limitResults > 0 && len(results) == limitResults) || len(results) >= cfg.ApiSearchMaxResults {
-					break outer
-				}
-			}
-			//}
-		}
-	}
+	searchStart := time.Now()
+
+	var results []*CacheItem.Item
+	results = make([]*CacheItem.Item, 0)
+
+	if cfg.ElasticsearchEnable {
+		// Perform the Elasticsearch query
+		resp, err := elastic.Search(queryString, excludeElements, cfg)
+		if err != nil {
+			log.Errorf("SEARCH - Failed to perform Elasticsearch query: %s", err)
+			helpers.Return500Msg(w)
+			return
+		}
+
+		// Parse the Elasticsearch response
+		var respData map[string]interface{}
+		err = json.NewDecoder(resp.Body).Decode(&respData)
+		if err != nil {
+			log.Errorf("SEARCH - Failed to parse Elasticsearch response: %s", err)
+			helpers.Return500Msg(w)
+			return
+		}
+		if resp.IsError() || resp.StatusCode != 200 {
+			// Elastic reported an error with the query.
+			var errorMsg, clientResp string
+			errorMsg, err = elastic.GetSearchFailureReason(respData)
+			if err == nil {
+				clientResp = errorMsg
+			} else {
+				clientResp = "Query failed"
+			}
+			helpers.Return400Msg(clientResp, w)
+			return
+		}
+
+		if respData["hits"] != nil {
+			// Extract the results from the Elasticsearch response
+			hits := respData["hits"].(map[string]interface{})["hits"].([]interface{})
+			items := make([]*CacheItem.Item, len(hits))
+			for i, hit := range hits {
+				itemSource := hit.(map[string]interface{})["_source"].(map[string]interface{})
+				// Elastic does some things differently than us.
+				var itemExtension *string
+				if itemSource["extension"] != nil {
+					extensionStr := itemSource["extension"].(string)
+					itemExtension = &extensionStr
+				}
+				var itemType *string
+				if itemSource["type"] != nil {
+					typeStr := itemSource["type"].(string) // read "type", not "extension", to match the nil check above
+					itemType = &typeStr
+				}
+				//score := hit.(map[string]interface{})["_score"].(float64)
+				item := &CacheItem.Item{
+					Path:      itemSource["path"].(string),
+					Name:      itemSource["name"].(string),
+					Size:      int64(itemSource["size"].(float64)),
+					Extension: itemExtension,
+					Modified:  itemSource["modified"].(string),
+					Mode:      uint32(itemSource["mode"].(float64)),
+					IsDir:     itemSource["isDir"].(bool),
+					IsSymlink: itemSource["isSymlink"].(bool),
+					Type:      itemType,
+					Cached:    int64(itemSource["cached"].(float64)),
+				}
+				items[i] = item
+			}
+
+			// Sort the items by their Elasticsearch _score
+			sort.Slice(items, func(i, j int) bool {
+				return hits[i].(map[string]interface{})["_score"].(float64) > hits[j].(map[string]interface{})["_score"].(float64)
+			})
+
+			results = append(results, items...)
+		}
+	} else {
+		results = cache.SearchLRU(queryString, excludeElements, limitResults, sharedCache, cfg)
+	}
 
 	if folderSorting == "folders" {
@@ -113,9 +151,17 @@ outer:
 		})
 	}
 
+	searchDuration := time.Since(searchStart).Round(time.Second)
+	log.Infof("SEARCH - completed in %s and returned %d items", searchDuration, len(results))
+
 	w.Header().Set("Cache-Control", "no-store")
 	w.Header().Set("Content-Type", "application/json")
-	json.NewEncoder(w).Encode(map[string]interface{}{
+	err := json.NewEncoder(w).Encode(map[string]interface{}{
 		"results": results,
 	})
+	if err != nil {
+		log.Errorf("SEARCH - Failed to serialize JSON: %s", err)
+		helpers.Return500Msg(w)
+		return
+	}
 }
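A request sketch. The `exclude` and `limit` parameter names are read by the handler above; the route path and the
name of the query-string parameter are assumptions:

resp, err := http.Get("http://localhost:8080/api/search?query=report&exclude=drafts,tmp&limit=100") // hypothetical route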

src/api/Thumbnail.go Normal file

@@ -0,0 +1,229 @@
package api
import (
"bytes"
"crazyfs/CacheItem"
"crazyfs/api/helpers"
"crazyfs/cache"
"crazyfs/config"
"crazyfs/file"
"crazyfs/logging"
"fmt"
"github.com/disintegration/imaging"
lru "github.com/hashicorp/golang-lru/v2"
"github.com/nfnt/resize"
"strconv"
"image"
"image/color"
"image/png"
"net/http"
"path/filepath"
"strings"
)
func Thumbnail(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
if cache.InitialCrawlInProgress && !cfg.HttpAllowDuringInitialCrawl {
helpers.HandleRejectDuringInitialCrawl(w)
returnDummyPNG(w)
return
}
log := logging.GetLogger()
relPath := file.StripRootDir(filepath.Join(cfg.RootDir, r.URL.Query().Get("path")))
relPath = strings.TrimSuffix(relPath, "/")
fullPath := filepath.Join(cfg.RootDir, relPath)
// Validate args before doing any operations
width, err := getPositiveIntFromQuery(r, "width")
if err != nil {
helpers.Return400Msg("height and width must both be positive numbers", w)
return
}
height, err := getPositiveIntFromQuery(r, "height")
if err != nil {
helpers.Return400Msg("height and width must both be positive numbers", w)
return
}
pngQuality, err := getPositiveIntFromQuery(r, "quality")
if err != nil {
helpers.Return400Msg("quality must be a positive number", w)
return
}
if pngQuality == 0 {
pngQuality = 50
}
autoScale := r.URL.Query().Get("auto") != ""
square := r.URL.Query().Get("square") != ""
if (width != 0 && height != 0) && (width != height) {
helpers.Return400Msg("width and height must be equal in square mode, or only one provided", w)
return
}
// Try to get the data from the cache
item, found := sharedCache.Get(relPath)
if !found {
item = helpers.HandleFileNotFound(relPath, fullPath, sharedCache, cfg, w)
}
if item == nil {
returnDummyPNG(w)
return
}
if item.IsDir {
helpers.Return400Msg("that's a directory", w)
return
}
// Get the MIME type of the file
fileExists, mimeType, ext, err := file.GetMimeType(fullPath, true, nil)
if !fileExists {
helpers.Return400Msg("file not found", w)
return
}
if err != nil {
log.Errorf("THUMB - error detecting MIME type: %v", err)
returnDummyPNG(w)
return
}
// Update the CacheItem's MIME in the sharedCache
item.Type = &mimeType
item.Extension = &ext
sharedCache.Add(relPath, item)
// Check if the file is an image
if !strings.HasPrefix(mimeType, "image/") {
helpers.Return400Msg("file is not an image", w)
return
}
// Convert the image to a PNG
imageBytes, err := file.ConvertToPNG(fullPath, mimeType)
if err != nil {
log.Warnf("Error converting %s to PNG: %v", fullPath, err)
returnDummyPNG(w)
return
}
// Decode the image
var img image.Image
img, err = png.Decode(bytes.NewReader(imageBytes))
if err != nil {
log.Warnf("Error decoding %s image data: %v", fullPath, err)
returnDummyPNG(w)
return
}
// Resize the image
img, err = resizeImage(img, width, height, square, autoScale)
if err != nil {
helpers.Return400Msg(err.Error(), w)
return
}
buf, err := file.CompressPNGFile(img, pngQuality)
if err != nil {
log.Warnf("Error compressing %s to PNG: %v", fullPath, err)
returnDummyPNG(w)
return
}
// Return the image
w.Header().Set("Content-Type", "image/png")
w.Write(buf.Bytes())
}
func getPositiveIntFromQuery(r *http.Request, key string) (int, error) {
str := r.URL.Query().Get(key)
if str == "" {
return 0, nil
}
if !helpers.IsNonNegativeInt(str) {
return 0, fmt.Errorf("invalid value for %s", key)
}
value, _ := strconv.ParseInt(str, 10, 32)
return int(value), nil
}
func returnDummyPNG(w http.ResponseWriter) {
img := image.NewRGBA(image.Rect(0, 0, 300, 300))
	white := color.RGBA{R: 255, G: 255, B: 255, A: 255}
	for y := 0; y < img.Bounds().Dy(); y++ {
		for x := 0; x < img.Bounds().Dx(); x++ {
			img.Set(x, y, white)
}
}
buffer := new(bytes.Buffer)
if err := png.Encode(buffer, img); err != nil {
http.Error(w, "encode failed", http.StatusInternalServerError)
return
}
// TODO: set cache-control based on config?
w.Header().Set("Content-Type", "image/png")
_, err := w.Write(buffer.Bytes())
if err != nil {
log.Errorf("THUMBNAIL - Failed to write buffer: %s", err)
return
}
}
func resizeImage(img image.Image, width, height int, square, autoScale bool) (image.Image, error) {
if square {
var size int
if width == 0 && height == 0 {
size = 300
} else if width != 0 {
size = width
} else {
size = height
}
if size > img.Bounds().Dx() || size > img.Bounds().Dy() {
size = helpers.Max(img.Bounds().Dx(), img.Bounds().Dy())
}
		// First, make the image square by scaling the smallest dimension to the largest size
if img.Bounds().Dx() > img.Bounds().Dy() {
width = 0
height = size
} else {
width = size
height = 0
}
resized := resize.Resize(uint(width), uint(height), img, resize.Lanczos3)
// Then crop the image to the target size
img = imaging.CropCenter(resized, size, size)
} else {
if width == 0 && height == 0 {
if autoScale {
// If both width and height parameters are not provided, set
// the largest dimension to 300 and scale the other.
if img.Bounds().Dx() > img.Bounds().Dy() {
width = 300
height = 0
} else {
width = 0
height = 300
}
} else {
// Don't auto-resize because this endpoint can also be used for simply reducing the quality of an image
width = img.Bounds().Dx()
height = img.Bounds().Dy()
}
} else if width == 0 {
// If only width is provided, calculate the height based on the image's aspect ratio
width = img.Bounds().Dx() * height / img.Bounds().Dy()
} else if height == 0 {
height = img.Bounds().Dy() * width / img.Bounds().Dx()
}
// Scale the image. If the image is smaller than the provided height or width, it won't be resized.
img = resize.Resize(uint(width), uint(height), img, resize.Lanczos3)
}
return img, nil
}
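A request sketch (route path and host are assumptions): a 300x300 square thumbnail at reduced PNG quality, using the
`square`, `width`, `height`, and `quality` args parsed above:

resp, err := http.Get("http://localhost:8080/api/file/thumb?path=/photos/cat.jpg&square=1&width=300&height=300&quality=50") // hypothetical route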

src/api/client/Health.go Normal file

@@ -0,0 +1,27 @@
package client
import (
"crazyfs/CacheItem"
"crazyfs/cache"
"crazyfs/cache/DirectoryCrawler"
"crazyfs/config"
"encoding/json"
lru "github.com/hashicorp/golang-lru/v2"
"net/http"
)
// TODO: show the time the initial crawl started
func ClientHealthCheck(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
response := map[string]interface{}{}
response["scan_running"] = DirectoryCrawler.GetGlobalActiveCrawls() > 0
response["initial_scan_running"] = cache.InitialCrawlInProgress
w.Header().Set("Content-Type", "application/json")
err := json.NewEncoder(w).Encode(response)
if err != nil {
log.Errorf("HEALTH - Failed to serialize JSON: %s", err)
return
}
}


@@ -0,0 +1,22 @@
package client
import (
"crazyfs/CacheItem"
"crazyfs/config"
"encoding/json"
lru "github.com/hashicorp/golang-lru/v2"
"net/http"
)
func RestrictedDownloadDirectories(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
response := map[string]interface{}{
"restricted_download_directories": config.RestrictedDownloadPaths,
}
w.Header().Set("Content-Type", "application/json")
err := json.NewEncoder(w).Encode(response)
if err != nil {
log.Errorf("AdminCacheInfo - Failed to serialize JSON: %s", err)
return
}
}

src/api/client/init.go Normal file

@@ -0,0 +1,12 @@
package client
import (
"crazyfs/logging"
"github.com/sirupsen/logrus"
)
var log *logrus.Logger
func init() {
log = logging.GetLogger()
}


@@ -0,0 +1,12 @@
package helpers
import (
"crazyfs/logging"
"github.com/sirupsen/logrus"
)
var log *logrus.Logger
func init() {
log = logging.GetLogger()
}


@@ -1,29 +1,44 @@
 package helpers
 
 import (
-	"crazyfs/logging"
 	"encoding/json"
 	"net/http"
 )
 
-func Return400Msg(msg string, w http.ResponseWriter) {
-	//log := logging.GetLogger()
-	//log.Warnln(msg)
+func WriteErrorResponse(json_code, http_code int, msg string, w http.ResponseWriter) {
 	w.Header().Set("Cache-Control", "no-store")
 	w.Header().Set("Content-Type", "application/json")
-	w.WriteHeader(http.StatusBadRequest)
-	json.NewEncoder(w).Encode(map[string]interface{}{
-		"code":  http.StatusBadRequest,
+	w.WriteHeader(http_code)
+	err := json.NewEncoder(w).Encode(map[string]interface{}{
+		"code":  json_code,
 		"error": msg,
 	})
+	if err != nil {
+		log.Errorln("HELPERS - WriteErrorResponse failed to encode JSON response: ", err)
+	}
+}
+
+func ReturnFake404Msg(msg string, w http.ResponseWriter) {
+	WriteErrorResponse(404, http.StatusBadRequest, msg, w)
+}
+
+func Return400Msg(msg string, w http.ResponseWriter) {
+	WriteErrorResponse(http.StatusBadRequest, http.StatusBadRequest, msg, w)
 }
 
 func HandleRejectDuringInitialCrawl(w http.ResponseWriter) {
-	log := logging.GetLogger()
-	log.Warnln("Rejecting request during initial crawl")
-	w.Header().Set("Cache-Control", "no-store")
-	w.Header().Set("Content-Type", "application/json")
-	w.WriteHeader(http.StatusServiceUnavailable)
-	json.NewEncoder(w).Encode(map[string]interface{}{
-		"code":  http.StatusServiceUnavailable,
-		"error": "initial file system crawl in progress",
-	})
+	WriteErrorResponse(http.StatusServiceUnavailable, http.StatusServiceUnavailable, "initial file system crawl in progress", w)
+}
+
+func Return500Msg(w http.ResponseWriter) {
+	WriteErrorResponse(http.StatusInternalServerError, http.StatusInternalServerError, "internal server error", w)
+}
+
+func Return403Msg(msg string, w http.ResponseWriter) {
+	WriteErrorResponse(http.StatusForbidden, http.StatusForbidden, msg, w)
 }


@@ -1,36 +1,65 @@
 package helpers
 
 import (
+	"crazyfs/CacheItem"
 	"crazyfs/cache"
+	"crazyfs/cache/DirectoryCrawler"
 	"crazyfs/config"
-	"crazyfs/data"
 	"crazyfs/logging"
-	"encoding/json"
 	lru "github.com/hashicorp/golang-lru/v2"
 	"net/http"
 	"os"
 	"strconv"
+	"time"
 )
 
-func HandleFileNotFound(relPath string, fullPath string, sharedCache *lru.Cache[string, *data.Item], cfg *config.Config, w http.ResponseWriter) *data.Item {
+// HandleFileNotFound if the data is not in the cache, start a new crawler
+func HandleFileNotFound(relPath string, fullPath string, sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config, w http.ResponseWriter) *CacheItem.Item {
 	log := logging.GetLogger()
-	// If the data is not in the cache, start a new crawler
-	//log.Fatalf("CRAWLER - %s not in cache, crawling", fullPath)
 	log.Debugf("CRAWLER - %s not in cache, crawling", fullPath)
-	pool := cache.NewWorkerPool()
-	crawler := cache.NewDirectoryCrawler(sharedCache, pool)
+	dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
+
+	// Check if this is a symlink. We do this before CrawlNoRecursion() because we want to tell the end user that
+	// we're not going to resolve this symlink.
+	//info, err := os.Lstat(fullPath)
+	//if err != nil {
+	//	log.Errorf("HandleFileNotFound - os.Lstat failed: %s", err)
+	//	Return500Msg(w)
+	//	return nil
+	//}
+	//if !config.FollowSymlinks && info.Mode()&os.ModeSymlink > 0 {
+	//	Return400Msg("path is a symlink", w)
+	//	return nil
+	//}
+
 	// We don't want to traverse the entire directory tree since we'll only return the current directory anyways
-	err := crawler.Crawl(fullPath, false)
-	if err != nil {
-		log.Errorf("LIST - crawl failed: %s", err)
-		w.Header().Set("Content-Type", "application/json")
-		w.WriteHeader(http.StatusInternalServerError)
-		json.NewEncoder(w).Encode(map[string]interface{}{
-			"code":  500,
-			"error": "internal server error",
-		})
-		return nil
-	}
+	item, err := dc.CrawlNoRecursion(fullPath)
+	if os.IsNotExist(err) || item == nil {
+		ReturnFake404Msg("path not found", w)
+		return nil
+	} else if err != nil {
+		log.Errorf("HandleFileNotFound - crawl failed: %s", err)
+		Return500Msg(w)
+		return nil
+	}
+
+	// Start a recursive crawl in the background.
+	// We've already gotten our cached CacheItem (may be null if it doesn't exist) so this won't affect our results
+	go func() {
+		log.Debugf("Starting background recursive crawl for %s", fullPath)
+		dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
+		start := time.Now()
+		err := dc.Crawl(fullPath, true)
+		if err != nil {
+			log.Errorf("LIST - background recursive crawl failed: %s", err)
+		}
+		log.Debugf("Finished background recursive crawl for %s, elapsed time: %s", fullPath, time.Since(start).Round(time.Second))
+	}()
 
 	// Try to get the data from the cache again
 	item, found := sharedCache.Get(relPath)
 	if !found {
@@ -39,42 +68,27 @@ func HandleFileNotFound(relPath string, fullPath string, sharedCache *lru.Cache[
 		if _, err := os.Stat(fullPath); os.IsNotExist(err) {
 			log.Debugf("File not in cache: %s", fullPath)
 			// If the file or directory does not exist, return a 404 status code and a message
-			w.Header().Set("Content-Type", "application/json")
-			w.WriteHeader(http.StatusNotFound)
-			json.NewEncoder(w).Encode(map[string]interface{}{
-				"code":  400,
-				"error": "file or directory not found",
-			})
+			ReturnFake404Msg("file or directory not found", w)
 			return nil
 		} else if err != nil {
 			// If there was an error checking if the file or directory exists, return a 500 status code and the error
 			log.Errorf("LIST - %s", err.Error())
-			w.Header().Set("Content-Type", "application/json")
-			w.WriteHeader(http.StatusInternalServerError)
-			json.NewEncoder(w).Encode(map[string]interface{}{
-				"code":  500,
-				"error": "internal server error",
-			})
+			Return500Msg(w)
 			return nil
 		}
 	}
 
-	// If item is still nil, error
+	// If CacheItem is still nil, error
 	if item == nil {
 		log.Errorf("LIST - crawler failed to find %s and did not return a 404", relPath)
-		w.Header().Set("Content-Type", "application/json")
-		w.WriteHeader(http.StatusInternalServerError)
-		json.NewEncoder(w).Encode(map[string]interface{}{
-			"code":  500,
-			"error": "crawler failed to fetch file or directory",
-		})
+		Return500Msg(w)
 		return nil
 	}
 
 	cache.CheckAndRecache(fullPath, cfg, sharedCache)
 
 	return item
 }
 
-func IsPositiveInt(testStr string) bool {
+func IsNonNegativeInt(testStr string) bool {
 	if num, err := strconv.ParseInt(testStr, 10, 64); err == nil {
 		return num >= 0
 	}
@@ -94,3 +108,19 @@ func Max(a, b int) int {
 	}
 	return b
 }
+
+func CheckInitialCrawl() bool {
+	return cache.InitialCrawlInProgress && !config.HttpAllowDuringInitialCrawl
+}
+
+func CheckPathRestricted(relPath string) bool {
+	for _, restrictedPath := range config.RestrictedDownloadPaths {
+		if restrictedPath == "" {
+			restrictedPath = "/"
+		}
+		if relPath == restrictedPath {
+			return true
+		}
+	}
+	return false
+}
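Note that `CheckPathRestricted` compares exact relative paths, and an empty configured entry is normalized to `/`.
A sketch (not part of the commit):

// With config.RestrictedDownloadPaths = []string{"/private", ""}:
CheckPathRestricted("/private")          // true
CheckPathRestricted("/")                 // true: "" was normalized to "/"
CheckPathRestricted("/private/file.txt") // false: exact match only, children are not covered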


@@ -0,0 +1,126 @@
package helpers
import (
"crazyfs/CacheItem"
"crazyfs/config"
"crazyfs/file"
lru "github.com/hashicorp/golang-lru/v2"
kzip "github.com/klauspost/compress/zip"
"io"
"net/http"
"os"
"path/filepath"
)
func ZipHandlerCompress(dirPath string, w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/zip")
//w.WriteHeader(http.StatusOK)
zipWriter := kzip.NewWriter(w)
// Walk through the directory and add each file to the zip
	filepath.Walk(dirPath, func(filePath string, info os.FileInfo, err error) error {
		if err != nil {
			return err // info is nil when Walk reports an error, so bail out before touching it
		}
		if info.IsDir() {
			return nil
		}
// Ensure the file path is relative to the directory being zipped
relativePath, err := filepath.Rel(dirPath, filePath)
if err != nil {
return err
}
writer, err := zipWriter.Create(relativePath)
if err != nil {
return err
}
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(writer, file)
return err
})
err := zipWriter.Close()
if err != nil {
log.Errorf("ZIPSTREM - failed to close zipwriter: %s", err)
}
}
func ZipHandlerCompressMultiple(paths []string, w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
zipWriter := kzip.NewWriter(w)
// Walk through each file and add it to the zip
for _, fullPath := range paths {
relPath := file.StripRootDir(fullPath)
// Try to get the data from the cache
item, found := sharedCache.Get(relPath)
if !found {
item = HandleFileNotFound(relPath, fullPath, sharedCache, cfg, w)
}
if item == nil {
// The errors have already been handled in handleFileNotFound() so we're good to just exit
return
}
if !item.IsDir {
writer, err := zipWriter.Create(relPath)
if err != nil {
Return500Msg(w)
return
}
file, err := os.Open(fullPath)
if err != nil {
Return500Msg(w)
return
}
defer file.Close()
_, err = io.Copy(writer, file)
if err != nil {
Return500Msg(w)
return
}
} else {
w.Header().Set("Content-Disposition", `attachment; filename="files.zip"`)
w.Header().Set("Content-Type", "application/zip")
//w.WriteHeader(http.StatusOK)
// If it's a directory, walk through it and add each file to the zip
			filepath.Walk(fullPath, func(filePath string, info os.FileInfo, err error) error {
				if err != nil {
					return err // info is nil when Walk reports an error, so bail out before touching it
				}
				if info.IsDir() {
					return nil
				}
// Ensure the file path is relative to the directory being zipped
relativePath, err := filepath.Rel(fullPath, filePath)
if err != nil {
return err
}
writer, err := zipWriter.Create(filepath.Join(relPath, relativePath))
if err != nil {
return err
}
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(writer, file)
return err
})
}
}
err := zipWriter.Close()
if err != nil {
log.Errorf("ZIPSTREM - failed to close zipwriter: %s", err)
return
}
}
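The pattern in both handlers is to write zip entries straight into the `http.ResponseWriter`, so the archive is
streamed as it is built and never buffered in memory or on disk (which also means no Content-Length header). The core
of it, reduced to a sketch:

zw := kzip.NewWriter(w)                    // w is the http.ResponseWriter
entry, err := zw.Create("docs/readme.txt") // path of the entry inside the archive
if err == nil {
    _, _ = io.Copy(entry, f) // f is an open *os.File
}
_ = zw.Close() // flushes the zip central directory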

src/api/init.go Normal file

@@ -0,0 +1,12 @@
package api
import (
"crazyfs/logging"
"github.com/sirupsen/logrus"
)
var log *logrus.Logger
func init() {
log = logging.GetLogger()
}


@@ -1,222 +0,0 @@
package api
import (
"bytes"
"crazyfs/api/helpers"
"crazyfs/cache"
"crazyfs/config"
"crazyfs/data"
"crazyfs/logging"
"encoding/gob"
"encoding/json"
lru "github.com/hashicorp/golang-lru/v2"
"math"
"net/http"
"path/filepath"
"sort"
"strconv"
"strings"
)
func ListDir(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
if cache.InitialCrawlInProgress && !cfg.HttpAllowDuringInitialCrawl {
helpers.HandleRejectDuringInitialCrawl(w)
return
}
log := logging.GetLogger()
pathArg := r.URL.Query().Get("path")
sortArg := r.URL.Query().Get("sort")
var folderSorting string
if sortArg == "default" || sortArg == "" {
folderSorting = "default"
} else if sortArg == "folders" {
folderSorting = "folders"
} else {
helpers.Return400Msg("folders arg must be 'default' (to not do any sorting) or 'first' (to sort the folders to the front of the list)", w)
return
}
// Clean the path to prevent directory traversal
// filepath.Clean() below will do most of the work but these are just a few checks
// Also this will break the cache because it will create another entry for the relative path
if strings.Contains(pathArg, "/../") || strings.HasPrefix(pathArg, "../") || strings.HasSuffix(pathArg, "/..") || strings.HasPrefix(pathArg, "~") {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusBadRequest)
json.NewEncoder(w).Encode(map[string]interface{}{
"code": http.StatusBadRequest,
"error": "invalid file path",
})
return
}
fullPath := filepath.Join(cfg.RootDir, filepath.Clean("/"+pathArg))
relPath := cache.StripRootDir(fullPath, cfg.RootDir)
// Try to get the data from the cache
cacheItem, found := sharedCache.Get(relPath)
if !found {
cacheItem = helpers.HandleFileNotFound(relPath, fullPath, sharedCache, cfg, w)
// Start a recursive crawl in the background.
// We've already gotten our cached item (may be null if it doesn't exist) so this won't affect our results
go func() {
log.Debugf("LIST - starting background recursive crawl for %s", fullPath)
pool := cache.NewWorkerPool()
crawler := cache.NewDirectoryCrawler(sharedCache, pool)
err := crawler.Crawl(fullPath, true)
if err != nil {
log.Errorf("LIST - background recursive crawl failed: %s", err)
}
}()
}
if cacheItem == nil {
return // The errors have already been handled in handleFileNotFound() so we're good to just exit
}
// Create a deep copy of the cached item so we don't modify the item in the cache
var buf bytes.Buffer
enc := gob.NewEncoder(&buf)
dec := gob.NewDecoder(&buf)
err := enc.Encode(cacheItem)
if err != nil {
log.Errorf("Error encoding item: %v", err)
return
}
var item data.Item
err = dec.Decode(&item)
if err != nil {
log.Errorf("Error decoding item: %v", err)
return
}
// Get the MIME type of the file if the 'mime' argument is present
mime := r.URL.Query().Get("mime")
if mime != "" {
if item.IsDir && !cfg.HttpAllowDirMimeParse {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusForbidden)
json.NewEncoder(w).Encode(map[string]interface{}{
"code": 403,
"error": "not allowed to analyze the mime of directories",
})
return
} else if !item.IsDir {
// Only update the mime in the cache if it hasn't been set already.
// TODO: need to make sure that when a re-crawl is triggered, the Type is set back to nil
if item.Type == nil {
fileExists, mimeType, ext, err := cache.GetFileMime(fullPath, true)
if !fileExists {
helpers.Return400Msg("file not found", w)
}
if err != nil {
log.Warnf("Error detecting MIME type: %v", err)
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusInternalServerError)
json.NewEncoder(w).Encode(map[string]interface{}{
"code": 500,
"error": "internal server error",
})
return
}
// Update the original cached item's MIME in the sharedCache
cacheItem.Type = &mimeType
cacheItem.Extension = &ext
sharedCache.Add(relPath, cacheItem)
}
}
}
response := map[string]interface{}{}
// Pagination
var paginationLimit int
if r.URL.Query().Get("limit") != "" {
if !helpers.IsPositiveInt(r.URL.Query().Get("limit")) {
helpers.Return400Msg("limit must be positive number", w)
return
}
i, _ := strconv.ParseInt(r.URL.Query().Get("limit"), 10, 32)
paginationLimit = int(i)
} else {
paginationLimit = 100
}
totalPages := math.Ceil(float64(len(item.Children)) / float64(paginationLimit))
if r.URL.Query().Get("page") != "" {
response["total_pages"] = int(totalPages)
}
if folderSorting == "folders" {
sort.Slice(item.Children, func(i, j int) bool {
return item.Children[i].IsDir && !item.Children[j].IsDir
})
}
// Set the children to an empty array so that the JSON encoder doesn't return it as nil
var paginatedChildren []*data.Item // this var is either the full item list or a paginated list depending on the query args
if item.Children != nil {
paginatedChildren = item.Children
} else {
paginatedChildren = make([]*data.Item, 0)
}
pageParam := r.URL.Query().Get("page")
if pageParam != "" {
page, err := strconv.Atoi(pageParam)
if err != nil || page < 1 || page > int(totalPages) {
// Don't return an error, just truncate to the last valid page
page = int(totalPages)
}
start := (page - 1) * paginationLimit
end := start + paginationLimit
if start >= 0 { // guard against out-of-range slicing
if start > len(item.Children) {
start = len(item.Children)
}
if end > len(item.Children) {
end = len(item.Children)
}
paginatedChildren = paginatedChildren[start:end]
}
}
// TODO: don't use deprecated file read methods
//if cfg.HttpAPIListCacheControl > 0 {
// w.Header().Set("Cache-Control", fmt.Sprintf("public, max-age=%d, must-revalidate", cfg.HttpAPIListCacheControl))
//} else {
w.Header().Set("Cache-Control", "no-store")
//}
for i := range paginatedChildren {
paginatedChildren[i].Children = nil
}
response["item"] = map[string]interface{}{
"path": item.Path,
"name": item.Name,
"size": item.Size,
"extension": item.Extension,
"modified": item.Modified,
"mode": item.Mode,
"isDir": item.IsDir,
"isSymlink": item.IsSymlink,
"cached": item.Cached,
"children": paginatedChildren,
"type": item.Type,
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(response)
}
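ListDir's gob round trip above is a general-purpose deep-copy trick: encode the cached item into a buffer and decode it into a fresh value, so pagination and child-trimming never mutate the item held in the shared cache. A standalone sketch of the pattern (the Item struct here is a simplified stand-in for the real cache item type, not the project's own):

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Item struct {
	Path     string
	Children []string
}

// deepCopy round-trips a value through gob to get an independent copy.
func deepCopy(src Item) (Item, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return Item{}, err
	}
	var dst Item
	if err := gob.NewDecoder(&buf).Decode(&dst); err != nil {
		return Item{}, err
	}
	return dst, nil
}

func main() {
	orig := Item{Path: "/photos", Children: []string{"/photos/cat.jpg"}}
	copied, _ := deepCopy(orig)
	copied.Children = nil // trimming the copy leaves the original intact
	fmt.Println(len(orig.Children), len(copied.Children)) // prints: 1 0
}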


@@ -1,8 +1,9 @@
 package api

 import (
+	"crazyfs/CacheItem"
+	"crazyfs/api/client"
 	"crazyfs/config"
-	"crazyfs/data"
 	"crazyfs/logging"
 	"encoding/json"
 	"fmt"
@@ -20,7 +21,7 @@ type Route struct {

 type Routes []Route

-type AppHandler func(http.ResponseWriter, *http.Request, *config.Config, *lru.Cache[string, *data.Item])
+type AppHandler func(http.ResponseWriter, *http.Request, *config.Config, *lru.Cache[string, *CacheItem.Item])

 var routes = Routes{
 	Route{
@@ -56,13 +57,13 @@ var routes = Routes{
 	Route{
 		"Trigger Recache",
 		"POST",
-		"/api/admin/recache",
+		"/api/admin/cache/recache",
 		AdminReCache,
 	},
 	Route{
 		"Trigger Recache",
 		"GET",
-		"/api/admin/recache",
+		"/api/admin/cache/recache",
 		wrongMethod("POST", AdminReCache),
 	},
 	Route{
@@ -71,6 +72,27 @@ var routes = Routes{
 		"/api/health",
 		HealthCheck,
 	},
+	// TODO: remove
+	Route{
+		"Server Health",
+		"GET",
+		"/api/health",
+		HealthCheck,
+	},
+	Route{
+		"Server Health",
+		"GET",
+		"/api/client/health",
+		client.ClientHealthCheck,
+	},
+	Route{
+		"Restricted Directories",
+		"GET",
+		"/api/client/restricted",
+		client.RestrictedDownloadDirectories,
+	},
 }

 func setHeaders(next http.Handler) http.Handler {
@@ -82,7 +104,7 @@ func setHeaders(next http.Handler) http.Handler {
 	})
 }

-func NewRouter(cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) *mux.Router {
+func NewRouter(cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) *mux.Router {
 	r := mux.NewRouter().StrictSlash(true)
 	for _, route := range routes {
 		var handler http.Handler
@@ -117,7 +139,7 @@ func NewRouter(cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) *
 }

 func wrongMethod(expectedMethod string, next AppHandler) AppHandler {
-	return func(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
+	return func(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
 		w.Header().Set("Content-Type", "application/json")
 		w.WriteHeader(http.StatusBadRequest)
 		json.NewEncoder(w).Encode(map[string]interface{}{
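With this change the recache route lives at /api/admin/cache/recache and accepts only POST; a GET is routed through wrongMethod() and answered with a 400. A rough client-side sketch (the host, port, and path query parameter are illustrative assumptions, not confirmed by this commit):

package main

import (
	"fmt"
	"net/http"
)

func main() {
	resp, err := http.Post("http://localhost:8080/api/admin/cache/recache?path=/photos", "application/json", nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}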


@@ -1,257 +0,0 @@
package api
import (
"bytes"
"crazyfs/api/helpers"
"crazyfs/cache"
"crazyfs/config"
"crazyfs/data"
"crazyfs/file"
"crazyfs/logging"
"encoding/json"
"github.com/disintegration/imaging"
lru "github.com/hashicorp/golang-lru/v2"
"image"
"image/color"
"image/png"
"net/http"
"path/filepath"
"strconv"
"strings"
"github.com/nfnt/resize"
)
func Thumbnail(w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
if cache.InitialCrawlInProgress && !cfg.HttpAllowDuringInitialCrawl {
helpers.HandleRejectDuringInitialCrawl(w)
returnDummyPNG(w)
return
}
log := logging.GetLogger()
relPath := cache.StripRootDir(filepath.Join(cfg.RootDir, r.URL.Query().Get("path")), cfg.RootDir)
relPath = strings.TrimSuffix(relPath, "/")
fullPath := filepath.Join(cfg.RootDir, relPath)
// Validate args before doing any operations
widthStr := r.URL.Query().Get("width")
if widthStr != "" {
if !helpers.IsPositiveInt(widthStr) {
helpers.Return400Msg("height and width must both be positive numbers", w)
return
}
}
heightStr := r.URL.Query().Get("height")
if heightStr != "" {
if !helpers.IsPositiveInt(heightStr) {
helpers.Return400Msg("height and width must both be positive numbers", w)
return
}
}
pngQualityStr := r.URL.Query().Get("quality")
var pngQuality int
if pngQualityStr != "" {
if !helpers.IsPositiveInt(pngQualityStr) {
helpers.Return400Msg("quality must be a positive number", w)
return
}
pngQuality64, _ := strconv.ParseInt(pngQualityStr, 10, 32)
pngQuality = int(pngQuality64)
} else {
pngQuality = 50
}
scaleStr := r.URL.Query().Get("auto")
var autoScale bool
if scaleStr != "" {
autoScale = true
}
squareStr := r.URL.Query().Get("square")
var square bool
if squareStr != "" {
square = true
}
// Try to get the data from the cache
item, found := sharedCache.Get(relPath)
if !found {
item = helpers.HandleFileNotFound(relPath, fullPath, sharedCache, cfg, w)
}
if item == nil {
return
}
if item.IsDir {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusBadRequest)
json.NewEncoder(w).Encode(map[string]interface{}{
"code": http.StatusBadRequest,
"error": "that's a directory",
})
return
}
// Get the MIME type of the file
fileExists, mimeType, ext, err := cache.GetFileMime(fullPath, true)
if !fileExists {
helpers.Return400Msg("file not found", w)
return
}
if err != nil {
log.Errorf("THUMB = error detecting MIME type: %v", err)
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusInternalServerError)
json.NewEncoder(w).Encode(map[string]interface{}{
"code": 500,
"error": "internal server error",
})
return
}
// Update the item's MIME in the sharedCache
item.Type = &mimeType
item.Extension = &ext
sharedCache.Add(relPath, item)
// Check if the file is an image
if !strings.HasPrefix(mimeType, "image/") {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusBadRequest)
json.NewEncoder(w).Encode(map[string]interface{}{
"code": http.StatusBadRequest,
"error": "file is not an image",
})
return
}
// Convert the image to a PNG
imageBytes, err := file.ConvertToPNG(fullPath, mimeType)
if err != nil {
log.Warnf("Error converting %s to PNG: %v", fullPath, err)
returnDummyPNG(w)
return
}
// Decode the image
var img image.Image
img, err = png.Decode(bytes.NewReader(imageBytes))
if err != nil {
log.Warnf("Error decoding %s image data: %v", fullPath, err)
returnDummyPNG(w)
return
}
// Resize the image
var width, height uint
if widthStr != "" {
width64, _ := strconv.ParseUint(widthStr, 10, 32)
width = uint(width64)
}
if heightStr != "" {
height64, _ := strconv.ParseUint(heightStr, 10, 32)
height = uint(height64)
}
if square {
var size int
if width == 0 && height == 0 {
size = 300
} else if (width != 0 && height != 0) && (width != height) {
helpers.Return400Msg("width and height must be equal in square mode, or only one provided", w)
return
} else if width != 0 {
size = int(width)
} else {
size = int(height)
}
if size > img.Bounds().Dx() || size > img.Bounds().Dy() {
size = helpers.Max(img.Bounds().Dx(), img.Bounds().Dy())
}
// First, make the image square by scaling the smaller dimension up to the target size
if img.Bounds().Dx() > img.Bounds().Dy() {
width = 0
height = uint(size)
} else {
width = uint(size)
height = 0
}
resized := resize.Resize(width, height, img, resize.Lanczos3)
// Then crop the image to the target size
img = imaging.CropCenter(resized, size, size)
} else {
if width == 0 && height == 0 {
if autoScale {
// If both width and height parameters are not provided, set
// the largest dimension to 300 and scale the other.
if img.Bounds().Dx() > img.Bounds().Dy() {
width = 300
height = 0
} else {
width = 0
height = 300
}
} else {
// Don't auto-resize because this endpoint can also be used for simply reducing the quality of an image
width = uint(img.Bounds().Dx())
height = uint(img.Bounds().Dy())
}
} else if width == 0 {
// If only width is provided, calculate the height based on the image's aspect ratio
width = uint(img.Bounds().Dx()) * height / uint(img.Bounds().Dy())
} else if height == 0 {
height = uint(img.Bounds().Dy()) * width / uint(img.Bounds().Dx())
}
// Scale the image. If the image is smaller than the provided height or width, it won't be resized.
img = resize.Resize(width, height, img, resize.Lanczos3)
}
// Encode and compress the image
buf, err := file.CompressPNGFile(img, pngQuality)
if err != nil {
log.Warnf("Error compressing %s to PNG: %v", fullPath, err)
returnDummyPNG(w)
return
}
// Return the image
w.Header().Set("Content-Type", "image/png")
w.Write(buf.Bytes())
}
func returnDummyPNG(w http.ResponseWriter) {
img := image.NewRGBA(image.Rect(0, 0, 300, 300))
white := color.RGBA{255, 255, 255, 255} // plain white placeholder
for y := 0; y < img.Bounds().Dy(); y++ {
for x := 0; x < img.Bounds().Dx(); x++ {
img.Set(x, y, white)
}
}
buffer := new(bytes.Buffer)
if err := png.Encode(buffer, img); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
// TODO: set cache-control
w.Header().Set("Content-Type", "image/png")
w.Write(buffer.Bytes())
}
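The square mode above scales the smaller dimension up to the target size and then center-crops. A minimal standalone sketch of the same resize-then-crop idea using the same two libraries (the package name is arbitrary):

package thumbs

import (
	"image"

	"github.com/disintegration/imaging"
	"github.com/nfnt/resize"
)

// squareThumb scales the smaller dimension of img up to size, then crops the
// center square. Passing 0 for one dimension tells resize to preserve the
// aspect ratio.
func squareThumb(img image.Image, size int) image.Image {
	var w, h uint
	if img.Bounds().Dx() > img.Bounds().Dy() {
		h = uint(size) // height is the smaller dimension here
	} else {
		w = uint(size)
	}
	scaled := resize.Resize(w, h, img, resize.Lanczos3)
	return imaging.CropCenter(scaled, size, size)
}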


@@ -1,140 +0,0 @@
package cache
import (
"crazyfs/data"
lru "github.com/hashicorp/golang-lru/v2"
"os"
"path/filepath"
"strings"
"sync"
)
// Config values
var FollowSymlinks bool
var WorkerBufferSize int
var PrintNew bool
var RootDir string
var CrawlerParseMIME bool
type DirectoryCrawler struct {
cache *lru.Cache[string, *data.Item]
pool *WorkerPool
}
func NewDirectoryCrawler(cache *lru.Cache[string, *data.Item], pool *WorkerPool) *DirectoryCrawler {
return &DirectoryCrawler{cache: cache, pool: pool}
}
func (dc *DirectoryCrawler) Crawl(path string, recursive bool) error {
info, err := os.Stat(path)
if os.IsNotExist(err) {
// If the path doesn't exist, just silently exit
return nil
}
if err != nil {
return err
}
// Get a list of all keys in the cache that belong to this directory
keys := make([]string, 0)
for _, key := range dc.cache.Keys() {
if strings.HasPrefix(key, path) {
keys = append(keys, key)
}
}
// Remove all entries in the cache that belong to this directory so we can start fresh
for _, key := range keys {
dc.cache.Remove(key)
}
if info.IsDir() {
// If the path is a directory, walk the directory
var wg sync.WaitGroup
err := dc.walkDir(path, &wg, info, recursive)
if err != nil {
log.Errorf("CRAWLER - dc.walkDir() in Crawl() returned error: %s", err)
}
} else {
// If the path is a file, add it to the cache directly
dc.cache.Add(StripRootDir(path, RootDir), NewItem(path, info))
}
return nil
}
func (dc *DirectoryCrawler) walkDir(dir string, n *sync.WaitGroup, dirInfo os.FileInfo, recursive bool) error {
// We are handling errors for each file or directory individually. Does this slow things down?
entries, err := os.ReadDir(dir)
if err != nil {
log.Errorf("CRAWLER - walkDir() failed to read directory %s: %s", dir, err)
return err
}
// Create the directory item but don't add it to the cache yet
dirItem := NewItem(dir, dirInfo)
for _, entry := range entries {
subpath := filepath.Join(dir, entry.Name())
info, err := os.Lstat(subpath) // update the info var with the new entry
if err != nil {
log.Warnf("CRAWLER - walkDir() failed to stat subpath %s: %s", subpath, err)
continue
}
if FollowSymlinks && info.Mode()&os.ModeSymlink != 0 {
link, err := os.Readlink(subpath)
if err != nil {
log.Warnf("CRAWLER - walkDir() failed to read symlink %s: %s", subpath, err)
continue
}
info, err = os.Stat(link)
if err != nil {
log.Warnf("CRAWLER - walkDir() failed to stat link %s: %s", link, err)
continue
}
}
if entry.IsDir() && recursive {
n.Add(1)
go func() {
defer n.Done()
err := dc.walkDir(subpath, n, info, recursive)
if err != nil {
log.Errorf("CRAWLER - dc.walkDir() in walkDir() -> IsDir() returned error: %s", err)
}
}()
} else {
w := dc.pool.Get()
w.add(subpath)
dc.pool.Put(w)
}
// Add the entry to the directory's contents
entryItem := NewItem(subpath, info)
dirItem.Children = append(dirItem.Children, entryItem)
}
// Add the directory to the cache after all of its children have been processed
dc.cache.Add(StripRootDir(dir, RootDir), dirItem)
// If the directory is not the root directory, update the parent directory's Children field
if dir != RootDir {
parentDir := filepath.Dir(dir)
parentItem, found := dc.cache.Get(StripRootDir(parentDir, RootDir))
if found {
// Remove the old version of the directory from the parent's Children field
for i, child := range parentItem.Children {
if child.Path == StripRootDir(dir, RootDir) {
parentItem.Children = append(parentItem.Children[:i], parentItem.Children[i+1:]...)
break
}
}
// Add the new version of the directory to the parent's Children field
parentItem.Children = append(parentItem.Children, dirItem)
// Update the parent directory in the cache
dc.cache.Add(StripRootDir(parentDir, RootDir), parentItem)
}
}
return nil
}


@@ -0,0 +1,89 @@
package DirectoryCrawler
import (
"crazyfs/CacheItem"
"crazyfs/file"
lru "github.com/hashicorp/golang-lru/v2"
"os"
"path/filepath"
"strings"
"sync"
"sync/atomic"
)
var globalActiveCrawls int32
type DirectoryCrawler struct {
cache *lru.Cache[string, *CacheItem.Item]
visited sync.Map
wg sync.WaitGroup
mu sync.Mutex // lock for the visited map
}
func NewDirectoryCrawler(cache *lru.Cache[string, *CacheItem.Item]) *DirectoryCrawler {
return &DirectoryCrawler{
cache: cache,
visited: sync.Map{},
}
}
func (dc *DirectoryCrawler) Get(path string) (*CacheItem.Item, bool) {
return dc.cache.Get(path)
}
func (dc *DirectoryCrawler) CleanupDeletedFiles(path string) {
dc.visited.Range(func(key, value interface{}) bool {
keyStr := key.(string)
if isSubpath(file.StripRootDir(path), keyStr) && value.(bool) {
dc.cache.Remove(keyStr)
}
return true
})
}
func (dc *DirectoryCrawler) AddCacheItem(fullPath string, info os.FileInfo) {
strippedPath := file.StripRootDir(fullPath)
item := CacheItem.NewItem(fullPath, info)
if item != nil {
// CacheItem.NewItem returns nil if the path fails its checks
dc.cache.Add(strippedPath, item)
}
}
func isSubpath(path, subpath string) bool {
// Clean the paths to remove any redundant or relative elements
path = filepath.Clean(path)
subpath = filepath.Clean(subpath)
// Split the paths into their components
pathParts := strings.Split(path, string(os.PathSeparator))
subpathParts := strings.Split(subpath, string(os.PathSeparator))
// If the subpath has fewer components than the path, it cannot be a subpath
if len(subpathParts) < len(pathParts) {
return false
}
// Compare the components of the path and the subpath
for i, part := range pathParts {
if part != subpathParts[i] {
return false
}
}
return true
}
func (dc *DirectoryCrawler) incrementGlobalActiveCrawls() {
atomic.AddInt32(&globalActiveCrawls, 1)
}
func (dc *DirectoryCrawler) decrementGlobalActiveCrawls() {
atomic.AddInt32(&globalActiveCrawls, -1)
}
func GetGlobalActiveCrawls() int32 {
return atomic.LoadInt32(&globalActiveCrawls)
}
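isSubpath compares whole path components rather than raw string prefixes, which matters for sibling directories that share a prefix. A small in-package test sketch:

package DirectoryCrawler

import "testing"

func TestIsSubpath(t *testing.T) {
	if !isSubpath("/photos", "/photos/2023/cat.jpg") {
		t.Error("expected /photos/2023/cat.jpg to be a subpath of /photos")
	}
	// A plain string-prefix check would wrongly match this sibling directory.
	if isSubpath("/photos", "/photos-backup") {
		t.Error("did not expect /photos-backup to be a subpath of /photos")
	}
}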

src/cache/DirectoryCrawler/Walker.go

@@ -0,0 +1,205 @@
package DirectoryCrawler
import (
"errors"
"fmt"
"os"
"path/filepath"
"sync"
"sync/atomic"
)
var JobQueueSize int
// WorkerPool is a buffered channel acting as a semaphore to limit the number of active workers globally
var WorkerPool chan struct{}
// ActiveWorkers is an atomic counter for the number of active workers
var ActiveWorkers int32
// ActiveWalks is an atomic counter for the number of active Walk crawls
var ActiveWalks int32
// ErrNotDir indicates that the path, which is being passed
// to a walker function, does not point to a directory
var ErrNotDir = errors.New("not a directory")
// Walker is constructed for each Walk() function invocation
type Walker struct {
wg sync.WaitGroup
jobs chan string
root string
followSymlinks bool
walkFunc filepath.WalkFunc
}
// the readDirNames function below was taken from the original
// implementation (see https://golang.org/src/path/filepath/path.go)
// but has sorting removed (sorting doesn't make sense
// in concurrent execution, anyway)
// readDirNames reads the directory named by dirname and returns
// a list of directory entries.
func readDirNames(dirname string) (names []string, err error) {
f, err := os.Open(dirname)
if err != nil {
return nil, err
}
defer func() {
// named return values let a Close() error propagate to the caller
cerr := f.Close()
if err == nil {
err = cerr
}
}()
names, err = f.Readdirnames(-1)
if err != nil {
return nil, err
}
return names, nil
}
// lstat is a wrapper for os.Lstat which accepts a path relative to
// Walker.root and, when followSymlinks is enabled, resolves symlinks
func (w *Walker) lstat(relpath string) (info os.FileInfo, err error) {
path := filepath.Join(w.root, relpath)
info, err = os.Lstat(path)
if err != nil {
return nil, err
}
// check if this is a symlink
if w.followSymlinks {
if info.Mode()&os.ModeSymlink > 0 {
path, err = filepath.EvalSymlinks(path)
if err != nil {
return nil, err
}
info, err = os.Lstat(path)
if err != nil {
return nil, err
}
}
}
return
}
// processPath processes one directory and adds
// its subdirectories to the queue for further processing
func (w *Walker) processPath(relpath string) error {
defer w.wg.Done()
path := filepath.Join(w.root, relpath)
names, err := readDirNames(path)
if err != nil {
log.Errorf("Walker - processPath - readDirNames - %s", err)
return err
}
for _, name := range names {
subpath := filepath.Join(relpath, name)
info, err := w.lstat(subpath)
if err != nil {
log.Warnf("processPath - %s - %s", relpath, err)
continue
}
if info == nil {
log.Warnf("processPath - %s - no file info returned for %s", relpath, subpath)
continue
}
w.walkFunc(filepath.Join(w.root, subpath), info, err)
//if err == filepath.SkipDir {
// return nil
//}
if info.Mode().IsDir() {
w.addJob(subpath)
}
}
return nil
}
// addJob increments the job counter
// and pushes the path to the jobs channel
func (w *Walker) addJob(path string) {
w.wg.Add(1)
select {
// try to push the job to the channel
case w.jobs <- path: // ok
default: // buffer overflow
// process job synchronously
err := w.processPath(path)
if err != nil {
log.Warnf("addJob - %s - %s", path, err)
}
}
}
// worker processes all the jobs
// until the jobs channel is explicitly closed
func (w *Walker) worker() {
for path := range w.jobs {
WorkerPool <- struct{}{} // acquire a worker
atomic.AddInt32(&ActiveWorkers, 1) // increment the number of active workers
err := w.processPath(path)
if err != nil {
log.Warnf("worker - %s", err)
}
<-WorkerPool // release the worker when done
atomic.AddInt32(&ActiveWorkers, -1) // decrement the number of active workers
}
}
// Walk recursively descends into subdirectories,
// calling walkFn for each file or directory
// in the tree, including the root directory.
func (w *Walker) Walk(relpath string, walkFn filepath.WalkFunc) error {
atomic.AddInt32(&ActiveWalks, 1) // increment the number of active Walk crawls
defer atomic.AddInt32(&ActiveWalks, -1) // decrement the number of active Walk crawls when done
w.jobs = make(chan string, JobQueueSize)
w.walkFunc = walkFn
info, err := w.lstat(relpath)
err = w.walkFunc(filepath.Join(w.root, relpath), info, err)
if err == filepath.SkipDir {
return nil
}
if err != nil {
return err
}
if info == nil {
return fmt.Errorf("broken symlink: %s", relpath)
}
if !info.Mode().IsDir() {
return ErrNotDir
}
// spawn workers
for n := 1; n <= JobQueueSize; n++ {
go w.worker()
}
w.addJob(relpath) // add this path as a first job
w.wg.Wait() // wait till all paths are processed
close(w.jobs) // signal workers to close
return nil
}
// Walk is a wrapper function for the Walker object
// that mimics the behavior of filepath.Walk;
// symlink handling is controlled by the followSymlinks argument.
func Walk(root string, followSymlinks bool, walkFn filepath.WalkFunc) error {
w := Walker{
root: root,
followSymlinks: followSymlinks,
}
return w.Walk("", walkFn)
}
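A minimal usage sketch for this walker; the path and pool sizes here are arbitrary, and in the server these globals are populated from the config (crawl_workers and the derived job queue size). Note that the walk function is invoked concurrently from many workers, so shared state needs synchronization:

package main

import (
	"fmt"
	"os"
	"sync/atomic"

	"crazyfs/cache/DirectoryCrawler"
)

func main() {
	DirectoryCrawler.JobQueueSize = 1000
	DirectoryCrawler.WorkerPool = make(chan struct{}, 10)

	var files int32
	err := DirectoryCrawler.Walk("/srv/data", false, func(path string, info os.FileInfo, err error) error {
		if err == nil && info != nil && !info.IsDir() {
			atomic.AddInt32(&files, 1) // walkFn runs on many goroutines
		}
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	fmt.Printf("saw %d files\n", atomic.LoadInt32(&files))
}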

src/cache/DirectoryCrawler/crawl.go

@@ -0,0 +1,134 @@
package DirectoryCrawler
import (
"crazyfs/CacheItem"
"crazyfs/config"
"crazyfs/file"
"os"
"path/filepath"
)
func (dc *DirectoryCrawler) walkRecursiveFunc(path string, info os.FileInfo, err error) error {
dc.processPath(path, info)
return nil
}
func (dc *DirectoryCrawler) walkNonRecursiveFunc(path string, dir os.DirEntry, err error) error {
info, err := dir.Info()
if err != nil {
log.Errorf("CRAWLER - walkNonRecursiveFunc() - get info failed - %s", err)
return err
}
dc.processPath(path, info)
return nil
}
func (dc *DirectoryCrawler) Crawl(fullPath string, shouldBlock bool) error {
info, err := os.Lstat(fullPath)
if os.IsNotExist(err) {
// If the path doesn't exist, return the error to the caller without logging it
return err
}
if err != nil {
log.Errorf("CRAWLER - Crawl() - os.Lstat() failed - %s", err)
return err
}
//if !config.FollowSymlinks && info.Mode()&os.ModeSymlink > 0 {
// msg := fmt.Sprintf("CRAWL - tried to crawl a symlink (not allowed in config): %s", fullPath)
// log.Warnf(msg)
// return errors.New(msg)
//}
relPath := file.StripRootDir(fullPath)
dc.cache.Remove(relPath)
if info.IsDir() {
// Get a list of all keys in the cache that belong to this directory
keys := make([]string, 0)
for _, key := range dc.cache.Keys() {
if isSubpath(fullPath, key) {
keys = append(keys, key)
}
}
// Remove all entries in the cache that belong to this directory so we can start fresh
for _, key := range keys {
dc.cache.Remove(key)
}
// If the path is a directory, start a walk
err := Walk(fullPath, config.FollowSymlinks, dc.walkRecursiveFunc)
if err != nil {
log.Errorf("CRAWLER - crawl for %s failed: %s", fullPath, err)
}
// After crawling, remove any keys that are still in the list (these are items that were not found on the filesystem)
//dc.CleanupDeletedFiles(path)
} else {
// If the path is a file, add it to the cache directly.
// AddCacheItem expects the full path; it strips the root itself.
dc.AddCacheItem(fullPath, info)
}
return nil
}
// CrawlNoRecursion crawls a file or directory without recursing into any subdirectories, and returns the result of the crawl.
func (dc *DirectoryCrawler) CrawlNoRecursion(fullPath string) (*CacheItem.Item, error) {
info, err := os.Lstat(fullPath)
if os.IsNotExist(err) {
// If the path doesn't exist, return the error to the caller without logging it
return nil, err
}
if err != nil {
log.Errorf("CRAWLER - CrawlNoRecursion() - os.Lstat() failed - %s", err)
return nil, err
}
//if !config.FollowSymlinks && info.Mode()&os.ModeSymlink > 0 {
// msg := fmt.Sprintf("CRAWL - tried to crawl a symlink (not allowed in config): %s", fullPath)
// log.Warnf(msg)
// return nil, errors.New(msg)
//}
var item *CacheItem.Item
relPath := file.StripRootDir(fullPath)
dc.cache.Remove(relPath)
if info.IsDir() {
// Get a list of all keys in the cache that belong to this directory
keys := make([]string, 0)
for _, key := range dc.cache.Keys() {
if isSubpath(fullPath, key) {
keys = append(keys, key)
}
}
// Remove all entries in the cache that belong to this directory so we can start fresh
for _, key := range keys {
dc.cache.Remove(key)
}
err := filepath.WalkDir(fullPath, dc.walkNonRecursiveFunc)
if err != nil {
log.Errorf("CRAWLER - non-recursive crawl for %s failed: %s", fullPath, err)
return nil, err
}
item, _ = dc.cache.Get(relPath)
} else {
item = CacheItem.NewItem(fullPath, info)
dc.AddCacheItem(fullPath, info)
}
return item, nil
}
func removeOldDir(children []string, strippedDir string) ([]string, bool) {
newChildren := make([]string, 0, len(children))
foundOldDir := false
for _, child := range children {
if child != strippedDir {
newChildren = append(newChildren, child)
} else {
foundOldDir = true
}
}
return newChildren, foundOldDir
}
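A rough usage sketch for the non-recursive crawl; it assumes config.RootDir and the file package's root stripping were initialized at startup, and the path is illustrative:

package main

import (
	"fmt"

	"crazyfs/CacheItem"
	"crazyfs/cache/DirectoryCrawler"
	lru "github.com/hashicorp/golang-lru/v2"
)

func main() {
	sharedCache, _ := lru.New[string, *CacheItem.Item](1000)
	dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)

	// Crawl a single directory level; the returned item lists its children
	// by their stripped relative paths.
	item, err := dc.CrawlNoRecursion("/srv/data/photos")
	if err != nil {
		fmt.Println("crawl failed:", err)
		return
	}
	fmt.Println(item.Path, len(item.Children))
}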

src/cache/DirectoryCrawler/process.go

@@ -0,0 +1,56 @@
package DirectoryCrawler
import (
"crazyfs/CacheItem"
"crazyfs/config"
"crazyfs/file"
"os"
"path/filepath"
)
func (dc *DirectoryCrawler) processPath(fullPath string, info os.FileInfo) error {
relPath := file.StripRootDir(fullPath)
dc.visited.Store(relPath, true)
if info.Mode().IsDir() {
dirItem := CacheItem.NewItem(fullPath, info)
children, err := os.ReadDir(fullPath)
if err != nil {
log.Errorf("CRAWLER - processPath() failed to read directory %s: %s", fullPath, err)
// Carry on and cache the directory with whatever children were readable (none).
}
for _, entry := range children {
subpath := filepath.Clean(filepath.Join(fullPath, entry.Name()))
dirItem.Children = append(dirItem.Children, file.StripRootDir(subpath))
}
// Add the directory to the cache after all of its children have been processed
dc.cache.Add(relPath, dirItem)
// If the directory is not the root directory, update the parent directory's Children field
// This block of code ensures that the parent directory's Children field is always up-to-date with
// the current state of its subdirectories. It removes any old versions of the current directory
// from the parent's Children field and adds the new version.
if fullPath != config.RootDir {
parentDir := filepath.Dir(fullPath)
strippedParentDir := file.StripRootDir(parentDir)
parentItem, found := dc.cache.Get(strippedParentDir)
if found {
// Remove the old version of the directory from the parent's Children field
newChildren, foundOldDir := removeOldDir(parentItem.Children, relPath)
// Add the new version of the directory to the parent's Children field only if it wasn't found
if !foundOldDir {
parentItem.Children = append(newChildren, relPath)
}
// Update the parent directory in the cache
dc.cache.Add(strippedParentDir, parentItem)
}
}
} else {
// Path is a file
dc.AddCacheItem(fullPath, info)
}
return nil
}
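removeOldDir (defined in crawl.go and used above to keep a parent's Children list current) filters one stripped path out of a child list and reports whether it was present. A small in-package test sketch:

package DirectoryCrawler

import "testing"

func TestRemoveOldDir(t *testing.T) {
	children := []string{"/photos", "/docs", "/music"}
	newChildren, found := removeOldDir(children, "/docs")
	if !found {
		t.Fatal("expected /docs to be found and removed")
	}
	if len(newChildren) != 2 {
		t.Fatalf("expected 2 remaining children, got %d", len(newChildren))
	}
}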

src/cache/DirectoryCrawler/vars.go

@@ -0,0 +1,12 @@
package DirectoryCrawler
import (
"crazyfs/logging"
"github.com/sirupsen/logrus"
)
var log *logrus.Logger
func init() {
log = logging.GetLogger()
}

src/cache/Worker.go

@@ -1,46 +0,0 @@
package cache
import (
"crazyfs/logging"
"os"
)
type Worker struct {
id int
ch chan string
active bool
}
func newWorker(id int) *Worker {
return &Worker{
id: id,
ch: make(chan string, WorkerBufferSize),
active: false,
}
}
func (w *Worker) start(dc *DirectoryCrawler) {
w.active = true
go func() {
for path := range w.ch {
info, err := os.Stat(path)
if err != nil {
logger := logging.GetLogger()
logger.Errorf("WORKER START - os.Stat() - %s", err)
continue
}
dc.cache.Add(StripRootDir(path, RootDir), NewItem(path, info))
}
w.active = false
// Release the token back to the semaphore when the worker is done
<-WorkerSemaphore
}()
}
func (w *Worker) add(path string) {
w.ch <- path
}
func (w *Worker) stop() {
close(w.ch)
}


@@ -1,44 +0,0 @@
package cache
import "sync"
var WorkerSemaphore chan struct{}
type WorkerPool struct {
pool chan *Worker
wg sync.WaitGroup
}
func NewWorkerPool() *WorkerPool {
return &WorkerPool{
pool: make(chan *Worker, cap(WorkerSemaphore)), // use the capacity of the semaphore as the size of the pool
}
}
func (p *WorkerPool) Get() *Worker {
select {
case w := <-p.pool:
return w
default:
// Acquire a token from the semaphore
WorkerSemaphore <- struct{}{}
return newWorker(len(p.pool))
}
}
func (p *WorkerPool) Put(w *Worker) {
select {
case p.pool <- w:
default:
// If the pool is full, discard the worker and release the token back to the semaphore
<-WorkerSemaphore
}
}
func (p *WorkerPool) Wait() {
p.wg.Wait()
}
func (p *WorkerPool) Add(delta int) {
p.wg.Add(delta)
}

src/cache/crawler.go

@@ -1,123 +1,67 @@
 package cache

 import (
+	"crazyfs/CacheItem"
+	"crazyfs/cache/DirectoryCrawler"
 	"crazyfs/config"
-	"crazyfs/data"
 	"crazyfs/logging"
 	lru "github.com/hashicorp/golang-lru/v2"
-	"os"
+	"github.com/sirupsen/logrus"
 	"sync"
 	"time"
 )

-var itemPool = &sync.Pool{
-	New: func() interface{} {
-		return &data.Item{}
-	},
-}
+var log *logrus.Logger
+
+func init() {
+	log = logging.GetLogger()
+}

-func StartCrawler(basePath string, sharedCache *lru.Cache[string, *data.Item], cfg *config.Config) error {
-	log = logging.GetLogger()
+func StartCrawler(sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) error {
 	var wg sync.WaitGroup
 	crawlerChan := make(chan struct{}, cfg.DirectoryCrawlers)
-	// TODO: a crawl may take some time to complete so we need to adjust the wait time based on the duration it took
-	go func() {
-		ticker := time.NewTicker(time.Second * time.Duration(cfg.CrawlModeCrawlInterval))
-		defer ticker.Stop()
-		// delay before first crawl
-		time.Sleep(time.Second * time.Duration(cfg.CrawlModeCrawlInterval))
-		for {
-			select {
-			case <-ticker.C:
-				crawlerChan <- struct{}{} // block if there are already cfg.DirectoryCrawlers crawlers
-				wg.Add(1)
-				go func() {
-					defer wg.Done()
-					pool := NewWorkerPool()
-					crawler := NewDirectoryCrawler(sharedCache, pool)
-					log.Infoln("CRAWLER - Starting a crawl...")
-					start := time.Now()
-					err := crawler.Crawl(basePath, true)
-					duration := time.Since(start).Round(time.Second)
-					if err != nil {
-						log.Warnf("CRAWLER - Crawl failed: %s", err)
-					} else {
-						log.Infof("CRAWLER - Crawl completed in %s", duration)
-						keys := sharedCache.Keys()
-						log.Debugf("%d/%d items in the cache.", cfg.CacheSize, len(keys))
-					}
-					<-crawlerChan // release
-				}()
-			}
-		}
-	}()
+	go startCrawl(cfg, sharedCache, &wg, crawlerChan)

 	ticker := time.NewTicker(60 * time.Second)
-	go func(c *lru.Cache[string, *data.Item]) {
-		for range ticker.C {
-			keys := c.Keys()
-			log.Debugf("%d/%d items in the cache.", len(keys), cfg.CacheSize)
-			//fmt.Println(keys) // for debug when things are really messed up
-		}
-	}(sharedCache)
+	go logCacheStatus("CACHE STATUS", ticker, sharedCache, cfg, log.Debugf)

 	return nil
 }

-func NewItem(path string, info os.FileInfo) *data.Item {
-	if PrintNew {
-		log = logging.GetLogger()
-		log.Debugf("CACHE - new: %s", path)
-	}
-	// Start processing the MIME type right away.
-	// It will run in the background while we set up the Item object.
-	ch := make(chan [2]string)
-	go AnalyzeFileMime(path, info, CrawlerParseMIME, ch)
-	item := itemPool.Get().(*data.Item)
-	// Reset fields
-	item.Path = ""
-	item.Name = ""
-	item.Size = 0
-	item.Extension = nil
-	item.Modified = ""
-	item.Mode = 0
-	item.IsDir = false
-	item.IsSymlink = false
-	item.Cached = 0
-	item.Children = item.Children[:0]
-	item.Type = nil
-	// Set fields
-	item.Path = StripRootDir(path, RootDir)
-	item.Name = info.Name()
-	item.Size = info.Size()
-	item.Modified = info.ModTime().UTC().Format(time.RFC3339Nano)
-	item.Mode = uint32(info.Mode().Perm())
-	item.IsDir = info.IsDir()
-	item.IsSymlink = info.Mode()&os.ModeSymlink != 0
-	item.Cached = time.Now().UnixNano() / int64(time.Millisecond)
-	// Get the MIME data from the background thread
-	mimeResult := <-ch // This will block until the goroutine finishes
-	ext, mimeType := mimeResult[0], mimeResult[1]
-	// Create pointers for mimeType and ext to allow empty JSON strings
-	var mimeTypePtr, extPtr *string
-	if mimeType != "" {
-		mimeTypePtr = &mimeType
-	}
-	if ext != "" {
-		extPtr = &ext
-	}
-	item.Extension = extPtr
-	item.Type = mimeTypePtr
-	return item
+func startCrawl(cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item], wg *sync.WaitGroup, crawlerChan chan struct{}) {
+	ticker := time.NewTicker(time.Duration(cfg.CrawlModeCrawlInterval) * time.Second)
+	defer ticker.Stop()
+	// delay before the first crawl
+	time.Sleep(time.Duration(cfg.CrawlModeCrawlInterval) * time.Second)
+	for range ticker.C {
+		crawlerChan <- struct{}{} // block if there are already cfg.DirectoryCrawlers crawlers
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
+			log.Infoln("CRAWLER - Starting a crawl...")
+			start := time.Now()
+			err := dc.Crawl(cfg.RootDir, true)
+			duration := time.Since(start).Round(time.Second)
+			if err != nil {
+				log.Warnf("CRAWLER - Crawl failed: %s", err)
+			} else {
+				log.Infof("CRAWLER - Crawl completed in %s", duration)
+				log.Debugf("%d/%d items in the cache.", len(sharedCache.Keys()), cfg.CacheSize)
+			}
+			<-crawlerChan
+		}()
+	}
+}
+
+func logCacheStatus(msg string, ticker *time.Ticker, sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config, logFn func(format string, args ...interface{})) {
+	defer ticker.Stop()
+	for range ticker.C {
+		activeWorkers := int(DirectoryCrawler.ActiveWorkers)
+		busyWorkers := int(DirectoryCrawler.ActiveWalks)
+		logFn("%s - %d/%d items in the cache. Active workers: %d Active crawls: %d", msg, len(sharedCache.Keys()), cfg.CacheSize, activeWorkers, busyWorkers)
+		//fmt.Println(sharedCache.Keys())
+	}
 }

src/cache/file.go

@@ -1,69 +0,0 @@
package cache
import (
"github.com/gabriel-vasile/mimetype"
"mime"
"os"
"path/filepath"
"strings"
)
func StripRootDir(path, RootDir string) string {
if path == "/" || path == RootDir {
// Avoid erasing our path
return "/"
} else {
return strings.TrimSuffix(strings.TrimPrefix(path, RootDir), "/")
}
}
func GetFileMime(path string, analyze bool) (bool, string, string, error) {
var err error
info, err := os.Stat(path)
if err != nil {
// File does not exist
return false, "", "", err
}
ch := make(chan [2]string)
go AnalyzeFileMime(path, info, analyze, ch)
// Get the MIME data from the background thread
mimeResult := <-ch // This will block until the goroutine finishes
ext, mimeType := mimeResult[0], mimeResult[1]
return true, mimeType, ext, nil
}
func detectMIME(path string, info os.FileInfo) string {
if info.Mode()&os.ModeType == 0 {
mimeObj, err := mimetype.DetectFile(path)
if err != nil {
log.Warnf("Error detecting MIME type: %v", err)
return ""
} else {
return mimeObj.String()
}
} else {
return ""
}
}
func AnalyzeFileMime(path string, info os.FileInfo, analyze bool, ch chan<- [2]string) {
go func() {
var ext string
var mimeType string
if !info.IsDir() && !(info.Mode()&os.ModeSymlink == os.ModeSymlink) {
if CrawlerParseMIME || analyze {
ext = filepath.Ext(path)
mimeType = detectMIME(path, info)
} else {
ext = filepath.Ext(path)
mimeType = mime.TypeByExtension(ext)
}
if strings.Contains(mimeType, ";") {
mimeType = strings.Split(mimeType, ";")[0]
}
ch <- [2]string{ext, mimeType}
} else {
ch <- [2]string{"", ""}
}
}()
}
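As shown in this (pre-merge) cache package API, GetFileMime returns (exists, mimeType, extension, err); it is the call the list endpoint uses for on-demand MIME detection. A minimal usage sketch against that signature, with an illustrative path:

package main

import (
	"fmt"

	"crazyfs/cache"
)

func main() {
	// analyze=true forces content sniffing via mimetype.DetectFile rather than
	// trusting the file extension.
	exists, mimeType, ext, err := cache.GetFileMime("/srv/data/cat.jpg", true)
	if err != nil || !exists {
		fmt.Println("missing file or MIME detection error:", err)
		return
	}
	fmt.Println(mimeType, ext)
}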

src/cache/initial.go

@@ -1,12 +1,11 @@
 package cache

 import (
+	"crazyfs/CacheItem"
+	"crazyfs/cache/DirectoryCrawler"
 	"crazyfs/config"
-	"crazyfs/data"
 	"crazyfs/logging"
 	lru "github.com/hashicorp/golang-lru/v2"
-	"runtime"
-	"sync"
 	"time"
 )
@@ -16,62 +15,22 @@ func init() {
 	InitialCrawlInProgress = false
 }

-func InitialCrawl(sharedCache *lru.Cache[string, *data.Item], cfg *config.Config) {
+func InitialCrawl(sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) {
 	log = logging.GetLogger()
+	log.Infof("INITIAL CRAWL - starting the crawl for %s", config.RootDir)
+	ticker := time.NewTicker(3 * time.Second)
+	go logCacheStatus("INITIAL CRAWL", ticker, sharedCache, cfg, log.Infof)
 	InitialCrawlInProgress = true
-	dirChan := make(chan string, 100000) // Buffered channel to hold directories to be crawled
-	var wg sync.WaitGroup
-	cacheFull := make(chan bool, 1) // Channel to signal when cache is full
-	// Start Worker goroutines
-	for i := 0; i < runtime.NumCPU()*6; i++ {
-		wg.Add(1)
-		go func() {
-			defer wg.Done()
-			for {
-				select {
-				case dir, ok := <-dirChan:
-					if ok {
-						crawlDir(dir, sharedCache, cacheFull, cfg)
-					} else {
-						return
-					}
-				case <-cacheFull:
-					return
-				}
-			}
-		}()
-	}
-	// Kick off the crawl
-	dirChan <- cfg.RootDir
-	close(dirChan)
-	// Start a ticker to log the number of items in the cache every 2 seconds
-	ticker := time.NewTicker(2 * time.Second)
-	go func() {
-		for range ticker.C {
-			log.Debugf("INITIAL CRAWL - cache size: %d/%d", sharedCache.Len(), cfg.CacheSize)
-		}
-	}()
-	// Wait for all goroutines to finish
-	wg.Wait()
-	ticker.Stop()
-	InitialCrawlInProgress = false
-}
-
-func crawlDir(dir string, sharedCache *lru.Cache[string, *data.Item], cacheFull chan<- bool, cfg *config.Config) {
-	pool := NewWorkerPool()
-	crawler := NewDirectoryCrawler(sharedCache, pool)
-	err := crawler.Crawl(dir, true)
+	dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
+	//start := time.Now()
+	err := dc.Crawl(config.RootDir, true)
 	if err != nil {
-		log.Fatalf("Crawl failed: %s", err)
-		return
-	}
-	// Check if cache is full
-	if sharedCache.Len() >= cfg.CacheSize {
-		cacheFull <- true
+		log.Errorf("INITIAL CRAWL - recursive crawl failed: %s", err)
 	}
+	InitialCrawlInProgress = false
+	ticker.Stop()
+	//log.Infof("INITIAL CRAWL - finished the initial crawl in %s", time.Since(start).Round(time.Second))
 }

src/cache/recache.go

@@ -1,8 +1,10 @@
 package cache

 import (
+	"crazyfs/CacheItem"
+	"crazyfs/cache/DirectoryCrawler"
 	"crazyfs/config"
-	"crazyfs/data"
+	"crazyfs/file"
 	"crazyfs/logging"
 	lru "github.com/hashicorp/golang-lru/v2"
 	"os"
@@ -16,7 +18,7 @@ func InitRecacheSemaphore(limit int) {
 	sem = make(chan struct{}, limit)
 }

-func CheckAndRecache(path string, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
+func CheckAndRecache(path string, cfg *config.Config, sharedCache *lru.Cache[string, *CacheItem.Item]) {
 	item, found := sharedCache.Get(path)
 	if found && time.Now().UnixNano()/int64(time.Millisecond)-item.Cached > int64(cfg.CacheTime)*60*1000 {
 		log := logging.GetLogger()
@@ -24,9 +26,8 @@ func CheckAndRecache(path string, cfg *config.Config, sharedCache *lru.Cache[str
 		sem <- struct{}{} // acquire a token
 		go func() {
 			defer func() { <-sem }() // release the token when done
-			pool := NewWorkerPool()
-			crawler := NewDirectoryCrawler(sharedCache, pool)
-			err := crawler.Crawl(path, true)
+			dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
+			err := dc.Crawl(path, true)
 			if err != nil {
 				log.Errorf("RECACHE ERROR: %s", err.Error())
 			}
@@ -34,26 +35,27 @@ func CheckAndRecache(path string, cfg *config.Config, sharedCache *lru.Cache[str
 	}
 }

-func Recache(path string, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
+func Recache(path string, sharedCache *lru.Cache[string, *CacheItem.Item]) {
 	log := logging.GetLogger()
 	log.Debugf("Re-caching: %s", path)
+	start := time.Now()
 	sem <- struct{}{} // acquire a token
 	go func() {
 		defer func() { <-sem }() // release the token when done
-		pool := NewWorkerPool()
-		crawler := NewDirectoryCrawler(sharedCache, pool)
-		err := crawler.Crawl(path, true)
+		dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
+		err := dc.Crawl(path, true)
 		if err != nil {
 			log.Errorf("RECACHE ERROR: %s", err.Error())
 		}
 		// Get the parent directory from the cache
 		parentDir := filepath.Dir(path)
-		parentItem, found := sharedCache.Get(parentDir)
+		parentDirRel := file.StripRootDir(parentDir)
+		parentItem, found := sharedCache.Get(parentDirRel)
 		if found {
 			// Remove the old sub-directory from the parent directory's Children field
 			for i, child := range parentItem.Children {
-				if child.Path == path {
+				if child == path {
 					parentItem.Children = append(parentItem.Children[:i], parentItem.Children[i+1:]...)
 					break
 				}
@@ -64,16 +66,16 @@ func Recache(path string, cfg *config.Config, sharedCache *lru.Cache[str
 			if err != nil {
 				log.Errorf("RECACHE ERROR: %s", err.Error())
 			} else {
-				newItem := NewItem(path, info)
+				newItem := CacheItem.NewItem(path, info)
 				// Create a new slice that contains all items from the Children field except the old directory
-				newChildren := make([]*data.Item, 0, len(parentItem.Children))
+				newChildren := make([]string, 0, len(parentItem.Children))
 				for _, child := range parentItem.Children {
-					if child.Path != newItem.Path {
+					if child != newItem.Path {
 						newChildren = append(newChildren, child)
 					}
 				}
 				// Append the new directory to the newChildren slice
-				newChildren = append(newChildren, newItem)
+				newChildren = append(newChildren, newItem.Path)
 				// Assign the newChildren slice to the Children field
 				parentItem.Children = newChildren
 				// Update the parent directory in the cache
@@ -81,10 +83,13 @@ func Recache(path string, cfg *config.Config, sharedCache *lru.Cache[str
 			}
 		} else {
 			// If the parent directory isn't in the cache, crawl it
-			err := crawler.Crawl(parentDir, true)
+			log.Infof("RECACHE - crawling parent directory since it isn't in the cache yet: %s", parentDir)
+			err := dc.Crawl(parentDir, true)
 			if err != nil {
 				log.Errorf("RECACHE ERROR: %s", err.Error())
 			}
 		}
+		duration := time.Since(start).Round(time.Second)
+		log.Infof("RECACHE - completed in %s - %s", duration, path)
 	}()
 }

src/cache/search.go

@@ -0,0 +1,98 @@
package cache
import (
"bytes"
"crazyfs/CacheItem"
"crazyfs/config"
"encoding/gob"
lru "github.com/hashicorp/golang-lru/v2"
"strings"
)
func SearchLRU(queryString string, excludeElements []string, limitResults int, sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) []*CacheItem.Item {
results := make([]*CacheItem.Item, 0)
const maxGoroutines = 100
// Create a buffered channel as a semaphore
sem := make(chan struct{}, maxGoroutines)
resultsChan := make(chan *CacheItem.Item, len(sharedCache.Keys()))
for _, key := range sharedCache.Keys() {
searchKey(key, queryString, excludeElements, sem, resultsChan, sharedCache, cfg)
}
// Wait for all goroutines to finish
for i := 0; i < maxGoroutines; i++ {
sem <- struct{}{}
}
for range sharedCache.Keys() {
item := <-resultsChan
if item != nil {
results = append(results, item)
if (limitResults > 0 && len(results) == limitResults) || len(results) >= cfg.ApiSearchMaxResults {
break
}
}
}
return results
}
func searchKey(key string, queryString string, excludeElements []string, sem chan struct{}, resultsChan chan *CacheItem.Item, sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) {
// Acquire a token
sem <- struct{}{}
go func() {
// Release the token at the end
defer func() { <-sem }()
cacheItem, found := sharedCache.Get(key)
if !found {
resultsChan <- nil
return
}
lowerKey := strings.ToLower(key)
if strings.Contains(lowerKey, strings.ToLower(queryString)) {
// check if key contains any of the exclude elements
shouldExclude := false
for _, exclude := range excludeElements {
if strings.Contains(lowerKey, exclude) {
shouldExclude = true
break
}
}
if shouldExclude {
resultsChan <- nil
return
}
// Create a deep copy of the CacheItem
var buf bytes.Buffer
enc := gob.NewEncoder(&buf)
dec := gob.NewDecoder(&buf)
err := enc.Encode(cacheItem)
if err != nil {
log.Printf("Error encoding CacheItem: %v", err)
resultsChan <- nil
return
}
var item CacheItem.Item
err = dec.Decode(&item)
if err != nil {
log.Printf("Error decoding CacheItem: %v", err)
resultsChan <- nil
return
}
if !cfg.ApiSearchShowChildren {
item.Children = nil // erase the children dict
}
resultsChan <- &item
} else {
// Always send a result, even for non-matches, so the reader draining
// resultsChan doesn't block waiting for values that never arrive.
resultsChan <- nil
}
}()
}
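A usage sketch for SearchLRU; it assumes sharedCache and cfg were built at startup, and the query and exclude list are illustrative:

package examples

import (
	"fmt"

	"crazyfs/CacheItem"
	"crazyfs/cache"
	"crazyfs/config"
	lru "github.com/hashicorp/golang-lru/v2"
)

func printMatches(sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) {
	// Keys are matched case-insensitively; anything containing ".git" is
	// skipped, and at most 50 results (further capped by
	// cfg.ApiSearchMaxResults) come back.
	results := cache.SearchLRU("report", []string{".git"}, 50, sharedCache, cfg)
	for _, item := range results {
		fmt.Println(item.Path)
	}
}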

src/cache/watcher.go

@@ -1,21 +1,17 @@
 package cache

 import (
+	"crazyfs/CacheItem"
+	"crazyfs/cache/DirectoryCrawler"
 	"crazyfs/config"
-	"crazyfs/data"
-	"crazyfs/logging"
 	lru "github.com/hashicorp/golang-lru/v2"
 	"github.com/radovskyb/watcher"
-	"github.com/sirupsen/logrus"
 	"strings"
 	"sync"
 	"time"
 )

-var log *logrus.Logger
-
-func StartWatcher(basePath string, sharedCache *lru.Cache[string, *data.Item], cfg *config.Config) (*watcher.Watcher, error) {
-	log = logging.GetLogger()
+func StartWatcher(basePath string, sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) (*watcher.Watcher, error) {
 	w := watcher.New()
 	var wg sync.WaitGroup
 	crawlerChan := make(chan struct{}, cfg.DirectoryCrawlers) // limit to cfg.DirectoryCrawlers concurrent crawlers
@@ -65,9 +61,8 @@ func StartWatcher(basePath string, sharedCache *lru.Cache[string, *CacheItem.Item
 				wg.Add(1)
 				go func() {
 					defer wg.Done()
-					pool := NewWorkerPool()
-					crawler := NewDirectoryCrawler(sharedCache, pool)
-					err := crawler.Crawl(event.Path, true)
+					dc := DirectoryCrawler.NewDirectoryCrawler(sharedCache)
+					err := dc.Crawl(event.Path, true)
 					if err != nil {
 						log.Warnf("WATCHER - Crawl failed: %s", err)
 					}
@@ -95,7 +90,7 @@ func StartWatcher(basePath string, sharedCache *lru.Cache[string, *CacheItem.Item
 	// Log the number of cached items every 60 seconds
 	ticker := time.NewTicker(60 * time.Second)
-	go func(c *lru.Cache[string, *data.Item]) {
+	go func(c *lru.Cache[string, *CacheItem.Item]) {
 		for range ticker.C {
 			keys := c.Keys()
 			log.Debugf("%d items in the cache.", len(keys))


@ -7,29 +7,41 @@ import (
) )
type Config struct { type Config struct {
RootDir string RootDir string
HTTPPort string HTTPPort string
WatchMode string WatchMode string
CrawlModeCrawlInterval int CrawlModeCrawlInterval int
DirectoryCrawlers int DirectoryCrawlers int
WatchInterval int CrawlWorkers int
CacheSize int WatchInterval int
CacheTime int CacheSize int
CachePrintNew bool CacheTime int
CachePrintChanges bool CachePrintNew bool
InitialCrawl bool CachePrintChanges bool
CacheRecacheCrawlerLimit int InitialCrawl bool
CrawlerParseMIME bool CacheRecacheCrawlerLimit int
HttpAPIListCacheControl int CrawlerParseMIME bool
HttpAPIDlCacheControl int HttpAPIListCacheControl int
HttpAllowDirMimeParse bool HttpAPIDlCacheControl int
HttpAdminKey string HttpAllowDirMimeParse bool
HttpAllowDuringInitialCrawl bool HttpAdminKey string
RestrictedDownloadPaths []string HttpAllowDuringInitialCrawl bool
ApiSearchMaxResults int RestrictedDownloadPaths []string
ApiSearchShowChildren bool ApiSearchMaxResults int
CrawlerChannelBufferSize int ApiSearchShowChildren bool
CrawlerMaxWorkers int WorkersJobQueueSize int
ElasticsearchEnable bool
ElasticsearchEndpoint string
ElasticsearchSyncEnable bool
ElasticsearchSyncInterval int
ElasticsearchFullSyncInterval int
ElasticsearchAPIKey string
ElasticsearchIndex string
ElasticsearchSyncThreads int
ElasticsearchExcludePatterns []string
ElasticsearchAllowConcurrentSyncs bool
ElasticsearchFullSyncOnStart bool
ElasticsearchDefaultQueryField string
} }
func LoadConfig(configFile string) (*Config, error) { func LoadConfig(configFile string) (*Config, error) {
@ -39,6 +51,7 @@ func LoadConfig(configFile string) (*Config, error) {
viper.SetDefault("watch_mode", "crawl") viper.SetDefault("watch_mode", "crawl")
viper.SetDefault("crawl_mode_crawl_interval", 3600) viper.SetDefault("crawl_mode_crawl_interval", 3600)
viper.SetDefault("directory_crawlers", 4) viper.SetDefault("directory_crawlers", 4)
viper.SetDefault("crawl_workers", 10)
viper.SetDefault("cache_size", 100000000) viper.SetDefault("cache_size", 100000000)
viper.SetDefault("cache_time", 30) viper.SetDefault("cache_time", 30)
viper.SetDefault("cache_print_new", false) viper.SetDefault("cache_print_new", false)
@ -53,8 +66,20 @@ func LoadConfig(configFile string) (*Config, error) {
viper.SetDefault("api_search_max_results", 1000) viper.SetDefault("api_search_max_results", 1000)
viper.SetDefault("api_search_show_children", false) viper.SetDefault("api_search_show_children", false)
viper.SetDefault("http_allow_during_initial_crawl", false) viper.SetDefault("http_allow_during_initial_crawl", false)
viper.SetDefault("crawler_channel_buffer_size", 1000) viper.SetDefault("crawler_worker_job_queue_size", 0)
viper.SetDefault("crawler_max_workers", 200) viper.SetDefault("elasticsearch_enable", false)
viper.SetDefault("elasticsearch_endpoint", "http://localhost:9200")
viper.SetDefault("elasticsearch_sync_enable", true)
viper.SetDefault("elasticsearch_sync_interval", 1800)
viper.SetDefault("elasticsearch_full_sync_interval", 86400)
viper.SetDefault("elasticsearch_api_key", "")
viper.SetDefault("elasticsearch_index", "crazyfs_search")
viper.SetDefault("elasticsearch_sync_threads", 50)
viper.SetDefault("elasticsearch_exclude_patterns", []string{".git"})
viper.SetDefault("elasticsearch_allow_concurrent_syncs", false)
viper.SetDefault("elasticsearch_full_sync_on_start", false)
viper.SetDefault("elasticsearch_query_fields", []string{"extension", "name", "path", "type", "size", "isDir"})
viper.SetDefault("elasticsearch_default_query_field", "name")
err := viper.ReadInConfig() err := viper.ReadInConfig()
if err != nil { if err != nil {
@ -63,35 +88,60 @@ func LoadConfig(configFile string) (*Config, error) {
restrictedPaths := viper.GetStringSlice("restricted_download_paths") restrictedPaths := viper.GetStringSlice("restricted_download_paths")
for i, path := range restrictedPaths { for i, path := range restrictedPaths {
restrictedPaths[i] = strings.TrimSuffix(path, "/") if restrictedPaths[i] != "/" {
restrictedPaths[i] = strings.TrimSuffix(path, "/")
}
} }
rootDir := strings.TrimSuffix(viper.GetString("root_dir"), "/") rootDir := strings.TrimSuffix(viper.GetString("root_dir"), "/")
if rootDir == "" {
rootDir = "/"
}
workersJobQueueSizeValue := viper.GetInt("crawler_worker_job_queue_size")
var workersJobQueueSize int
if workersJobQueueSizeValue == 0 {
workersJobQueueSize = viper.GetInt("crawl_workers") * 100
} else {
workersJobQueueSize = workersJobQueueSizeValue
}
config := &Config{ config := &Config{
RootDir: rootDir, RootDir: rootDir,
HTTPPort: viper.GetString("http_port"), HTTPPort: viper.GetString("http_port"),
WatchMode: viper.GetString("watch_mode"), WatchMode: viper.GetString("watch_mode"),
CrawlModeCrawlInterval: viper.GetInt("crawl_mode_crawl_interval"), CrawlModeCrawlInterval: viper.GetInt("crawl_mode_crawl_interval"),
WatchInterval: viper.GetInt("watch_interval"), WatchInterval: viper.GetInt("watch_interval"),
DirectoryCrawlers: viper.GetInt("crawl_mode_crawl_interval"), DirectoryCrawlers: viper.GetInt("crawl_mode_crawl_interval"),
CacheSize: viper.GetInt("cache_size"), CrawlWorkers: viper.GetInt("crawl_workers"),
CacheTime: viper.GetInt("cache_time"), CacheSize: viper.GetInt("cache_size"),
CachePrintNew: viper.GetBool("cache_print_new"), CacheTime: viper.GetInt("cache_time"),
CachePrintChanges: viper.GetBool("cache_print_changes"), CachePrintNew: viper.GetBool("cache_print_new"),
InitialCrawl: viper.GetBool("initial_crawl"), CachePrintChanges: viper.GetBool("cache_print_changes"),
CacheRecacheCrawlerLimit: viper.GetInt("cache_recache_crawler_limit"), InitialCrawl: viper.GetBool("initial_crawl"),
CrawlerParseMIME: viper.GetBool("crawler_parse_mime"), CacheRecacheCrawlerLimit: viper.GetInt("cache_recache_crawler_limit"),
HttpAPIListCacheControl: viper.GetInt("http_api_list_cache_control"), CrawlerParseMIME: viper.GetBool("crawler_parse_mime"),
HttpAPIDlCacheControl: viper.GetInt("http_api_download_cache_control"), HttpAPIListCacheControl: viper.GetInt("http_api_list_cache_control"),
HttpAllowDirMimeParse: viper.GetBool("http_allow_dir_mime_parse"), HttpAPIDlCacheControl: viper.GetInt("http_api_download_cache_control"),
HttpAdminKey: viper.GetString("api_admin_key"), HttpAllowDirMimeParse: viper.GetBool("http_allow_dir_mime_parse"),
HttpAllowDuringInitialCrawl: viper.GetBool("http_allow_during_initial_crawl"), HttpAdminKey: viper.GetString("api_admin_key"),
RestrictedDownloadPaths: restrictedPaths, HttpAllowDuringInitialCrawl: viper.GetBool("http_allow_during_initial_crawl"),
ApiSearchMaxResults: viper.GetInt("api_search_max_results"), RestrictedDownloadPaths: restrictedPaths,
ApiSearchShowChildren: viper.GetBool("api_search_show_children"), ApiSearchMaxResults: viper.GetInt("api_search_max_results"),
CrawlerChannelBufferSize: viper.GetInt("crawler_channel_buffer_size"), ApiSearchShowChildren: viper.GetBool("api_search_show_children"),
CrawlerMaxWorkers: viper.GetInt("crawler_worker_pool_size"), WorkersJobQueueSize: workersJobQueueSize,
ElasticsearchEnable: viper.GetBool("elasticsearch_enable"),
ElasticsearchEndpoint: viper.GetString("elasticsearch_endpoint"),
ElasticsearchSyncEnable: viper.GetBool("elasticsearch_sync_enable"),
ElasticsearchSyncInterval: viper.GetInt("elasticsearch_sync_interval"),
ElasticsearchFullSyncInterval: viper.GetInt("elasticsearch_full_sync_interval"),
ElasticsearchAPIKey: viper.GetString("elasticsearch_api_key"),
ElasticsearchIndex: viper.GetString("elasticsearch_index"),
ElasticsearchSyncThreads: viper.GetInt("elasticsearch_sync_threads"),
ElasticsearchExcludePatterns: viper.GetStringSlice("elasticsearch_exclude_patterns"),
ElasticsearchAllowConcurrentSyncs: viper.GetBool("elasticsearch_allow_concurrent_syncs"),
ElasticsearchFullSyncOnStart: viper.GetBool("elasticsearch_full_sync_on_start"),
ElasticsearchDefaultQueryField: viper.GetString("elasticsearch_default_query_field"),
} }
if config.WatchMode != "crawl" && config.WatchMode != "watch" { if config.WatchMode != "crawl" && config.WatchMode != "watch" {
@ -106,8 +156,12 @@ func LoadConfig(configFile string) (*Config, error) {
return nil, errors.New("crawl_mode_crawl_interval must be more than 1") return nil, errors.New("crawl_mode_crawl_interval must be more than 1")
} }
if config.CrawlWorkers < 1 {
return nil, errors.New("crawl_workers must be more than 1")
}
if config.CacheSize < 1 {
return nil, errors.New("cache_size must be more than 1")
}
if config.CacheRecacheCrawlerLimit < 1 {
@ -130,8 +184,8 @@ func LoadConfig(configFile string) (*Config, error) {
return nil, errors.New("api_search_max_results must not be less than 1") return nil, errors.New("api_search_max_results must not be less than 1")
} }
if config.CrawlerChannelBufferSize < 1 { if config.ElasticsearchFullSyncInterval < config.ElasticsearchSyncInterval {
return nil, errors.New("crawler_channel_buffer_size must not be less than 1") return nil, errors.New("elasticsearch_full_sync_interval must be greater than elasticsearch_sync_interval")
} }
return config, nil return config, nil

src/config/vars.go (new file)

@ -0,0 +1,13 @@
package config
// Global config variables
var FollowSymlinks bool
var CachePrintNew bool
var RootDir string
var CrawlerParseMIME bool
var MaxWorkers int
var HttpAllowDuringInitialCrawl bool
var RestrictedDownloadPaths []string
var ElasticsearchEnable bool
var ElasticsearchEndpoint string
var ElasticsearchSyncInterval int


@ -1,17 +1,21 @@
package main
import (
"crazyfs/CacheItem"
"crazyfs/api"
"crazyfs/cache"
"crazyfs/cache/DirectoryCrawler"
"crazyfs/config"
"crazyfs/elastic"
"crazyfs/logging"
"errors"
"flag"
"fmt"
"github.com/elastic/go-elasticsearch/v8"
lru "github.com/hashicorp/golang-lru/v2"
"github.com/sirupsen/logrus"
"net/http"
_ "net/http/pprof"
"os"
"path/filepath"
"time"
@ -20,19 +24,20 @@ import (
var log *logrus.Logger
var cfg *config.Config
type cliConfig struct {
configFile string
initialCrawl bool
debug bool
disableElasticSync bool
help bool
}
// TODO: optional serving of frontend
// TODO: admin api to clear cache, get number of items in cache, get memory usage
// TODO: health api endpoint that tells us if the server is still starting
// TODO: set global http headers rather than randomly setting them in routes
// TODO: admin api endpoint to start a full refresh of elasticsearch
// TODO: admin api endpoint to get status and progress of the full refresh of elasticsearch
func main() {
cliArgs := parseArgs()
@ -73,27 +78,37 @@ func main() {
log.Fatalf("Config file does not exist: %s", cliArgs.configFile) log.Fatalf("Config file does not exist: %s", cliArgs.configFile)
} }
cache.FollowSymlinks = false
var err error var err error
cfg, err = config.LoadConfig(cliArgs.configFile) cfg, err = config.LoadConfig(cliArgs.configFile)
if err != nil { if err != nil {
log.Fatalf("Failed to load config file: %s", err) log.Fatalf("Failed to load config file: %s", err)
} }
// Set global constants sharedCache, err := lru.New[string, *CacheItem.Item](cfg.CacheSize)
cache.WorkerBufferSize = cfg.CrawlerChannelBufferSize
cache.PrintNew = cfg.CachePrintNew
cache.RootDir = cfg.RootDir
cache.CrawlerParseMIME = cfg.CrawlerParseMIME
//cache.MaxWorkers = cfg.CrawlWorkers
cache.WorkerSemaphore = make(chan struct{}, cfg.CrawlerMaxWorkers)
sharedCache, err := lru.New[string, *data.Item](cfg.CacheSize)
if err != nil { if err != nil {
log.Fatal(err) log.Fatal(err)
} }
// Set config variables
// TODO: just pass the entire cfg object
config.FollowSymlinks = false
config.CachePrintNew = cfg.CachePrintNew
config.RootDir = cfg.RootDir
config.CrawlerParseMIME = cfg.CrawlerParseMIME
config.MaxWorkers = cfg.CrawlWorkers
config.HttpAllowDuringInitialCrawl = cfg.HttpAllowDuringInitialCrawl
DirectoryCrawler.JobQueueSize = cfg.WorkersJobQueueSize
config.RestrictedDownloadPaths = cfg.RestrictedDownloadPaths
config.ElasticsearchEnable = cfg.ElasticsearchEnable
config.ElasticsearchEndpoint = cfg.ElasticsearchEndpoint
config.ElasticsearchSyncInterval = cfg.ElasticsearchSyncInterval
log.Infof("Elasticsearch enabled: %t", cfg.ElasticsearchEnable)
// Init global variables
//DirectoryCrawler.CrawlWorkerPool = DirectoryCrawler.NewWorkerPool(config.MaxWorkers)
DirectoryCrawler.WorkerPool = make(chan struct{}, config.MaxWorkers)
cache.InitRecacheSemaphore(cfg.CacheRecacheCrawlerLimit) cache.InitRecacheSemaphore(cfg.CacheRecacheCrawlerLimit)
// Start the webserver before doing the long crawl // Start the webserver before doing the long crawl
@ -107,15 +122,13 @@ func main() {
}()
log.Infof("Server started on port %s", cfg.HTTPPort)
if cliArgs.initialCrawl || cfg.InitialCrawl {
log.Infoln("Performing initial crawl...")
start := time.Now()
cache.InitialCrawl(sharedCache, cfg)
duration := time.Since(start).Round(time.Second)
keys := sharedCache.Keys()
log.Infof("Initial crawl completed in %s. %d items added to the cache.", duration, len(keys))
}
if cfg.WatchMode == "watch" {
@ -128,13 +141,41 @@ func main() {
defer watcher.Close()
} else if cfg.WatchMode == "crawl" {
//log.Debugln("Starting the crawler")
err := cache.StartCrawler(sharedCache, cfg)
if err != nil {
log.Fatalf("Failed to start timed crawler process: %s", err)
}
log.Infoln("Started the timed crawler process")
}
if cfg.ElasticsearchEnable {
// If we fail to establish a connection to Elastic, don't kill the entire server.
// Instead, just disable Elastic.
esCfg := elasticsearch.Config{
Addresses: []string{
cfg.ElasticsearchEndpoint,
},
APIKey: cfg.ElasticsearchAPIKey,
}
es, err := elasticsearch.NewClient(esCfg)
if err != nil {
log.Errorf("Error creating the Elasticsearch client: %s", err)
elastic.LogElasticQuit()
cfg.ElasticsearchEnable = false
} else {
elastic.ElasticClient = es
if cfg.ElasticsearchSyncEnable && !cliArgs.disableElasticSync {
go elastic.ElasticsearchThread(sharedCache, cfg)
log.Info("Started the background Elasticsearch sync thread.")
} else {
log.Info("The background Elasticsearch sync thread is disabled.")
}
}
}
select {}
}
@ -145,6 +186,7 @@ func parseArgs() cliConfig {
flag.BoolVar(&cliArgs.initialCrawl, "i", false, "Do an initial crawl to fill the cache") flag.BoolVar(&cliArgs.initialCrawl, "i", false, "Do an initial crawl to fill the cache")
flag.BoolVar(&cliArgs.debug, "d", false, "Enable debug mode") flag.BoolVar(&cliArgs.debug, "d", false, "Enable debug mode")
flag.BoolVar(&cliArgs.debug, "debug", false, "Enable debug mode") flag.BoolVar(&cliArgs.debug, "debug", false, "Enable debug mode")
flag.BoolVar(&cliArgs.disableElasticSync, "disable-elastic-sync", false, "Disable the Elasticsearch background sync thread")
flag.Parse() flag.Parse()
return cliArgs return cliArgs
} }


@ -1,16 +0,0 @@
package data
type Item struct {
Path string `json:"path"`
Name string `json:"name"`
Size int64 `json:"size"`
Extension *string `json:"extension"`
Modified string `json:"modified"`
Mode uint32 `json:"mode"`
IsDir bool `json:"isDir"`
IsSymlink bool `json:"isSymlink"`
Type *string `json:"type"`
Children []*Item `json:"children"`
Content string `json:"content,omitempty"`
Cached int64 `json:"cached"`
}


@ -0,0 +1,122 @@
package elastic
import (
"crazyfs/CacheItem"
"crazyfs/config"
lru "github.com/hashicorp/golang-lru/v2"
"slices"
"sync"
"time"
)
func ElasticsearchThread(sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) {
createCrazyfsIndex(cfg)
// Test connection to Elastic.
esContents, err := getPathsFromIndex(cfg)
if err != nil {
logElasticConnError(err)
return
}
esSize := len(esContents)
log.Infof(`ELASTIC - index "%s" contains %d items.`, cfg.ElasticsearchIndex, esSize)
var wg sync.WaitGroup
sem := make(chan bool, cfg.ElasticsearchSyncThreads)
// Run a partial sync at startup, unless configured to run a full one.
syncElasticsearch(sharedCache, cfg, &wg, sem, cfg.ElasticsearchFullSyncOnStart)
ticker := time.NewTicker(time.Duration(cfg.ElasticsearchSyncInterval) * time.Second)
fullSyncTicker := time.NewTicker(time.Duration(cfg.ElasticsearchFullSyncInterval) * time.Second)
var mutex sync.Mutex
for {
select {
case <-ticker.C:
if !cfg.ElasticsearchAllowConcurrentSyncs {
mutex.Lock()
}
syncElasticsearch(sharedCache, cfg, &wg, sem, false)
if !cfg.ElasticsearchAllowConcurrentSyncs {
mutex.Unlock()
}
case <-fullSyncTicker.C:
if !cfg.ElasticsearchAllowConcurrentSyncs {
mutex.Lock()
}
syncElasticsearch(sharedCache, cfg, &wg, sem, true)
if !cfg.ElasticsearchAllowConcurrentSyncs {
mutex.Unlock()
}
}
}
}
func syncElasticsearch(sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config, wg *sync.WaitGroup, sem chan bool, fullSync bool) {
var syncType string
var esContents []string
if fullSync {
ElasticRefreshSyncRunning = true
syncType = "full refresh"
} else {
ElasticNewSyncRunning = true
syncType = "refresh"
var err error
esContents, err = getPathsFromIndex(cfg)
if err != nil {
log.Errorf("ELASTIC - Failed to read the index: %s", err)
return
}
}
log.Infof("ELASTIC - starting a %s sync.", syncType)
start := time.Now()
for _, key := range sharedCache.Keys() {
wg.Add(1)
go func(key string) {
defer wg.Done()
sem <- true
cacheItem, found := sharedCache.Get(key)
if !found {
log.Fatalf(`ELASTIC - Could not fetch item "%s" from the LRU cache!`, key)
} else {
if !shouldExclude(key, cfg.ElasticsearchExcludePatterns) {
if fullSync {
addToElasticsearch(cacheItem, cfg)
} else if !slices.Contains(esContents, key) {
addToElasticsearch(cacheItem, cfg)
}
} else {
deleteFromElasticsearch(key, cfg) // clean up
//log.Debugf(`ELASTIC - skipping adding "%s"`, key)
}
}
<-sem
}(key)
}
wg.Wait()
log.Debugln("ELASTIC - Checking for removed items...")
removeStaleItemsFromElasticsearch(sharedCache, cfg)
if fullSync {
ElasticRefreshSyncRunning = false
} else {
ElasticNewSyncRunning = false
}
duration := time.Since(start)
log.Infof("ELASTIC - %s sync finished in %s", syncType, duration)
}
func logElasticConnError(err error) {
log.Errorf("ELASTIC - Failed to read the index: %s", err)
LogElasticQuit()
}
func LogElasticQuit() {
log.Errorln("ELASTIC - background thread exiting, Elastic indexing and search will not be available.")
}

src/elastic/add.go (new file)

@ -0,0 +1,52 @@
package elastic
import (
"bytes"
"context"
"crazyfs/CacheItem"
"crazyfs/config"
"encoding/json"
"github.com/elastic/go-elasticsearch/v8/esapi"
)
func addToElasticsearch(item *CacheItem.Item, cfg *config.Config) {
log.Debugf(`ELASTIC - Adding: "%s"`, item.Path)
preparedItem := prepareCacheItem(item)
data, err := json.Marshal(preparedItem)
if err != nil {
log.Errorf("ELASTIC - Error marshaling item: %s", err)
return
}
req := esapi.IndexRequest{
Index: cfg.ElasticsearchIndex,
DocumentID: encodeToBase64(item.Path),
Body: bytes.NewReader(data),
Refresh: "true",
}
res, err := req.Do(context.Background(), ElasticClient)
if err != nil {
log.Errorf("ELASTIC - Error getting response: %s", err)
return
}
defer res.Body.Close()
if res.IsError() {
var e map[string]interface{}
if err := json.NewDecoder(res.Body).Decode(&e); err != nil {
log.Printf("Error parsing the response body: %s", err)
}
log.Errorf(`ELASTIC - Error indexing document "%s" - Status code: %d - %s`, item.Path, res.StatusCode, e)
}
}
// prepareCacheItem returns a copy of the item that is ready to insert into Elastic.
// A copy is made so the original item in the shared LRU cache is not mutated.
func prepareCacheItem(item *CacheItem.Item) *CacheItem.Item {
itemCopy := *item
// We don't care about the children and this field's length may cause issues.
itemCopy.Children = nil
// Length of this one also may cause issues.
itemCopy.Content = ""
return &itemCopy
}
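For reference, the indexed document is just the JSON-serialized cache item with `children` and `content` emptied. Assuming `CacheItem.Item` keeps the same JSON tags as the removed `data.Item` struct shown above (an assumption; the new struct is not shown in this diff), an indexed document would look roughly like this, with all values hypothetical:

```json
{
  "path": "/docs/report.pdf",
  "name": "report.pdf",
  "size": 1048576,
  "extension": ".pdf",
  "modified": "2023-12-08T22:25:59Z",
  "mode": 420,
  "isDir": false,
  "isSymlink": false,
  "type": "application/pdf",
  "children": null,
  "cached": 1702074359
}
```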

src/elastic/delete.go (new file)

@ -0,0 +1,74 @@
package elastic
import (
"context"
"crazyfs/CacheItem"
"crazyfs/config"
"encoding/json"
"github.com/elastic/go-elasticsearch/v8/esapi"
lru "github.com/hashicorp/golang-lru/v2"
"sync"
)
func removeStaleItemsFromElasticsearch(sharedCache *lru.Cache[string, *CacheItem.Item], cfg *config.Config) {
// Retrieve all keys from Elasticsearch
keys, err := getPathsFromIndex(cfg)
if err != nil {
log.Errorf("ELASTIC - Error retrieving keys from Elasticsearch: %s", err)
return
}
// Create a buffered channel as a semaphore
sem := make(chan struct{}, cfg.ElasticsearchSyncThreads)
// Create a wait group to wait for all goroutines to finish
var wg sync.WaitGroup
// For each key in Elasticsearch, check if it exists in the LRU cache
for _, key := range keys {
// Increment the wait group counter
wg.Add(1)
// Acquire a semaphore
sem <- struct{}{}
go func(key string) {
// Ensure the semaphore is released and the wait group counter is decremented when the goroutine finishes
defer func() {
<-sem
wg.Done()
}()
if _, ok := sharedCache.Get(key); !ok {
// If a key does not exist in the LRU cache, delete it from Elasticsearch
deleteFromElasticsearch(key, cfg)
log.Debugf(`ELASTIC - Removed key "%s"`, key)
}
}(key)
}
// Wait for all goroutines to finish
wg.Wait()
}
func deleteFromElasticsearch(key string, cfg *config.Config) {
req := esapi.DeleteRequest{
Index: cfg.ElasticsearchIndex,
DocumentID: encodeToBase64(key),
}
res, err := req.Do(context.Background(), ElasticClient)
if err != nil {
return
}
defer res.Body.Close()
// If we tried to delete a key that doesn't exist in Elastic, it will return an error that we will ignore.
if res.IsError() && res.StatusCode != 404 {
var e map[string]interface{}
if err := json.NewDecoder(res.Body).Decode(&e); err != nil {
log.Printf("Error parsing the response body: %s", err)
}
log.Errorf(`ELASTIC - Error deleting document "%s" - Status code: %d - %s`, key, res.StatusCode, e)
}
}

src/elastic/elastic.go (new file)

@ -0,0 +1,19 @@
package elastic
import (
"crazyfs/logging"
"github.com/elastic/go-elasticsearch/v8"
"github.com/sirupsen/logrus"
)
var log *logrus.Logger
var ElasticClient *elasticsearch.Client
var ElasticNewSyncRunning bool
var ElasticRefreshSyncRunning bool
func init() {
log = logging.GetLogger()
ElasticNewSyncRunning = false
ElasticRefreshSyncRunning = false
}

src/elastic/helpers.go (new file)

@ -0,0 +1,27 @@
package elastic
import (
"encoding/base64"
"strings"
)
func shouldExclude(path string, exclusions []string) bool {
parts := strings.Split(path, "/")
// Check each part of the path to see if it's in the exclusions list.
// This will exclude all children as well.
for _, part := range parts {
for _, exclusion := range exclusions {
if part == exclusion {
return true
}
}
}
return false
}
func encodeToBase64(s string) string {
// Used to encode key names to base64 since file paths aren't very Elastic-friendly.
return base64.RawURLEncoding.EncodeToString([]byte(s))
}
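A minimal standalone sketch (hypothetical, not part of this commit) of why `RawURLEncoding` suits document IDs: any file path encodes to an ID with no slashes, pluses, or padding, and round-trips losslessly.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	path := "/data/some dir/file.txt"

	// Encode the path the same way encodeToBase64 does.
	id := base64.RawURLEncoding.EncodeToString([]byte(path))
	fmt.Println(id) // URL-safe: no "/", "+", or "=" characters

	// The original path can always be recovered from the ID.
	decoded, _ := base64.RawURLEncoding.DecodeString(id)
	fmt.Println(string(decoded)) // "/data/some dir/file.txt"
}
```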

src/elastic/index.go (new file)

@ -0,0 +1,31 @@
package elastic
import (
"crazyfs/config"
)
func createCrazyfsIndex(cfg *config.Config) {
// Check if index exists
res, err := ElasticClient.Indices.Exists([]string{cfg.ElasticsearchIndex})
if err != nil {
log.Fatalf("Error checking if index exists: %s", err)
}
defer res.Body.Close()
// If index does not exist, create it
if res.StatusCode == 401 {
log.Fatalln("ELASTIC - Failed to create a new index: got code 401.")
} else if res.StatusCode == 404 {
res, err = ElasticClient.Indices.Create(cfg.ElasticsearchIndex)
if err != nil {
log.Fatalf("Error creating index: %s", err)
}
defer res.Body.Close()
if res.IsError() {
log.Printf("Error creating index: %s", res.String())
}
log.Infof(`Created a new index named "%s"`, cfg.ElasticsearchIndex)
}
}

src/elastic/list.go (new file)

@ -0,0 +1,83 @@
package elastic
import (
"context"
"crazyfs/config"
"encoding/json"
"errors"
"fmt"
"github.com/elastic/go-elasticsearch/v8/esapi"
"time"
)
func getPathsFromIndex(cfg *config.Config) ([]string, error) {
// This may take a bit if the index is very large, so avoid calling this.
// Print a debug message so the user doesn't think we're frozen.
log.Debugln("Fetching indexed paths from Elasticsearch...")
var paths []string
var r map[string]interface{}
res, err := ElasticClient.Search(
ElasticClient.Search.WithContext(context.Background()),
ElasticClient.Search.WithIndex(cfg.ElasticsearchIndex),
ElasticClient.Search.WithScroll(time.Minute),
ElasticClient.Search.WithSize(1000),
)
if err != nil {
msg := fmt.Sprintf("Error getting response: %s", err)
return nil, errors.New(msg)
}
defer res.Body.Close()
if err := json.NewDecoder(res.Body).Decode(&r); err != nil {
msg := fmt.Sprintf("Error parsing the response body: %s", err)
return nil, errors.New(msg)
}
for {
scrollID := r["_scroll_id"].(string)
hits := r["hits"].(map[string]interface{})["hits"].([]interface{})
// Break after no more documents
if len(hits) == 0 {
break
}
// Iterate the document "hits" returned by API call
for _, hit := range hits {
doc := hit.(map[string]interface{})["_source"].(map[string]interface{})
path, ok := doc["path"].(string)
if ok {
paths = append(paths, path)
}
}
// Next scroll
res, err = ElasticClient.Scroll(ElasticClient.Scroll.WithScrollID(scrollID), ElasticClient.Scroll.WithScroll(time.Minute))
if err != nil {
msg := fmt.Sprintf("Error getting response: %s", err)
return nil, errors.New(msg)
}
defer res.Body.Close()
if err := json.NewDecoder(res.Body).Decode(&r); err != nil {
msg := fmt.Sprintf("Error getting response: %s", err)
return nil, errors.New(msg)
}
}
// Clear the scroll
clearScrollRequest := esapi.ClearScrollRequest{
ScrollID: []string{r["_scroll_id"].(string)},
}
clearScrollResponse, err := clearScrollRequest.Do(context.Background(), ElasticClient)
if err != nil {
msg := fmt.Sprintf("Error clearing scroll: %s", err)
return nil, errors.New(msg)
}
defer clearScrollResponse.Body.Close()
return paths, nil
}

src/elastic/search.go (new file)

@ -0,0 +1,70 @@
package elastic
import (
"context"
"crazyfs/config"
"errors"
"fmt"
"github.com/elastic/go-elasticsearch/v8/esapi"
"github.com/mitchellh/mapstructure"
"strings"
)
func Search(query string, exclude []string, cfg *config.Config) (*esapi.Response, error) {
log.Debugf(`ELASTIC - Query: "%s"`, query)
var excludeQuery string
if len(exclude) > 0 {
var excludeConditions []string
for _, e := range exclude {
excludeConditions = append(excludeConditions, fmt.Sprintf(`{"query_string": {"query": "%s"}}`, e))
}
excludeQuery = fmt.Sprintf(`, "must_not": [%s]`, strings.Join(excludeConditions, ","))
}
esQuery := fmt.Sprintf(`{
"query": {
"bool": {
"must": {
"simple_query_string": {
"query": "%s",
"default_operator": "and"
}
}%s
}
}
}`, query, excludeQuery)
return ElasticClient.Search(
ElasticClient.Search.WithContext(context.Background()),
ElasticClient.Search.WithIndex(cfg.ElasticsearchIndex),
ElasticClient.Search.WithBody(strings.NewReader(esQuery)),
ElasticClient.Search.WithTrackTotalHits(true),
ElasticClient.Search.WithPretty(),
ElasticClient.Search.WithSize(cfg.ApiSearchMaxResults),
)
}
type ErrorReason struct {
Reason string `mapstructure:"reason"`
}
type ErrorRootCause struct {
Causes []ErrorReason `mapstructure:"root_cause"`
}
type SearchError struct {
Error ErrorRootCause `mapstructure:"error"`
}
func GetSearchFailureReason(respData map[string]interface{}) (string, error) {
var data SearchError
err := mapstructure.Decode(respData, &data)
if err != nil {
return "", err
}
if len(data.Error.Causes) > 0 {
return data.Error.Causes[0].Reason, nil
}
return "", errors.New("no root cause found")
}
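As a concrete illustration with hypothetical values, a call like `Search("report", []string{"logs"}, cfg)` would send a request body like this, derived directly from the format string in `Search` above:

```json
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "report",
          "default_operator": "and"
        }
      },
      "must_not": [{"query_string": {"query": "logs"}}]
    }
  }
}
```

Note that `query` and each `exclude` entry are interpolated into the JSON unescaped, so callers should sanitize user input before passing it in.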

src/file/file.go (new file)

@ -0,0 +1,75 @@
package file
import (
"crazyfs/config"
"crazyfs/logging"
"github.com/gabriel-vasile/mimetype"
"github.com/sirupsen/logrus"
"mime"
"os"
"path/filepath"
"strings"
)
var log *logrus.Logger
func init() {
log = logging.GetLogger()
}
func GetMimeType(path string, analyze bool, passedInfo *os.FileInfo) (bool, string, string, error) {
var MIME *mimetype.MIME
var mimeType string
var ext string
var err error
var info os.FileInfo
if config.FollowSymlinks {
info, err = os.Lstat(path)
} else {
if passedInfo == nil {
info, err = os.Stat(path)
} else {
info = *passedInfo
}
}
if err != nil {
// File does not exist
return false, "", "", err
}
if !info.IsDir() {
if info.Mode()&os.ModeSymlink != 0 && !config.FollowSymlinks {
return false, "", "", nil
}
ext = filepath.Ext(path)
if analyze {
MIME, err = mimetype.DetectFile(path)
if err != nil {
log.Warnf("Error analyzing MIME type: %v", err)
return false, "", "", err
}
mimeType = MIME.String()
} else {
mimeType = mime.TypeByExtension(ext)
}
} else {
return true, "", ext, nil
}
if strings.Contains(mimeType, ";") {
mimeType = strings.Split(mimeType, ";")[0]
}
return true, mimeType, ext, nil
}
func StripRootDir(path string) string {
if path == "/" || path == config.RootDir || path == "" {
// Avoid erasing our path
return "/"
} else {
return strings.TrimSuffix(strings.TrimPrefix(path, config.RootDir), "/")
}
}
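A hypothetical usage sketch of `GetMimeType` (the `crazyfs/file` import path is assumed from this repo's layout): with `analyze` false the type comes from the extension table via `mime.TypeByExtension`, while `analyze` true reads the file contents with `mimetype.DetectFile`.

```go
package main

import (
	"fmt"
	"log"

	"crazyfs/file" // assumed module-local import path
)

func main() {
	// Fast path: derive the MIME type from the extension only.
	ok, mimeType, ext, err := file.GetMimeType("/srv/files/report.pdf", false, nil)
	if err != nil || !ok {
		log.Fatalf("could not determine MIME type: %v", err)
	}
	fmt.Println(mimeType, ext) // "application/pdf" ".pdf"
}
```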

View File

@ -2,7 +2,6 @@ package file
import (
"bytes"
"fmt"
"github.com/chai2010/webp"
"github.com/joway/libimagequant-go/pngquant"
@ -11,71 +10,57 @@ import (
"image/jpeg" "image/jpeg"
"image/png" "image/png"
"io" "io"
"log"
"os" "os"
) )
func ConvertToPNG(filename string, contentType string) ([]byte, error) { func ConvertToPNG(filename string, contentType string) ([]byte, error) {
imageBytes, err := os.Open(filename) imageFile, err := os.Open(filename)
if err != nil { if err != nil {
log.Fatal(err) return nil, fmt.Errorf("failed to open file: %w", err)
} }
defer imageBytes.Close() defer imageFile.Close()
imageBytes, err := io.ReadAll(imageFile)
if err != nil {
return nil, fmt.Errorf("failed to read file: %w", err)
}
var img image.Image
switch contentType { switch contentType {
case "image/png": case "image/png":
imageBytes, err := io.ReadAll(imageBytes)
if err != nil {
return nil, errors.New("unable to read png")
}
return imageBytes, nil return imageBytes, nil
case "image/jpeg": case "image/jpeg":
img, err := jpeg.Decode(imageBytes) img, err = jpeg.Decode(bytes.NewReader(imageBytes))
if err != nil {
return nil, errors.New("unable to decode jpeg")
}
buf := new(bytes.Buffer)
if err := png.Encode(buf, img); err != nil {
return nil, errors.New("unable to encode png")
}
return buf.Bytes(), nil
case "image/webp": case "image/webp":
img, err := webp.Decode(imageBytes) img, err = webp.Decode(bytes.NewReader(imageBytes))
if err != nil {
return nil, errors.New("unable to decode webp")
}
buf := new(bytes.Buffer)
if err := png.Encode(buf, img); err != nil {
return nil, errors.New("unable to encode png")
}
case "image/gif": case "image/gif":
img, err := gif.Decode(imageBytes) img, err = gif.Decode(bytes.NewReader(imageBytes))
if err != nil { default:
return nil, errors.New("unable to decode gif") return nil, fmt.Errorf("unsupported content type: %s", contentType)
}
buf := new(bytes.Buffer)
if err := png.Encode(buf, img); err != nil {
return nil, errors.New("unable to encode png")
}
return buf.Bytes(), nil
} }
return nil, errors.New(fmt.Sprintf("unable to convert %#v to png", contentType)) if err != nil {
return nil, fmt.Errorf("failed to decode image: %w", err)
}
buf := new(bytes.Buffer)
if err := png.Encode(buf, img); err != nil {
return nil, fmt.Errorf("failed to encode image: %w", err)
}
return buf.Bytes(), nil
} }
func CompressPNGFile(inputImg image.Image, quality int) (*bytes.Buffer, error) { func CompressPNGFile(inputImg image.Image, quality int) (*bytes.Buffer, error) {
// Compress the image using pngquant
compressedImg, err := pngquant.Compress(inputImg, quality, pngquant.SPEED_FASTEST) compressedImg, err := pngquant.Compress(inputImg, quality, pngquant.SPEED_FASTEST)
if err != nil { if err != nil {
return nil, err return nil, fmt.Errorf("failed to compress image: %w", err)
} }
// Create a bytes.Buffer and encode the compressed image into it
buf := new(bytes.Buffer) buf := new(bytes.Buffer)
err = (&png.Encoder{CompressionLevel: png.BestCompression}).Encode(buf, compressedImg) err = (&png.Encoder{CompressionLevel: png.BestCompression}).Encode(buf, compressedImg)
//err = png.Encode(buf, compressedImg)
if err != nil { if err != nil {
return nil, err return nil, fmt.Errorf("failed to encode image: %w", err)
} }
return buf, nil return buf, nil
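A hypothetical end-to-end sketch of these two helpers together (the import path is assumed): convert a JPEG to PNG bytes, then recompress the result with pngquant.

```go
package main

import (
	"bytes"
	"image/png"
	"log"
	"os"

	"crazyfs/file" // assumed module-local import path
)

func main() {
	// Convert a JPEG on disk into PNG-encoded bytes.
	pngBytes, err := file.ConvertToPNG("/srv/files/photo.jpg", "image/jpeg")
	if err != nil {
		log.Fatal(err)
	}

	// Decode the PNG and recompress it with pngquant at quality 80.
	img, err := png.Decode(bytes.NewReader(pngBytes))
	if err != nil {
		log.Fatal(err)
	}
	buf, err := file.CompressPNGFile(img, 80)
	if err != nil {
		log.Fatal(err)
	}

	if err := os.WriteFile("/srv/files/photo.png", buf.Bytes(), 0644); err != nil {
		log.Fatal(err)
	}
}
```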

src/file/path.go (new file)

@ -0,0 +1,71 @@
package file
import (
"crazyfs/config"
"fmt"
"os"
"path/filepath"
"strings"
)
// SafeJoin cleans the provided path and joins it to the configured root directory.
func SafeJoin(pathArg string) (string, error) {
cleanPath := filepath.Join(config.RootDir, filepath.Clean(pathArg))
cleanPath = strings.TrimRight(cleanPath, "/")
return cleanPath, nil
}
func DetectTraversal(pathArg string) (bool, error) {
// Remove the trailing slash so our checks always handle the same format
if pathArg != "/" {
pathArg = strings.TrimRight(pathArg, "/")
}
// If the path starts with "~", a directory traversal attack is being attempted
if strings.HasPrefix(pathArg, "~") {
return true, fmt.Errorf("includes home directory: %s", pathArg)
}
// The file path should ALWAYS be absolute.
// For example: /Documents
if !filepath.IsAbs(pathArg) {
return true, fmt.Errorf("is not absolute path: %s", pathArg)
}
cleanArg := filepath.Clean(pathArg)
cleanPath := filepath.Join(config.RootDir, cleanArg)
// If the path is not within the base path, return an error
if !strings.HasPrefix(cleanPath, config.RootDir) {
return true, fmt.Errorf("the full path is outside the root dir: %s", pathArg)
}
// If the cleaned path is not the same as the original path, a directory traversal attack is being attempted
if pathArg != cleanArg {
return true, fmt.Errorf("path. Clean modified the path arg from %s to %s", pathArg, cleanArg)
}
return false, nil
}
func PathExists(path string) (bool, error) {
fileInfo, err := os.Lstat(path)
if err != nil {
if os.IsNotExist(err) {
return false, nil // File or symlink does not exist
}
return false, err // Other error
}
if fileInfo.Mode()&os.ModeSymlink != 0 {
_, err := os.Stat(path)
if err != nil {
if os.IsNotExist(err) {
return false, nil // Symlink is broken
}
return false, err // Other error
}
}
return true, nil // File or symlink exists and is not broken
}
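For example, `DetectTraversal` rejects `~/secrets` for referencing a home directory, rejects `Documents` for not being an absolute path, and rejects `/Documents/../etc` because `filepath.Clean` reduces it to `/etc`, which no longer matches the original argument.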


@ -1,207 +0,0 @@
package file
import (
"archive/zip"
"compress/flate"
"crazyfs/api/helpers"
"crazyfs/cache"
"crazyfs/config"
"crazyfs/data"
"encoding/json"
lru "github.com/hashicorp/golang-lru/v2"
kzip "github.com/klauspost/compress/zip"
"io"
"net/http"
"os"
"path/filepath"
)
func ZipHandler(dirPath string, w http.ResponseWriter, r *http.Request, compressionLevel int) {
// The compressionLevel parameter should be a value between -2 and 9 inclusive, where -2 means default compression, 1 means best speed, and 9 means best compression.
// Set to 0 to disable compression (store mode)
// You need to write the headers and status code before any bytes
w.Header().Set("Content-Type", "application/zip")
// the filename which will be suggested in the save file dialog
w.WriteHeader(http.StatusOK)
zipWriter := zip.NewWriter(w)
// Set the compression level
if compressionLevel > 0 {
zipWriter.RegisterCompressor(zip.Deflate, func(out io.Writer) (io.WriteCloser, error) {
return flate.NewWriter(out, compressionLevel)
})
}
// Walk through the directory and add each file to the zip
filepath.Walk(dirPath, func(filePath string, info os.FileInfo, err error) error {
if info.IsDir() {
return nil
}
// Ensure the file path is relative to the directory being zipped
relativePath, err := filepath.Rel(dirPath, filePath)
if err != nil {
return err
}
header, err := zip.FileInfoHeader(info)
if err != nil {
return err
}
header.Name = relativePath
if compressionLevel > 0 {
header.Method = zip.Deflate
} else {
header.Method = zip.Store
}
writer, err := zipWriter.CreateHeader(header)
if err != nil {
return err
}
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(writer, file)
return err
})
err := zipWriter.Close()
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}
func ZipHandlerCompress(dirPath string, w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/zip")
w.WriteHeader(http.StatusOK)
zipWriter := kzip.NewWriter(w)
// Walk through the directory and add each file to the zip
filepath.Walk(dirPath, func(filePath string, info os.FileInfo, err error) error {
if info.IsDir() {
return nil
}
// Ensure the file path is relative to the directory being zipped
relativePath, err := filepath.Rel(dirPath, filePath)
if err != nil {
return err
}
writer, err := zipWriter.Create(relativePath)
if err != nil {
return err
}
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(writer, file)
return err
})
err := zipWriter.Close()
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}
func ZipHandlerCompressMultiple(paths []string, w http.ResponseWriter, r *http.Request, cfg *config.Config, sharedCache *lru.Cache[string, *data.Item]) {
zipWriter := kzip.NewWriter(w)
// Walk through each file and add it to the zip
for _, path := range paths {
relPath := cache.StripRootDir(filepath.Join(cfg.RootDir, path), cfg.RootDir)
fullPath := filepath.Join(cfg.RootDir, relPath)
// Check if the path is in the restricted download paths
for _, restrictedPath := range cfg.RestrictedDownloadPaths {
if relPath == restrictedPath {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusForbidden)
json.NewEncoder(w).Encode(map[string]interface{}{
"code": http.StatusForbidden,
"error": "not allowed to download this path",
})
return
}
}
// Try to get the data from the cache
item, found := sharedCache.Get(relPath)
if !found {
item = helpers.HandleFileNotFound(relPath, fullPath, sharedCache, cfg, w)
}
if item == nil {
// The errors have already been handled in handleFileNotFound() so we're good to just exit
return
}
if !item.IsDir {
writer, err := zipWriter.Create(relPath)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
file, err := os.Open(fullPath)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
defer file.Close()
_, err = io.Copy(writer, file)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
} else {
w.Header().Set("Content-Disposition", `attachment; filename="files.zip"`)
w.Header().Set("Content-Type", "application/zip")
w.WriteHeader(http.StatusOK)
// If it's a directory, walk through it and add each file to the zip
filepath.Walk(fullPath, func(filePath string, info os.FileInfo, err error) error {
if info.IsDir() {
return nil
}
// Ensure the file path is relative to the directory being zipped
relativePath, err := filepath.Rel(fullPath, filePath)
if err != nil {
return err
}
writer, err := zipWriter.Create(filepath.Join(relPath, relativePath))
if err != nil {
return err
}
file, err := os.Open(filePath)
if err != nil {
return err
}
defer file.Close()
_, err = io.Copy(writer, file)
return err
})
}
}
err := zipWriter.Close()
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
}
}


@ -5,11 +5,13 @@ go 1.20
require (
github.com/chai2010/webp v1.1.1
github.com/disintegration/imaging v1.6.2
github.com/elastic/go-elasticsearch/v8 v8.11.1
github.com/gabriel-vasile/mimetype v1.4.2
github.com/gorilla/mux v1.8.0
github.com/hashicorp/golang-lru/v2 v2.0.4
github.com/joway/libimagequant-go v0.1.0
github.com/klauspost/compress v1.16.7
github.com/mitchellh/mapstructure v1.5.0
github.com/nfnt/resize v0.0.0-20180221191011-83c6a9932646
github.com/radovskyb/watcher v1.0.7
github.com/sirupsen/logrus v1.9.3
@ -17,10 +19,10 @@ require (
)
require (
github.com/elastic/elastic-transport-go/v8 v8.3.0 // indirect
github.com/fsnotify/fsnotify v1.6.0 // indirect
github.com/hashicorp/hcl v1.0.0 // indirect
github.com/magiconair/properties v1.8.7 // indirect
github.com/pelletier/go-toml/v2 v2.0.8 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/spf13/afero v1.9.5 // indirect
@ -30,7 +32,7 @@ require (
github.com/subosito/gotenv v1.4.2 // indirect
golang.org/x/image v0.0.0-20211028202545-6944b10bf410 // indirect
golang.org/x/net v0.10.0 // indirect
golang.org/x/sys v0.10.0 // indirect
golang.org/x/text v0.9.0 // indirect
gopkg.in/ini.v1 v1.67.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect


@ -53,6 +53,10 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/disintegration/imaging v1.6.2 h1:w1LecBlG2Lnp8B3jk5zSuNqd7b4DXhcjwek1ei82L+c=
github.com/disintegration/imaging v1.6.2/go.mod h1:44/5580QXChDfwIclfc/PCwrr44amcmDAg8hxG0Ewe4=
github.com/elastic/elastic-transport-go/v8 v8.3.0 h1:DJGxovyQLXGr62e9nDMPSxRyWION0Bh6d9eCFBriiHo=
github.com/elastic/elastic-transport-go/v8 v8.3.0/go.mod h1:87Tcz8IVNe6rVSLdBux1o/PEItLtyabHU3naC7IoqKI=
github.com/elastic/go-elasticsearch/v8 v8.11.1 h1:1VgTgUTbpqQZ4uE+cPjkOvy/8aw1ZvKcU0ZUE5Cn1mc=
github.com/elastic/go-elasticsearch/v8 v8.11.1/go.mod h1:GU1BJHO7WeamP7UhuElYwzzHtvf9SDmeVpSSy9+o6Qg=
github.com/envoyproxy/go-control-plane v0.9.0/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98=
@ -334,8 +338,8 @@ golang.org/x/sys v0.0.0-20210423185535-09eb48e85fd7/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220908164124-27713097b956/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.10.0 h1:SqMFp9UcQJZa+pmYuAKjd9xq1f0j5rLcDIk0mj4qAsA=
golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=

todo.txt (new file)

@ -0,0 +1,5 @@
- Add a wildcard option to restricted_download_paths to block all sub-directories
- Add a dict to each restricted_download_paths item to specify how many levels deep the block should apply
- Add an endpoint that returns restricted_download_paths so the frontend can block downloads for those folders
- Load the config into a global variable and stop passing it as function args
- Remove the file change watcher mode