0f346a3296

* Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels. Performance and accuracy of these kernels are on par (tested with Llama 70B and 405B). Removes a dependency and resolves some stability issues we have been seeing.
* Update test snapshots

client.nix
crate-overrides.nix
docker.nix
impure-shell.nix
overlay.nix
server.nix