* Add `impureWithCuda` dev shell
This shell is handy when developing some kernels jointly with TGI - it
adds nvcc and a bunch of commonly-used CUDA libraries to the environment.
We don't add this to the normal impure shell to keep the development
environment as clean as possible (avoid accidental dependencies, etc.).
* Add cuDNN
To make sure that everything is formatted with the same black version
as CI.
I sometimes use isort for new files to get nicely ordered imports,
so add it as well. Also set the isort configuration to format in a
way that is compatible with black.
* Improve support for GPUs with capability < 8
- For models that cannot use flashinfer, use flash-attn v1 + paged
attention for models with a compute capability older than 8.
- Disable prefix caching when using paged attention.
- When using flash-attn v1, pass the key/value, rather than the
cache, since v1 cannot use block tables.
* nix: add flash-attn-v1 to the server environment
* Move disabling prefix caching into the block of exceptions
* Capability as `usize`s