72ab60fdd5
The compressed-tensors configuration can also specify the KV cache configuration. Use an FP8 KV cache when the configuration requests one (all other options and types are ignored for now).
- adapters
- layers
- models
- pb
- utils
- __init__.py
- cache.py
- cli.py
- interceptor.py
- server.py
- tracing.py
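
The behavior described in the commit message can be illustrated with a minimal sketch. It is not the code from this commit: the function name `select_kv_cache_dtype`, the config keys `kv_cache_scheme`, `type`, and `num_bits`, and the returned strings `"fp8"` and `"auto"` are all assumptions chosen for illustration. The sketch only shows the rule "use FP8 when the configuration asks for it, ignore everything else".

```python
from collections.abc import Mapping
from typing import Any, Optional

# Hypothetical values: these dtype names are placeholders, not the
# identifiers used by the actual server code.
DEFAULT_KV_CACHE_DTYPE = "auto"
FP8_KV_CACHE_DTYPE = "fp8"


def select_kv_cache_dtype(quantization_config: Optional[Mapping[str, Any]]) -> str:
    """Pick a KV cache dtype from a compressed-tensors style config.

    Assumes (not confirmed by the commit) that the KV cache settings live
    under a "kv_cache_scheme" key with "type" and "num_bits" fields.
    """
    if not quantization_config:
        return DEFAULT_KV_CACHE_DTYPE

    scheme = quantization_config.get("kv_cache_scheme")
    if not isinstance(scheme, Mapping):
        # No KV cache section in the config: keep the default.
        return DEFAULT_KV_CACHE_DTYPE

    # Only an 8-bit float scheme selects FP8; all other options and
    # types are ignored and fall back to the default.
    if scheme.get("type") == "float" and scheme.get("num_bits") == 8:
        return FP8_KV_CACHE_DTYPE
    return DEFAULT_KV_CACHE_DTYPE


if __name__ == "__main__":
    fp8_config = {"kv_cache_scheme": {"type": "float", "num_bits": 8}}
    int8_config = {"kv_cache_scheme": {"type": "int", "num_bits": 8}}
    print(select_kv_cache_dtype(fp8_config))
    print(select_kv_cache_dtype(int8_config))
```

Falling back to the default for unrecognized schemes, rather than raising an error, mirrors the "ignored for now" wording of the commit message.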