Safetensors conceptual guide (#905)
IDK what else to add in this guide, I looked for relevant code in TGI codebase and saw that it's used in quantization as well (maybe I could add that?)
This commit is contained in:
parent
b03d2621a7
commit
af1ed38f39
@@ -21,6 +21,8 @@
 - sections:
   - local: conceptual/streaming
     title: Streaming
+  - local: conceptual/safetensors
+    title: Safetensors
   - local: conceptual/flash_attention
     title: Flash Attention
   title: Conceptual Guides
@@ -0,0 +1,7 @@
+# Safetensors
+
+Safetensors is a model serialization format for deep learning models. It is [faster](https://huggingface.co/docs/safetensors/speed) and safer than other serialization formats, such as pickle (which is used under the hood in many deep learning libraries).
+
+TGI depends on the safetensors format mainly to enable [tensor parallelism sharding](./tensor_parallelism). When serving a model, TGI looks for safetensors weights in the model repository; if none are found, it converts the PyTorch weights to the safetensors format.
+
+You can learn more about safetensors by reading the [safetensors documentation](https://huggingface.co/docs/safetensors/index).
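The safetensors file layout the guide refers to is simple: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor data. A minimal pure-Python sketch of that layout (illustrative only — this is not TGI's conversion code, and real files should be produced with the `safetensors` library):

```python
import json
import struct

def save_safetensors(path, tensors):
    """Write tensors (dict: name -> (dtype_str, shape, raw_bytes)) in safetensors layout."""
    header, payload, offset = {}, b"", 0
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [offset, offset + len(data)],
        }
        offset += len(data)
        payload += data
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte LE header length
        f.write(header_bytes)                          # JSON header
        f.write(payload)                               # raw tensor bytes

def load_safetensors(path):
    """Read the header, then slice each tensor's bytes out of the data section."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
        blob = f.read()
    return {
        name: blob[meta["data_offsets"][0]:meta["data_offsets"][1]]
        for name, meta in header.items()
    }

# Round-trip a tiny 2x2 float32 tensor.
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
save_safetensors("demo.safetensors", {"weight": ("F32", (2, 2), data)})
assert load_safetensors("demo.safetensors")["weight"] == data
```

Because the header is plain JSON with explicit offsets, a loader can read tensor metadata (and any single shard's tensors) without executing arbitrary code, which is what makes the format both safe and convenient for tensor-parallel sharding.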