Mirrors/hf_text-generation-inference
mirror of https://github.com/huggingface/text-generation-inference.git
41c2623735
hf_text-generation-inference/server/text_generation_server/layers/attention
Daniël de Kok  8ec57558cd  Break cycle between the attention implementations and KV cache (#2627)  2024-10-17 14:54:22 +02:00
__init__.py           Break cycle between the attention implementations and KV cache (#2627)  2024-10-17 14:54:22 +02:00
common.py             feat: prefill chunking (#2600)                                          2024-10-16 12:49:33 +02:00
cuda.py               Break cycle between the attention implementations and KV cache (#2627)  2024-10-17 14:54:22 +02:00
flash_attn_triton.py  feat: prefill chunking (#2600)                                          2024-10-16 12:49:33 +02:00
flashinfer.py         flashinfer: pass window size and dtype (#2574)                          2024-09-28 18:41:41 +02:00
ipex.py               Break cycle between the attention implementations and KV cache (#2627)  2024-10-17 14:54:22 +02:00
kv_cache.py           Break cycle between the attention implementations and KV cache (#2627)  2024-10-17 14:54:22 +02:00
rocm.py               Break cycle between the attention implementations and KV cache (#2627)  2024-10-17 14:54:22 +02:00