hf_text-generation-inference (mirror of https://github.com/huggingface/text-generation-inference.git)
Branch: feature/prefix
Path: server/text_generation_server/layers/attention
Latest commit: dea9c0dc74 by Nicolas Patry, "Fixing rocm." (#2164), 2024-07-02 12:01:08 +02:00
Files (latest commit per file):

__init__.py: [Major Change][Undecided yet] Move to FlashDecoding instead of PagedAttention kernel. (#1940), 2024-07-01 23:28:00 +02:00
common.py: [Major Change][Undecided yet] Move to FlashDecoding instead of PagedAttention kernel. (#1940), 2024-07-01 23:28:00 +02:00
cuda.py: [Major Change][Undecided yet] Move to FlashDecoding instead of PagedAttention kernel. (#1940), 2024-07-01 23:28:00 +02:00
flash_attn_triton.py: Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. (#1986), 2024-05-31 17:57:01 +02:00
ipex.py: fix FlashDecoding change's regression in intel platform (#2161), 2024-07-02 11:56:07 +02:00
rocm.py: Fixing rocm. (#2164), 2024-07-02 12:01:08 +02:00
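The layout above follows the one-backend-per-file convention introduced in #1986: cuda.py, rocm.py, and ipex.py each hold the attention implementation for one hardware target. A minimal sketch of how such a directory could be wired up with an import-time dispatcher is shown below; the `load_attention_backend` function and its `system` string argument are illustrative assumptions, not TGI's actual API.

```python
# Hypothetical sketch (not TGI's actual __init__.py): dispatch to one
# attention backend module per hardware, mirroring the file layout above.
import importlib


def load_attention_backend(system: str):
    """Import the backend module matching the detected hardware.

    `system` is an illustrative string such as "cuda", "rocm", or "ipex";
    real hardware detection is out of scope for this sketch.
    """
    backends = {
        "cuda": "text_generation_server.layers.attention.cuda",
        "rocm": "text_generation_server.layers.attention.rocm",
        "ipex": "text_generation_server.layers.attention.ipex",
    }
    try:
        module_path = backends[system]
    except KeyError:
        raise ImportError(f"Unsupported system: {system!r}")
    # importlib resolves the module by its dotted path at runtime,
    # so only the backend for the present hardware is ever imported.
    return importlib.import_module(module_path)
```

Keeping the hardware split at module granularity means a broken or missing dependency on one platform (e.g. a ROCm-only kernel) never gets imported on another, which is consistent with the per-platform fixes in #2161 and #2164 landing in separate files.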