Commit Graph

5 Commits

Author SHA1 Message Date
fxmarty b2b5df0e94
Add RoCm support (#1243)
This PR adds support for AMD Instinct MI210 & MI250 GPUs, with paged
attention and FAv2 support.

Remaining items to discuss, on top of possible others:
* Should we have a
`ghcr.io/huggingface/text-generation-inference:1.1.0+rocm` hosted image,
or is it too early?
* Should we set up a CI on MI210/MI250? I don't have access to the
runners of TGI though.
* Are we comfortable with those changes being directly in TGI, or do we
need a fork?

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: Your Name <you@example.com>
2023-11-27 14:08:12 +01:00
OlivierDehaene 31e2253ae7
feat(server): use latest flash attention commit (#543)
@njhill FYI
2023-07-04 20:23:55 +02:00
OlivierDehaene 5ce89059f8
feat(server): pre-allocate past key values for flash causal LM (#412)
2023-06-12 18:30:29 +02:00
OlivierDehaene 53ee09c0b0
fea(dockerfile): better layer caching (#159)
2023-04-14 10:12:21 +02:00
OlivierDehaene 1883d8ecde
feat(docker): improve flash_attention caching (#160)
2023-04-09 19:59:16 +02:00