hf_text-generation-inference

Author	SHA1	Message	Date
fxmarty	b2b5df0e94	Add RoCm support (#1243) This PR adds support for AMD Instinct MI210 & MI250 GPUs, with paged attention and FAv2 support. Remaining items to discuss, on top of possible others: * Should we have a `ghcr.io/huggingface/text-generation-inference:1.1.0+rocm` hosted image, or is it too early? * Should we set up a CI on MI210/MI250? I don't have access to the runners of TGI though. * Are we comfortable with those changes being directly in TGI, or do we need a fork? --------- Co-authored-by: Felix Marty <felix@hf.co> Co-authored-by: OlivierDehaene <olivier@huggingface.co> Co-authored-by: Your Name <you@example.com>	2023-11-27 14:08:12 +01:00
OlivierDehaene	31e2253ae7	feat(server): use latest flash attention commit (#543) @njhill FYI	2023-07-04 20:23:55 +02:00
OlivierDehaene	5ce89059f8	feat(server): pre-allocate past key values for flash causal LM (#412)	2023-06-12 18:30:29 +02:00
OlivierDehaene	53ee09c0b0	fea(dockerfile): better layer caching (#159)	2023-04-14 10:12:21 +02:00
OlivierDehaene	1883d8ecde	feat(docker): improve flash_attention caching (#160)	2023-04-09 19:59:16 +02:00