hf_text-generation-inference/server/text_generation_server/utils
Dmitry Rogozhkin 58848cb471
feat: enable pytorch xpu support for non-attention models (#2561)
The XPU backend is available natively (without IPEX) in PyTorch starting
from version 2.4. This commit extends TGI to cover the case where the user
has XPU support through PyTorch 2.4 but does not have IPEX installed.
Models that do not require attention can work; for models that require
attention, more work is needed to provide an attention implementation.
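A minimal sketch of the kind of backend detection this change implies: prefer the IPEX path when the `intel_extension_for_pytorch` package is importable, otherwise fall back to PyTorch's native `torch.xpu` support (added in PyTorch 2.4). The function name `detect_xpu_backend` is illustrative, not TGI's actual API.

```python
# Hedged sketch: distinguish the IPEX path from native PyTorch XPU support.
# `detect_xpu_backend` is a hypothetical helper, not part of TGI itself.
import importlib.util


def detect_xpu_backend() -> str:
    """Return which XPU path is usable: 'ipex', 'native', or 'none'."""
    if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
        # IPEX is installed; TGI can use the IPEX-backed code paths.
        return "ipex"
    if importlib.util.find_spec("torch") is not None:
        import torch

        # torch.xpu ships as a native backend starting with PyTorch 2.4.
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "native"
    return "none"


print(detect_xpu_backend())
```

With this shape, attention-free models can run on the "native" path, while attention-dependent models would still need an XPU attention implementation before they work without IPEX.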

Tested with the following models:
* teknium/OpenHermes-2.5-Mistral-7B
* bigscience/bloom-560m
* google/gemma-7b
* google/flan-t5-xxl

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-10-14 18:28:49 +02:00
| Name | Last commit | Date |
|---|---|---|
| `merges` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `__init__.py` | feat(server): Add native support for PEFT Lora models (#762) | 2023-08-03 17:22:45 +02:00 |
| `adapter.py` | Micro cleanup. (#2555) | 2024-09-24 11:19:24 +02:00 |
| `chunks.py` | server: use chunked inputs | 2024-06-07 08:09:04 +02:00 |
| `convert.py` | Force weights_only (before fully breaking pickle files anyway). (#1710) | 2024-04-05 19:23:57 +02:00 |
| `dist.py` | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-07-20 19:02:04 +02:00 |
| `hub.py` | Micro cleanup. (#2555) | 2024-09-24 11:19:24 +02:00 |
| `import_utils.py` | feat: enable pytorch xpu support for non-attention models (#2561) | 2024-10-14 18:28:49 +02:00 |
| `log.py` | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-07-20 19:02:04 +02:00 |
| `logits_process.py` | patch-error-on-invalid-grammar (#2282) | 2024-07-29 10:09:25 -04:00 |
| `peft.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `quantization.py` | Handle GPTQ-Marlin loading in `GPTQMarlinWeightLoader` (#2300) | 2024-07-31 13:08:41 +02:00 |
| `segments.py` | Enable multiple LoRa adapters (#2010) | 2024-06-25 14:46:27 -04:00 |
| `sgmv.py` | fix: allocate tmp based on sgmv kernel if available (#2345) | 2024-08-12 17:24:32 +02:00 |
| `speculate.py` | chore: formatting | 2023-12-11 14:49:52 +01:00 |
| `tokens.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `watermark.py` | Fixing watermark. (#851) | 2023-08-16 07:17:26 +02:00 |
| `weights.py` | Move to moe-kernels package and switch to common MoE layer (#2511) | 2024-09-17 18:08:58 +02:00 |