hf_text-generation-inference/server/text_generation_server/layers/attention
Wang, Yi 5da4cfab1c
refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform (#2132)
* refine get xpu free memory

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* enable qwen2 in xpu

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* enable gemma/gemma2/phi in intel platform

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-07-01 14:32:54 +02:00
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | Removing IPEX_AVAIL. (#2115) | 2024-06-25 13:20:57 +02:00 |
| cuda.py | Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. (#1986) | 2024-05-31 17:57:01 +02:00 |
| flash_attn_triton.py | Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. (#1986) | 2024-05-31 17:57:01 +02:00 |
| ipex.py | refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform (#2132) | 2024-07-01 14:32:54 +02:00 |
| rocm.py | ROCm and sliding windows fixes (#2033) | 2024-06-10 15:09:50 +08:00 |