hf_text-generation-inference

drbh 01dacf8e8f fix cuda graphs for qwen2-vl (#2708)	2024-11-01 03:05:34 +01:00
* feat: support multidimensional position ids on batch to enable cuda graphs on qwen2-vl
* fix: only check model type if config exists
* fix: adjust sharding and lm head logic
* fix qwen2 failure in intel cpu
  Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* fix: return correct shape logits and add streaming test
* fix: remove unused import and refactor test
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
images	Pali gemma modeling (#1895)	2024-05-16 06:58:47 +02:00
models	fix cuda graphs for qwen2-vl (#2708)	2024-11-01 03:05:34 +01:00
conftest.py	Monkey patching as a desperate measure. (#2704)	2024-10-28 11:25:13 +01:00
poetry.lock	Prefix test - Different kind of load test to trigger prefix test bugs. (#2490)	2024-09-11 18:10:40 +02:00
pyproject.toml	nix: add black and isort to the closure (#2619)	2024-10-09 11:08:02 +02:00
pytest.ini	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
requirements.txt	Prefix test - Different kind of load test to trigger prefix test bugs. (#2490)	2024-09-11 18:10:40 +02:00