hf_text-generation-inference/docs/source

Latest commit: Nicolas Patry, 0c9b6cdd76, 2024-10-28 04:59:49 +01:00
Choosing input/total tokens automatically based on available VRAM? (#2673)

* Choosing input/total tokens automatically based on available VRAM?
* Update doc.
* Remove generated files.
* Trying to fix non-chunking targets.
* Attempt #2
* Fix.
* QuantLinear is ROCm compatible.
* Much simpler logic after the overhead.
* Updating logic + non-flash.
* Revert doc text.
* Simple updates.
* Fix integration mt0 (transformers update).
Name                          Last commit message                                                          Date
basic_tutorials               chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
conceptual                    chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
reference                     Choosing input/total tokens automatically based on available VRAM? (#2673)   2024-10-28 04:59:49 +01:00
_toctree.yml                  Small fixes for supported models (#2471)                                     2024-10-14 15:26:39 +02:00
architecture.md               Update architecture.md (#2577)                                               2024-09-30 08:56:20 +02:00
index.md                      fix typos in docs and add small clarifications (#1790)                       2024-04-22 12:15:48 -04:00
installation.md               MI300 compatibility (#1764)                                                  2024-05-17 15:30:47 +02:00
installation_amd.md           chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
installation_gaudi.md         MI300 compatibility (#1764)                                                  2024-05-17 15:30:47 +02:00
installation_inferentia.md    MI300 compatibility (#1764)                                                  2024-05-17 15:30:47 +02:00
installation_intel.md         chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
installation_nvidia.md        chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
quicktour.md                  chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
supported_models.md           feat: natively support Granite models (#2682)                                2024-10-23 10:04:05 +00:00
usage_statistics.md           feat: allow any supported payload on /invocations (#2683)                    2024-10-23 11:26:01 +00:00