hf_text-generation-inference/docs/source

Latest commit: Nicolas Patry, 0c9b6cdd76, 2024-10-28 04:59:49 +01:00
Choosing input/total tokens automatically based on available VRAM? (#2673)

* Choosing input/total tokens automatically based on available VRAM?
* Update doc.
* Remove generated files.
* Trying to fix non-chunking targets.
* Attempt #2
* Fix.
* QuantLinear is ROCm compatible.
* Much simpler logic after the overhead.
* Updating logic + non-flash.
* Revert doc text.
* Simple updates.
* Fix integration mt0 (transformers update).
Name                          Last commit message                                                          Date
basic_tutorials               chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
conceptual                    chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
reference                     Choosing input/total tokens automatically based on available VRAM? (#2673)   2024-10-28 04:59:49 +01:00
_toctree.yml                  Small fixes for supported models (#2471)                                     2024-10-14 15:26:39 +02:00
architecture.md               Update architecture.md (#2577)                                               2024-09-30 08:56:20 +02:00
index.md                      fix typos in docs and add small clarifications (#1790)                       2024-04-22 12:15:48 -04:00
installation.md               MI300 compatibility (#1764)                                                  2024-05-17 15:30:47 +02:00
installation_amd.md           chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
installation_gaudi.md         MI300 compatibility (#1764)                                                  2024-05-17 15:30:47 +02:00
installation_inferentia.md    MI300 compatibility (#1764)                                                  2024-05-17 15:30:47 +02:00
installation_intel.md         chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
installation_nvidia.md        chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
quicktour.md                  chore: prepare 2.4.0 release (#2695)                                         2024-10-25 21:10:49 +00:00
supported_models.md           feat: natively support Granite models (#2682)                                2024-10-23 10:04:05 +00:00
usage_statistics.md           feat: allow any supported payload on /invocations (#2683)                    2024-10-23 11:26:01 +00:00