Adding some docs.
commit cea291718e (parent bf700e7eef)
@@ -52,6 +52,8 @@ Text Generation Inference (TGI) is a toolkit for deploying and serving Large Lan
 - Logits warper (temperature scaling, top-p, top-k, repetition penalty, more details see [transformers.LogitsProcessor](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.LogitsProcessor))
 - Stop sequences
 - Log probabilities
+- [Speculation](https://huggingface.co/docs/text-generation-inference/conceptual/speculation) ~2x latency
+- [Guidance/JSON](https://huggingface.co/docs/text-generation-inference/conceptual/guidance). Specify output format to speed up inference and make sure the output is valid according to some specs.
 - Custom Prompt Generation: Easily generate text by providing custom prompts to guide the model's output
 - Fine-tuning Support: Utilize fine-tuned models for specific tasks to achieve higher accuracy and performance
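The bullets above cover logits warpers, stop sequences, and log probabilities. As a rough sketch of how these surface in a request, the snippet below posts to the `/generate` route of a TGI server assumed to be running at `http://localhost:8080`; the prompt and parameter values are placeholders, and the parameter names follow TGI's generate API.

```python
# Hedged sketch: exercising logits warpers, stop sequences, and log
# probabilities against an already-running TGI server (URL is assumed).
import requests

TGI_URL = "http://localhost:8080"  # placeholder for your deployment

payload = {
    "inputs": "Deep learning is",
    "parameters": {
        "max_new_tokens": 32,
        "do_sample": True,
        # Logits warpers
        "temperature": 0.7,
        "top_p": 0.95,
        "top_k": 50,
        "repetition_penalty": 1.1,
        # Stop sequences
        "stop": ["\n\n"],
        # Per-token details in the response include log probabilities
        "details": True,
    },
}

resp = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()

print(data["generated_text"])
for token in data["details"]["tokens"]:
    print(token["text"], token["logprob"])
```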
@@ -37,4 +37,8 @@
     title: Safetensors
   - local: conceptual/flash_attention
     title: Flash Attention
+  - local: conceptual/speculation
+    title: Speculation (Medusa, ngram)
+  - local: conceptual/guidance
+    title: Guidance, JSON, tools (using outlines)
   title: Conceptual Guides
@@ -0,0 +1 @@
+## Guidance
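The new guidance page is only a heading so far. As a hedged preview of the Guidance/JSON feature it will document, the sketch below asks a TGI server to constrain its output to a JSON schema. The `grammar` parameter shape is an assumption based on the published guidance guide rather than anything in this commit, and the URL and schema are placeholders.

```python
# Hedged sketch of guided JSON generation. The "grammar" parameter shape is
# an assumption; check the published guidance guide for the exact format.
import requests

TGI_URL = "http://localhost:8080"  # placeholder for your deployment

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = {
    "inputs": "Extract the person: 'Ada Lovelace, 36 years old.'",
    "parameters": {
        "max_new_tokens": 64,
        # Constrain decoding so the output is valid against the schema.
        "grammar": {"type": "json", "value": schema},
    },
}

resp = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```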
@@ -0,0 +1 @@
+## Speculation
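The new speculation page is likewise just a heading. The toy sketch below illustrates the idea behind n-gram speculation (one of the schemes named in the table-of-contents entry, alongside Medusa): cheap draft tokens are proposed by matching the most recent n-gram earlier in the sequence, then verified against the model in a single step. The "model" here is a toy deterministic function and every name is illustrative; this is not TGI's implementation.

```python
# Toy illustration of n-gram speculation (prompt-lookup style drafting).
# `toy_next_token` stands in for a real language model's greedy next-token
# choice; in TGI the drafts would be verified by the actual model.

def ngram_draft(tokens, n=2, k=3):
    """Propose up to k draft tokens by matching the last n-gram earlier in the sequence."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + k]
    return []

def toy_next_token(tokens):
    """Deterministic stand-in for a greedy LM forward pass."""
    return (tokens[-1] * 7 + 3) % 20

def speculative_step(tokens):
    """Accept the longest draft prefix the model agrees with, plus one model token."""
    context = list(tokens)
    accepted = []
    for draft in ngram_draft(context):
        if draft != toy_next_token(context):
            break  # first disagreement: discard the rest of the draft
        accepted.append(draft)
        context.append(draft)
    accepted.append(toy_next_token(context))  # always emit one "real" token
    return accepted

# A repetitive sequence lets the n-gram draft get accepted, so a single
# verification step yields several tokens instead of one.
sequence = [5, 18, 9, 6, 5, 18]
print(speculative_step(sequence))  # -> [9, 6, 5, 18]
```

When drafts are frequently accepted, several tokens are produced per model pass, which is where the rough "~2x latency" figure in the README feature list comes from.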