Misc improvements for InferenceClient docs (#852)

List of changes:

- No need to specify `model` in `text_generation` if it's already
specified in `InferenceClient` (see the sketch after this list)
- I separated the explanations of `stream=True` and `details=True`
- I found the `details` explanation a bit repetitive (it said twice
what the call returns), so I removed a sentence
- Added a mention of the async client
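
A minimal sketch of the simplified call (assuming a TGI server is already serving at `http://127.0.0.1:8080`, as in the docs example):

```python
from huggingface_hub import InferenceClient

# Set the endpoint once when constructing the client.
client = InferenceClient(model="http://127.0.0.1:8080")

# No per-call `model` argument needed; the client's endpoint is used.
output = client.text_generation(prompt="Write a code for snake game")
print(output)
```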
Omar Sanseviero 2023-08-16 14:29:54 +02:00 committed by GitHub
parent 2774b0ab44
commit d9bceb8e6b
1 changed file with 13 additions and 7 deletions

@@ -17,7 +17,6 @@ curl 127.0.0.1:8080/generate \
 ## Inference Client
 [`huggingface-hub`](https://huggingface.co/docs/huggingface_hub/main/en/index) is a Python library to interact with the Hugging Face Hub, including its endpoints. It provides a nice high-level class, [`~huggingface_hub.InferenceClient`], which makes it easy to make calls to a TGI endpoint. `InferenceClient` also takes care of parameter validation and provides a simple to-use interface.
 You can simply install `huggingface-hub` package with pip.
 ```bash
@@ -29,14 +28,21 @@ Once you start the TGI server, instantiate `InferenceClient()` with the URL to t
 ```python
 from huggingface_hub import InferenceClient
-client = InferenceClient(model=URL_TO_ENDPOINT_SERVING_TGI)
-client.text_generation(prompt="Write a code for snake game", model=URL_TO_ENDPOINT_SERVING_TGI)
+client = InferenceClient(model="http://127.0.0.1:8080")
+client.text_generation(prompt="Write a code for snake game")
 ```
-To stream tokens in `InferenceClient`, simply pass `stream=True`. Another parameter you can use with TGI backend is `details`. You can get more details on generation (tokens, probabilities, etc.) by setting `details` to `True`. By default, `details` is set to `False`, and `text_generation` returns a string. If you pass `details=True` and `stream=True`, `text_generation` will return a `TextGenerationStreamResponse` which consists of the generated token, generated text, and details.
+You can do streaming with `InferenceClient` by passing `stream=True`. Streaming will return tokens as they are being generated in the server. To use streaming, you can do as follows:
 ```python
-output = client.text_generation(prompt="Meaning of life is", model=URL_OF_ENDPOINT, details=True)
+for token in client.text_generation("How do you make cheese?", max_new_tokens=12, stream=True):
+    print(token)
+```
+
+Another parameter you can use with TGI backend is `details`. You can get more details on generation (tokens, probabilities, etc.) by setting `details` to `True`. When it's specified, TGI will return a `TextGenerationResponse` or `TextGenerationStreamResponse` rather than a string or stream.
+
+```python
+output = client.text_generation(prompt="Meaning of life is", details=True)
 print(output)
 # TextGenerationResponse(generated_text=' a complex concept that is not always clear to the individual. It is a concept that is not always', details=Details(finish_reason=<FinishReason.Length: 'length'>, generated_tokens=20, seed=None, prefill=[], tokens=[Token(id=267, text=' a', logprob=-2.0723474, special=False), Token(id=11235, text=' complex', logprob=-3.1272552, special=False), Token(id=17908, text=' concept', logprob=-1.3632495, special=False),..))
@@ -45,13 +51,13 @@ print(output)
 You can see how to stream below.
 ```python
-output = client.text_generation(prompt="Meaning of life is", model="http://localhost:3000/", stream=True, details=True)
+output = client.text_generation(prompt="Meaning of life is", stream=True, details=True)
 print(next(iter(output)))
 # TextGenerationStreamResponse(token=Token(id=267, text=' a', logprob=-2.0723474, special=False), generated_text=None, details=None)
 ```
-You can check out the details of the function [here](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
+You can check out the details of the function [here](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation). There is also an async version of the client, `AsyncInferenceClient`, based on `asyncio` and `aiohttp`. You can find docs for it [here](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.AsyncInferenceClient)
 ## ChatUI
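
The new docs mention `AsyncInferenceClient` without showing code. For illustration, a minimal sketch of the streaming call with the async client, assuming the same local TGI endpoint as above and the API described in the linked `huggingface_hub` reference:

```python
import asyncio

from huggingface_hub import AsyncInferenceClient

# Same local TGI endpoint as in the examples above (assumed to be running).
client = AsyncInferenceClient(model="http://127.0.0.1:8080")

async def main():
    # With stream=True, awaiting the call yields an async iterable of tokens.
    async for token in await client.text_generation(
        "How do you make cheese?", max_new_tokens=12, stream=True
    ):
        print(token)

asyncio.run(main())
```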