# Text Generation

The Hugging Face Text Generation Python library provides a convenient way of interfacing with a
`text-generation-inference` instance running on
[Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints) or on the Hugging Face Hub.

## Get Started

### Install

```shell
pip install text-generation
```

### Inference API Usage

```python
from text_generation import InferenceAPIClient

client = InferenceAPIClient("bigscience/bloomz")
text = client.generate("Why is the sky blue?").generated_text
print(text)
# ' Rayleigh scattering'

# Token Streaming
text = ""
for response in client.generate_stream("Why is the sky blue?"):
    if not response.token.special:
        text += response.token.text

print(text)
# ' Rayleigh scattering'
```

or with the asynchronous client:

```python
import asyncio

from text_generation import InferenceAPIAsyncClient


async def main():
    client = InferenceAPIAsyncClient("bigscience/bloomz")
    response = await client.generate("Why is the sky blue?")
    print(response.generated_text)
    # ' Rayleigh scattering'

    # Token Streaming
    text = ""
    async for response in client.generate_stream("Why is the sky blue?"):
        if not response.token.special:
            text += response.token.text

    print(text)
    # ' Rayleigh scattering'


# `await` is only valid inside a coroutine, so run the example in an event loop
asyncio.run(main())
```

List all models currently deployed on the Hugging Face Inference API with `Text Generation` support:

```python
from text_generation.inference_api import deployed_models

print(deployed_models())
```
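The returned list can be filtered like any other Python list. The sketch below is illustrative only: it assumes `deployed_models()` returns objects with the `model_id` and `sha` fields shown under Types, uses a local stand-in dataclass and made-up sample data instead of calling the API, and `find_deployed` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DeployedModel:
    # Local stand-in mirroring the fields listed under Types (assumption)
    model_id: str
    sha: str


def find_deployed(models: Iterable[DeployedModel], prefix: str) -> List[str]:
    """Return the sorted ids of models whose id starts with `prefix`."""
    return sorted(m.model_id for m in models if m.model_id.startswith(prefix))


# Hypothetical sample data; a real list would come from `deployed_models()`
sample = [
    DeployedModel("bigscience/bloomz", "abc123"),
    DeployedModel("example-org/example-model", "def456"),
    DeployedModel("bigscience/bloom", "0a1b2c"),
]

print(find_deployed(sample, "bigscience/"))
# ['bigscience/bloom', 'bigscience/bloomz']
```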

### Hugging Face Inference Endpoint Usage

```python
from text_generation import Client

endpoint_url = "https://YOUR_ENDPOINT.endpoints.huggingface.cloud"

client = Client(endpoint_url)
text = client.generate("Why is the sky blue?").generated_text
print(text)
# ' Rayleigh scattering'

# Token Streaming
text = ""
for response in client.generate_stream("Why is the sky blue?"):
    if not response.token.special:
        text += response.token.text

print(text)
# ' Rayleigh scattering'
```

or with the asynchronous client:

```python
import asyncio

from text_generation import AsyncClient

endpoint_url = "https://YOUR_ENDPOINT.endpoints.huggingface.cloud"


async def main():
    client = AsyncClient(endpoint_url)
    response = await client.generate("Why is the sky blue?")
    print(response.generated_text)
    # ' Rayleigh scattering'

    # Token Streaming
    text = ""
    async for response in client.generate_stream("Why is the sky blue?"):
        if not response.token.special:
            text += response.token.text

    print(text)
    # ' Rayleigh scattering'


# `await` is only valid inside a coroutine, so run the example in an event loop
asyncio.run(main())
```

### Types

```python
from enum import Enum
from typing import List, Optional


# Prompt tokens
class PrefillToken:
    # Token ID from the model tokenizer
    id: int
    # Token text
    text: str
    # Logprob
    # Optional since the logprob of the first token cannot be computed
    logprob: Optional[float]


# Generated tokens
class Token:
    # Token ID from the model tokenizer
    id: int
    # Token text
    text: str
    # Logprob
    logprob: float
    # Is the token a special token
    # Can be used to ignore tokens when concatenating
    special: bool


# Generation finish reason
class FinishReason(Enum):
    # number of generated tokens == `max_new_tokens`
    Length = "length"
    # the model generated its end of sequence token
    EndOfSequenceToken = "eos_token"
    # the model generated a text included in `stop_sequences`
    StopSequence = "stop_sequence"


# Additional sequences when using the `best_of` parameter
class BestOfSequence:
    # Generated text
    generated_text: str
    # Generation finish reason
    finish_reason: FinishReason
    # Number of generated tokens
    generated_tokens: int
    # Sampling seed if sampling was activated
    seed: Optional[int]
    # Prompt tokens
    prefill: List[PrefillToken]
    # Generated tokens
    tokens: List[Token]


# `generate` details
class Details:
    # Generation finish reason
    finish_reason: FinishReason
    # Number of generated tokens
    generated_tokens: int
    # Sampling seed if sampling was activated
    seed: Optional[int]
    # Prompt tokens
    prefill: List[PrefillToken]
    # Generated tokens
    tokens: List[Token]
    # Additional sequences when using the `best_of` parameter
    best_of_sequences: Optional[List[BestOfSequence]]


# `generate` return value
class Response:
    # Generated text
    generated_text: str
    # Generation details
    details: Details


# `generate_stream` details
class StreamDetails:
    # Generation finish reason
    finish_reason: FinishReason
    # Number of generated tokens
    generated_tokens: int
    # Sampling seed if sampling was activated
    seed: Optional[int]


# `generate_stream` return value
class StreamResponse:
    # Generated token
    token: Token
    # Complete generated text
    # Only available when the generation is finished
    generated_text: Optional[str]
    # Generation details
    # Only available when the generation is finished
    details: Optional[StreamDetails]


# Inference API currently deployed model
class DeployedModel:
    model_id: str
    sha: str
```
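To show how the streaming types fit together, here is a minimal sketch that needs no server. It redefines `Token`, `StreamDetails`, `StreamResponse` and `FinishReason` as local stand-in dataclasses with the fields listed above (the library's own classes may be implemented differently), and `collect_stream` is a hypothetical helper, not a library function. It joins the non-special token texts and keeps the details carried by the final message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple


class FinishReason(Enum):
    Length = "length"
    EndOfSequenceToken = "eos_token"
    StopSequence = "stop_sequence"


@dataclass
class Token:
    id: int
    text: str
    logprob: float
    special: bool


@dataclass
class StreamDetails:
    finish_reason: FinishReason
    generated_tokens: int
    seed: Optional[int] = None


@dataclass
class StreamResponse:
    token: Token
    generated_text: Optional[str] = None
    details: Optional[StreamDetails] = None


def collect_stream(
    responses: Iterable[StreamResponse],
) -> Tuple[str, Optional[StreamDetails]]:
    """Join non-special token texts and return the last details seen."""
    text = ""
    details = None
    for response in responses:
        if not response.token.special:
            text += response.token.text
        # `details` is only set once the generation is finished
        if response.details is not None:
            details = response.details
    return text, details


# Hand-written stream standing in for `client.generate_stream(...)`
fake_stream = [
    StreamResponse(Token(1, " Rayleigh", -0.1, False)),
    StreamResponse(Token(2, " scattering", -0.2, False)),
    StreamResponse(
        Token(3, "</s>", -0.01, True),
        generated_text=" Rayleigh scattering",
        details=StreamDetails(FinishReason.EndOfSequenceToken, 3),
    ),
]

text, details = collect_stream(fake_stream)
print(repr(text))
# ' Rayleigh scattering'
print(details.finish_reason.value)
# eos_token
```

A `finish_reason` of `FinishReason.Length` would instead mean that generation stopped because `max_new_tokens` was reached.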