Configuring the proxy for DALL-E

The proxy supports DALL-E 2 and DALL-E 3 image generation via the /proxy/openai-images endpoint. It is disabled by default because image generation is somewhat expensive and potentially more open to abuse than text generation.

Updating your Dockerfile

If you are using an older version of the Dockerfile supplied with the proxy, it doesn't grant the permissions the proxy needs to save temporary files.

You can replace the entire thing with the new Dockerfile at ./docker/huggingface/Dockerfile (or the equivalent for Render deployments).

You can also modify your existing Dockerfile; just add the following lines after the WORKDIR line:

# Existing
RUN git clone https://gitgud.io/khanon/oai-reverse-proxy.git /app
WORKDIR /app

# Take ownership of the app directory and switch to the non-root user
RUN chown -R 1000:1000 /app
USER 1000

# Existing
RUN npm install

Enabling DALL-E

Add dall-e to the ALLOWED_MODEL_FAMILIES environment variable to enable DALL-E. For example:

# GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, and DALL-E
ALLOWED_MODEL_FAMILIES=turbo,gpt4,gpt4-turbo,dall-e

# All models as of this writing
ALLOWED_MODEL_FAMILIES=turbo,gpt4,gpt4-32k,gpt4-turbo,claude,bison,aws-claude,dall-e

Refer to .env.example for a full list of supported model families. Adding dall-e to that full list enables every model.
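
To illustrate how a comma-separated value like this can be interpreted, here is a minimal TypeScript sketch that splits the list and checks whether a family is enabled. The helper names parseModelFamilies and isFamilyEnabled are hypothetical and are not taken from the proxy's source; the proxy's own parsing may differ.

```typescript
// Hypothetical helpers: these names are illustrative, not the proxy's real API.

// Split a comma-separated ALLOWED_MODEL_FAMILIES value into a list of names,
// ignoring surrounding whitespace and empty entries.
function parseModelFamilies(value: string): string[] {
  return value
    .split(",")
    .map((family) => family.trim())
    .filter((family) => family.length > 0);
}

// Return true if the given family appears in the configured list.
function isFamilyEnabled(value: string, family: string): boolean {
  return parseModelFamilies(value).indexOf(family) !== -1;
}

const allowed = "turbo,gpt4,gpt4-turbo,dall-e";
console.log(isFamilyEnabled(allowed, "dall-e")); // true
console.log(isFamilyEnabled("turbo,gpt4", "dall-e")); // false
```

The match is exact, so a misspelled family name such as gpt-4 would not enable the gpt4 family in this sketch.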

Setting quotas

DALL-E doesn't bill by token like text generation models. Instead there is a fixed cost per image generated, depending on the model, image size, and selected quality.

The proxy still uses tokens to set quotas for users. The cost of each generated image is converted to "tokens" at a rate of 100,000 tokens per US$1.00. This works out to a cost per token similar to GPT-4 Turbo's, so you can use similar token quotas for both.

Use TOKEN_QUOTA_DALL_E to set the default quota for image generation. Otherwise it works the same as token quotas for other models.

# ~50 standard DALL-E images per refresh period, or US$2.00
TOKEN_QUOTA_DALL_E=200000

Refer to https://openai.com/pricing for the latest pricing information. As of this writing, the cheapest DALL-E 3 image costs $0.04 per generation, which works out to 4000 tokens. Higher resolution and quality settings can cost up to $0.12 per image, or 12000 tokens.
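
The arithmetic above can be sketched in TypeScript as follows. The conversion rate and the two prices come from this section; the helper names imageCostToTokens and imagesPerQuota are hypothetical and only illustrate the calculation, not how the proxy implements it.

```typescript
// Conversion rate stated above: 100,000 quota "tokens" per US$1.00.
const TOKENS_PER_USD = 100000;

// Hypothetical helper: convert an image's USD price into quota tokens.
function imageCostToTokens(costUsd: number): number {
  // Round to remove any floating-point noise in the product.
  return Math.round(costUsd * TOKENS_PER_USD);
}

// Hypothetical helper: how many images of a given price fit in a token quota.
function imagesPerQuota(quotaTokens: number, costUsd: number): number {
  return Math.floor(quotaTokens / imageCostToTokens(costUsd));
}

console.log(imageCostToTokens(0.04)); // 4000
console.log(imageCostToTokens(0.12)); // 12000
console.log(imagesPerQuota(200000, 0.04)); // 50
console.log(imagesPerQuota(200000, 0.12)); // 16
```

A quota of 200000 tokens therefore covers about 50 of the cheapest DALL-E 3 images, but only 16 at the highest price quoted above.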

Rate limiting

The old MODEL_RATE_LIMIT setting has been split into TEXT_MODEL_RATE_LIMIT and IMAGE_MODEL_RATE_LIMIT. Whatever value you previously set for MODEL_RATE_LIMIT will be used for text models.

If you don't specify an IMAGE_MODEL_RATE_LIMIT, it defaults to half of the TEXT_MODEL_RATE_LIMIT, with a minimum of 1 image per minute.

# 4 text generations per minute, 2 images per minute
TEXT_MODEL_RATE_LIMIT=4
IMAGE_MODEL_RATE_LIMIT=2
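
The default described above can be sketched in TypeScript as shown here. The function name resolveImageRateLimit is hypothetical, and rounding down when halving an odd text limit is an assumption; the text only says "half", so the proxy may round differently.

```typescript
// Hypothetical helper: resolve the effective image rate limit (per minute).
// An explicitly configured image limit always wins.
function resolveImageRateLimit(textLimit: number, imageLimit?: number): number {
  if (imageLimit !== undefined) {
    return imageLimit;
  }
  // Assumption: halves are rounded down; the result is never below 1.
  return Math.max(1, Math.floor(textLimit / 2));
}

console.log(resolveImageRateLimit(4, 2)); // 2 (explicit value)
console.log(resolveImageRateLimit(4)); // 2 (half of 4)
console.log(resolveImageRateLimit(1)); // 1 (minimum applies)
```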

If a prompt is filtered by OpenAI's content filter, it won't count towards the rate limit.

Hiding recent images

By default, the proxy shows the 12 images most recently generated by users. You can hide this section by setting SHOW_RECENT_IMAGES to false.