# Stable Diffusion web UI
A browser interface for Stable Diffusion, based on the Gradio library.

The original script with the Gradio UI was written by a kind anonymous user. This is a modification.

![](screenshot.png)
## Installing and running

### Stable Diffusion

This script assumes that you already have the main Stable Diffusion stuff installed, in the directory `/sd`.
If you don't have it installed, follow the guide:

- https://rentry.org/kretard

This repository's `webui.py` is a replacement for `kdiff.py` from the guide.

In particular, the following files must exist:

- `/sd/configs/stable-diffusion/v1-inference.yaml`
- `/sd/models/ldm/stable-diffusion-v1/model.ckpt`
- `/sd/ldm/util.py`
- `/sd/k_diffusion/__init__.py`
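
If you want to check the list above before launching, a minimal sketch like the following may help. It is not part of the web UI; the function name `missing_files` is made up for illustration, and it only tests that each path exists as a file.

```python
from pathlib import Path

# Paths copied from the list above, relative to the Stable Diffusion directory.
REQUIRED_FILES = [
    "configs/stable-diffusion/v1-inference.yaml",
    "models/ldm/stable-diffusion-v1/model.ckpt",
    "ldm/util.py",
    "k_diffusion/__init__.py",
]


def missing_files(sd_root="/sd", required=REQUIRED_FILES):
    """Return the required paths that do not exist under sd_root (hypothetical helper)."""
    root = Path(sd_root)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    for name in missing_files():
        print("missing:", name)
```

An empty result means all four files were found.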

### GFPGAN

If you want to use GFPGAN to improve generated faces, you need to install it separately.
Follow the instructions from https://github.com/TencentARC/GFPGAN, but clone it into the main Stable Diffusion directory, `/sd`.
After that, download [GFPGANv1.3.pth](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth) and put it
into the `/sd/GFPGAN/experiments/pretrained_models` directory. If you have trouble with GFPGAN support, follow the instructions
from the GFPGAN repository until the `inference_gfpgan.py` script works.

The following files must exist:

- `/sd/GFPGAN/inference_gfpgan.py`
- `/sd/GFPGAN/experiments/pretrained_models/GFPGANv1.3.pth`

If the GFPGAN directory does not exist, you will not get the option to use GFPGAN in the UI. If it does exist, you will either be able
to use it, or there will be a message in the console with an error related to GFPGAN.

### Web UI

Run the script as:

`python webui.py`

When running the script, you must be in the main Stable Diffusion directory, `/sd`. If you cloned this repository into a subdirectory 
of `/sd`, say, the `stable-diffusion-webui` directory, you will run it as:

`python stable-diffusion-webui/webui.py`

When launching, you may get a very long warning message related to some weights not being used. You may freely ignore it.
After a while, you will get a message like this:

```
Running on local URL:  http://127.0.0.1:7860/
```

Open the URL in a browser, and you are good to go.

## Features
The script creates a web UI for Stable Diffusion's txt2img and img2img scripts. The following features have been added
and are not in the original script.

### Extras tab
Additional neural network image improvement methods unrelated to Stable Diffusion.

#### GFPGAN
Lets you improve faces in pictures using the GFPGAN model. There is a checkbox in every tab to use GFPGAN at 100%, and
also a separate tab that just allows you to use GFPGAN on any picture, with a slider that controls how strong the effect is.

![](images/GFPGAN.png)

#### Real-ESRGAN
Image upscaler. You can choose from multiple models by the original author, and specify by how much the image should be upscaled.
Requires the `realesrgan` library:

```commandline
pip install realesrgan
```

### Sampling method selection
Choose from multiple sampling methods for txt2img:

![](images/sampling.png)

### Prompt matrix
Separate multiple prompts using the `|` character, and the system will produce an image for every combination of them.
For example, with the prompt `a busy city street in a modern city|illustration|cinematic lighting`, four combinations are possible (the first part of the prompt is always kept):

- `a busy city street in a modern city`
- `a busy city street in a modern city, illustration`
- `a busy city street in a modern city, cinematic lighting`
- `a busy city street in a modern city, illustration, cinematic lighting`

Four images will be produced, in this order, all with the same seed and each with its corresponding prompt:
![](images/prompt-matrix.png)

Another example, this time with 5 prompts and 16 variations:
![](images/prompt_matrix.jpg)

If you use this feature, batch count will be ignored, because the number of pictures to produce
depends on your prompts, but batch size will still work (generating multiple pictures at the
same time for a small speed boost).
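
The way combinations are enumerated can be sketched in a few lines of Python. This is an illustration and not the web UI's own code; `prompt_matrix` is a hypothetical name. It keeps the first part, switches each remaining part on or off, and joins the selected parts with commas, which reproduces the order of the four-prompt list above.

```python
def prompt_matrix(prompt):
    """Expand 'base|opt1|opt2|...' into every combination, always keeping the base."""
    parts = [part.strip() for part in prompt.split("|")]
    base, options = parts[0], parts[1:]
    combinations = []
    # Each bit of `mask` decides whether the corresponding optional part is included.
    for mask in range(2 ** len(options)):
        selected = [opt for i, opt in enumerate(options) if mask & (1 << i)]
        combinations.append(", ".join([base] + selected))
    return combinations


for line in prompt_matrix("a busy city street in a modern city|illustration|cinematic lighting"):
    print(line)
```

With n optional parts there are 2^n prompts, which is why five parts give 16 variations.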

### Flagging
Click the Flag button under the output section, and the generated images will be saved to the `log/images` directory, and the generation parameters
will be appended to the CSV file `log/log.csv` in the `/sd` directory.

> but every image is saved, why would I need this?

If you're like me, you experiment a lot with prompts and settings, and only a few images are worth saving. You can
just save them using right click in the browser, but then you won't be able to reproduce them later because you will not
know what exact prompt created the image. If you use the Flag button, the generation parameters will be written to the CSV file,
and you can easily find parameters for an image by searching for its filename.
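
Searching the log can also be done from Python. The sketch below makes no assumption about the column layout of `log/log.csv`: it returns every row in which any cell contains the given filename. The name `find_flagged` is hypothetical.

```python
import csv


def find_flagged(log_path, image_name):
    """Return all CSV rows that mention image_name in any cell."""
    with open(log_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f)
                if any(image_name in cell for cell in row)]
```

For example, `find_flagged("log/log.csv", "some-image.png")` run from `/sd` would list the matching rows, if there are any.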

### Copy-paste generation parameters
A text output provides the generation parameters in a form that is easy to copy and paste for sharing.

![](images/kopipe.png)

If you generate multiple pictures, the displayed seed will be the seed of the first one.

### Correct seeds for batches
If you use a seed of 1000 to generate two batches of two images each, the four generated images will have seeds: `1000, 1001, 1002, 1003`.
Previous versions of the UI would produce `1000, x, 1001, x`, where x is an image that can't be generated by any seed.
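
The numbering scheme amounts to counting upwards from the starting seed across all batches. A minimal sketch of that arithmetic, with the hypothetical helper name `batch_seeds` (this is not the web UI's code):

```python
def batch_seeds(seed, batch_count, batch_size):
    """Return one list of seeds per batch, numbered consecutively from `seed`."""
    return [
        [seed + batch * batch_size + i for i in range(batch_size)]
        for batch in range(batch_count)
    ]


print(batch_seeds(1000, 2, 2))  # [[1000, 1001], [1002, 1003]]
```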

### Resizing
There are three options for resizing input images in img2img mode:

- Just resize - simply resizes the source image to the target resolution, which distorts the picture if the aspect ratios differ
- Crop and resize - resizes the source image, preserving aspect ratio, so that it covers the entire target resolution, and crops the parts that stick out
- Resize and fill - resizes the source image, preserving aspect ratio, so that it fits entirely within the target resolution, and fills the empty space with rows/columns from the source image

Example:
![](images/resizing.jpg)
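
The difference between the three modes comes down to which scale factor is used before cropping or filling. The sketch below only computes the size the source image is scaled to; the mode strings and the function name `scaled_size` are made up for illustration and are not the web UI's identifiers.

```python
def scaled_size(src_w, src_h, dst_w, dst_h, mode):
    """Size of the source image after scaling, before any cropping or filling."""
    if mode == "just_resize":
        # Stretch to the target; the aspect ratio is not preserved.
        return dst_w, dst_h
    ratio_w = dst_w / src_w
    ratio_h = dst_h / src_h
    if mode == "crop_and_resize":
        scale = max(ratio_w, ratio_h)  # cover the target, overflow gets cropped
    elif mode == "resize_and_fill":
        scale = min(ratio_w, ratio_h)  # fit inside the target, gaps get filled
    else:
        raise ValueError(f"unknown mode: {mode}")
    return round(src_w * scale), round(src_h * scale)


for mode in ("just_resize", "crop_and_resize", "resize_and_fill"):
    print(mode, scaled_size(800, 600, 512, 512, mode))
```

For an 800x600 source and a 512x512 target, the cropped variant is scaled to 683x512 and loses its sides, while the filled variant is scaled to 512x384 and leaves 128 rows to fill.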

### Loading
Gradio's loading graphic has a very negative effect on the processing speed of the neural network.
My RTX 3090 makes images about 10% faster when the tab with Gradio is not active. By default, the UI
now hides the loading progress animation and replaces it with static "Loading..." text, which achieves
the same effect. Use the `--no-progressbar-hiding` commandline option to revert this and show loading animations.

### Prompt validation
Stable Diffusion has a limit for input text length. If your prompt is too long, you will get a
warning in the text output field, showing which parts of your text were truncated and ignored by the model.
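
As a rough illustration of the idea, the sketch below splits a prompt into the part that fits and the part that would be ignored. It is a simplification under loud assumptions: it counts whitespace-separated words, whereas the real model counts tokens produced by its own tokenizer, and both `max_tokens` and the name `split_at_limit` are hypothetical.

```python
def split_at_limit(prompt, max_tokens):
    """Split a prompt into (kept, ignored) at max_tokens whitespace-separated words."""
    words = prompt.split()
    kept = " ".join(words[:max_tokens])
    ignored = " ".join(words[max_tokens:])
    return kept, ignored


kept, ignored = split_at_limit("a very long prompt that goes on and on", 4)
if ignored:
    print("Warning: these words were truncated:", ignored)
```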

### Loopback
A checkbox for img2img that automatically feeds the output image as input for the next batch. Equivalent to
saving the output image and replacing the input image with it. The batch count setting controls how many iterations of
this you get.

Usually, when doing this, you would choose one of many images for the next iteration yourself, so the usefulness
of this feature may be questionable, but I've managed to get some very nice outputs with it that I wasn't able
to get otherwise.

Example: (cherrypicked result; original picture by anon)

![](images/loopback.jpg)
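
The control flow of loopback can be sketched as a simple loop. Here `img2img_step` stands in for whatever function turns one image into the next; it and `loopback` are hypothetical names, not the web UI's API.

```python
def loopback(image, img2img_step, iterations):
    """Feed each output back in as the next input; return every intermediate output."""
    outputs = []
    current = image
    for _ in range(iterations):
        current = img2img_step(current)
        outputs.append(current)
    return outputs
```

In this sketch, `iterations` plays the role of the batch count setting.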

### Png info
Adds information about generation parameters to PNG as a text chunk. You
can view this information later using any software that supports viewing
PNG chunk info, for example: https://www.nayuki.io/page/png-file-chunk-inspector

![](images/pnginfo.png)
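
If you would rather read the text chunks from Python, the sketch below walks the chunk structure of a PNG file using only the standard library. It handles plain `tEXt` chunks only (not compressed or international text chunks), does not verify CRCs, and makes no assumption about which keyword the web UI uses; the function name is hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data):
    """Return {keyword: text} for every tEXt chunk in the PNG bytes `data`."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # A tEXt body is a keyword and the text, separated by one null byte.
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 8 + length + 4
    return texts
```

To use it on a file, pass the file's bytes, for example `read_text_chunks(open("image.png", "rb").read())`.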

### Textual Inversion
Allows you to use pretrained textual inversion embeddings.
See original site for details: https://textual-inversion.github.io/.
I used lstein's repo for training the embedding: https://github.com/lstein/stable-diffusion; if
you want to train your own, I recommend following the guide on his site.

No additional libraries/repositories are required to use pretrained embeddings.

To make use of pretrained embeddings, create an `embeddings` directory in the root directory of Stable
Diffusion and put your embeddings into it. They must be .pt files about 5 KB in size, each with only
one trained embedding, and the filename (without .pt) will be the term you'd use in the prompt
to get that embedding.

As an example, I trained one for about 5000 steps: https://files.catbox.moe/e2ui6r.pt; it does
not produce very good results, but it does work. Download it, rename it to `Usada Pekora.pt`,
put it into the `embeddings` directory, and use `Usada Pekora` in the prompt.

![](images/inversion.png)
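
The filename rule can be illustrated with a short sketch that lists which prompt terms a directory of embeddings would provide. It only looks at filenames and does not load the `.pt` files; `embedding_terms` is a hypothetical helper, not part of the web UI.

```python
from pathlib import Path


def embedding_terms(embeddings_dir="embeddings"):
    """Map each prompt term (filename without .pt) to its embedding file."""
    return {path.stem: path for path in sorted(Path(embeddings_dir).glob("*.pt"))}
```

A file named `Usada Pekora.pt` would appear under the key `Usada Pekora`.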

### Settings
A tab with settings that lets you use the UI to edit more than half of the parameters that were previously
commandline options. Settings are saved to the config.js file. The settings that remain commandline
options are the ones that are required at startup.

### Attention
Using `()` in the prompt decreases the model's attention to the enclosed words, and `[]` increases it. You can combine
multiple modifiers:

![](images/attention-3.jpg)

### SD upscale
Upscales the image using Real-ESRGAN, then goes through tiles of the result, improving them with img2img.

Original idea by: https://github.com/jquesnelle/txt2imghd. This is an independent implementation.

To use this feature, tick a checkbox in the img2img interface. The original
image will be upscaled to twice its original width and height, while the width and height sliders
will specify the size of the individual tiles. At the moment this method does not support batch size.

![](images/sd-upscale.jpg)
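
To get a feel for how many tiles a given image needs, here is a rough sketch of one possible tiling. It is an assumption-laden illustration, not the web UI's algorithm: it uses a plain grid and shifts the last row and column inwards so every tile stays inside the upscaled image, while the real implementation may lay out or overlap tiles differently. The name `tile_boxes` is hypothetical.

```python
import math


def tile_boxes(src_w, src_h, tile_w, tile_h, scale=2):
    """Return (left, top, right, bottom) boxes covering the upscaled image."""
    up_w, up_h = src_w * scale, src_h * scale
    cols = math.ceil(up_w / tile_w)
    rows = math.ceil(up_h / tile_h)
    boxes = []
    for row in range(rows):
        for col in range(cols):
            # Shift edge tiles back so they do not run past the image border.
            left = min(col * tile_w, max(up_w - tile_w, 0))
            top = min(row * tile_h, max(up_h - tile_h, 0))
            boxes.append((left, top, min(left + tile_w, up_w), min(top + tile_h, up_h)))
    return boxes


print(len(tile_boxes(512, 512, 512, 512)))  # 4 tiles for a 1024x1024 result
```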

### User scripts
If the program is launched with the `--allow-code` option, an extra text input field for script code
is available in the txt2img interface. It allows you to input Python code that will do the work on the
image. If this field is not empty, the processing that would happen normally is skipped.

In code, access the parameters from the web UI using the `p` variable, and provide outputs for the web UI
using the `display(images, seed, info)` function. All globals from the script are also accessible.

As an example, here is a script that draws the chart shown below (and also saves it as `test/gnomeplot/gnome5.png`):

```python
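# `p`, `process_images`, `draw_xy_grid`, `save_image` and `display` are globals of the web UI script.
steps = [4, 8, 12, 16, 20]
cfg_scales = [5.0, 10.0, 15.0]

# Render one grid cell: a single image with the given step count and CFG scale.
def cell(x, y, p=p):
    p.steps = x
    p.cfg_scale = y
    return process_images(p).images[0]

# Build one labelled grid with steps on the x axis and CFG scale on the y axis.
images = [draw_xy_grid(
    xs=steps,
    ys=cfg_scales,
    x_label=lambda x: f'Steps = {x}',
    y_label=lambda y: f'CFG = {y}',
    cell=cell,
)]

# Save the grid as test/gnomeplot/gnome5.png, then show it in the UI.
save_image(images[0], 'test/gnomeplot', 'gnome5')
display(images)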
```

![](images/scripting.jpg)

A simpler script that just processes the image and outputs it normally:

```python
processed = process_images(p)

print("Seed was: " + str(processed.seed))

display(processed.images, processed.seed, processed.info)
```

### `--lowvram`
Optimizations for GPUs with low VRAM. This should make it possible to generate 512x512 images on video cards with 4GB of memory.

The original idea for these optimizations is by basujindal: https://github.com/basujindal/stable-diffusion. The model is separated into modules,
and only one module is kept in GPU memory; when another module needs to run, the previous is removed from GPU memory.
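
The swapping idea can be sketched independently of any particular framework. The class below is a toy illustration, not the web UI's implementation: it assumes each module is callable and has a `.to(device)` method (as PyTorch modules do), and the class name and device strings are placeholders.

```python
class OneModuleOnGpu:
    """Keep at most one module on the GPU, moving the previous one back to the CPU."""

    def __init__(self, gpu="cuda", cpu="cpu"):
        self.gpu = gpu
        self.cpu = cpu
        self.resident = None  # the module currently held in GPU memory

    def run(self, module, *args, **kwargs):
        if self.resident is not module:
            if self.resident is not None:
                self.resident.to(self.cpu)  # evict the previous module
            module.to(self.gpu)
            self.resident = module
        return module(*args, **kwargs)
```

Every switch between modules costs a transfer to and from GPU memory, so a scheme like this trades speed for a smaller memory footprint.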

As you might expect, these optimizations make processing run slower: about 10 times slower
than normal operation on my RTX 3090.

This is an independent implementation that does not require any modification to the original Stable Diffusion code,
with all code concentrated in one place rather than scattered around the program.