# :guitar: Riffusion

<a href="https://github.com/riffusion/riffusion/actions/workflows/ci.yml?query=branch%3Amain"><img alt="CI status" src="https://github.com/riffusion/riffusion/actions/workflows/ci.yml/badge.svg" /></a>
<img alt="Python 3.9 | 3.10" src="https://img.shields.io/badge/Python-3.9%20%7C%203.10-blue" />
<a href="https://github.com/riffusion/riffusion/tree/main/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/License-MIT-yellowgreen" /></a>

Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This is the core repository for riffusion image and audio processing code.

 * Diffusion pipeline that performs prompt interpolation combined with image conditioning
 * Conversions between spectrogram images and audio clips
 * Command-line interface for common tasks
 * Interactive app using streamlit
 * Flask server to provide model inference via API
 * Various third-party integrations

Related repositories:
* Web app: https://github.com/riffusion/riffusion-app
* Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1

## Citation

If you build on this work, please cite it as follows:

```
@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}
```

## Install

Tested in CI with Python 3.9 and 3.10.

It's highly recommended to set up a virtual Python environment with `conda` or `virtualenv`:
```
conda create --name riffusion python=3.9
conda activate riffusion
```

Install Python dependencies:
```
python -m pip install -r requirements.txt
```

To use audio formats other than WAV, install [ffmpeg](https://ffmpeg.org/download.html):
```
sudo apt-get install ffmpeg          # linux
brew install ffmpeg                  # mac
conda install -c conda-forge ffmpeg  # conda
```

If torchaudio has no backend, you may need to install `libsndfile`. See [this issue](https://github.com/riffusion/riffusion/issues/12).

If you run into problems, try upgrading [diffusers](https://github.com/huggingface/diffusers). Riffusion has been tested with versions 0.9 through 0.11.

Guides:
* [Simple Install Guide for Windows](https://www.reddit.com/r/riffusion/comments/zrubc9/installation_guide_for_riffusion_app_inference/)

## Backends

### CPU
`cpu` is supported but is quite slow.

### CUDA
`cuda` is the recommended and most performant backend.

To use with CUDA, make sure you have torch and torchaudio installed with CUDA support. See the
[install guide](https://pytorch.org/get-started/locally/) or
[stable wheels](https://download.pytorch.org/whl/torch_stable.html).

To generate audio in real time, you need a GPU that can run stable diffusion with approximately 50
steps in under five seconds, such as a 3090 or A10G.

Test availability with:

```python3
import torch
torch.cuda.is_available()
```

### MPS
The `mps` backend on Apple Silicon is supported for inference but some operations fall back to CPU,
particularly for audio processing. You may need to set
`PYTORCH_ENABLE_MPS_FALLBACK=1`.

In addition, this backend is not deterministic.

Test availability with:

```python3
import torch
torch.backends.mps.is_available()
```
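
The availability checks for CUDA and MPS can be combined into one helper that picks a device name to pass to `--device`. The following is a minimal sketch, not part of Riffusion: the names `available_backends` and `pick_device` are hypothetical, and the preference order (`cuda`, then `mps`, then `cpu`) is an assumption based on the performance notes above.

```python3
def available_backends():
    """Report which torch backends are usable on this machine."""
    try:
        import torch
    except ImportError:
        # torch is not installed; only the "cpu" name is reported as usable.
        return {"cuda": False, "mps": False, "cpu": True}

    # torch.backends.mps is missing in older torch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    return {
        "cuda": torch.cuda.is_available(),
        "mps": bool(mps is not None and mps.is_available()),
        "cpu": True,
    }


def pick_device(preferred=None):
    """Return a device name, honoring `preferred` when that backend is usable."""
    backends = available_backends()
    if preferred is not None:
        if not backends.get(preferred, False):
            raise ValueError(f"backend {preferred!r} is not available")
        return preferred
    # Assumed preference order: fastest backend first.
    for name in ("cuda", "mps", "cpu"):
        if backends[name]:
            return name
    return "cpu"


print(pick_device())
```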

## Command-line interface

Riffusion comes with a command-line interface for performing common tasks.

See available commands:
```
python -m riffusion.cli -h
```

Get help for a specific command:
```
python -m riffusion.cli image-to-audio -h
```

Run a command:
```
python -m riffusion.cli image-to-audio --image spectrogram_image.png --audio clip.wav
```

## Riffusion Playground

Riffusion contains a [streamlit](https://streamlit.io/) app for interactive use and exploration.

Run with:
```
python -m streamlit run riffusion/streamlit/playground.py --browser.serverAddress 127.0.0.1 --browser.serverPort 8501
```

Then open http://127.0.0.1:8501/ in your browser.

<img alt="Riffusion Playground" style="width: 600px" src="https://i.imgur.com/OOMKBbT.png" />

## Run the model server

Riffusion can be run as a Flask server that provides inference via API. This server enables the [web app](https://github.com/riffusion/riffusion-app) to run locally.

Run with:

```
python -m riffusion.server --host 127.0.0.1 --port 3013
```

You can pass `--checkpoint` with your own directory or a Hugging Face model ID in diffusers format.

Use the `--device` argument to specify the torch device to use.

Once the server is running, the model endpoint accepts POST requests at `http://127.0.0.1:3013/run_inference`.

Example input (see [InferenceInput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L28) for the API):
```
{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
```

Example output (see [InferenceOutput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L54) for the API):
```
{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
```
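
A client for this endpoint might look like the sketch below, which uses only the Python standard library. The helper names (`build_payload`, `decode_b64_field`, `run_inference`) are hypothetical and not part of Riffusion. The fixed seeds, the 120-second timeout, and the handling of a possible `data:...;base64,` prefix on the returned strings are assumptions.

```python3
import base64
import json
import urllib.request

# Endpoint from the section above; adjust if the server uses another host or port.
ENDPOINT = "http://127.0.0.1:3013/run_inference"


def build_payload(start_prompt, end_prompt, alpha=0.75, steps=50,
                  seed_image_id="og_beat"):
    """Build a request body with the same fields as the example input."""
    def side(prompt, seed):
        return {"prompt": prompt, "seed": seed, "denoising": 0.75, "guidance": 7.0}

    return {
        "alpha": alpha,
        "num_inference_steps": steps,
        "seed_image_id": seed_image_id,
        "start": side(start_prompt, 42),
        "end": side(end_prompt, 123),
    }


def decode_b64_field(value):
    """Decode a base64 string, dropping a "data:...;base64," prefix if present."""
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    return base64.b64decode(value)


def run_inference(payload, url=ENDPOINT, timeout=120):
    """POST the payload as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `run_inference(build_payload("church bells on sunday", "jazz with piano"))` returns the output dictionary. Passing its `"audio"` value through `decode_b64_field` yields bytes that can be written to a file such as `clip.mp3`.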

## Tests
Tests live in the `test/` directory and are implemented with `unittest`.

To run all tests:
```
python -m unittest test/*_test.py
```

To run a single test:
```
python -m unittest test.audio_to_image_test
```

To preserve temporary outputs for debugging, set `RIFFUSION_TEST_DEBUG`:
```
RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test
```

To run a single test case within a test:
```
python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo
```

To run tests using a specific torch device, set `RIFFUSION_TEST_DEVICE`. Tests should pass with
`cpu`, `cuda`, and `mps` backends.
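
To check one test module against several backends in a row, the environment variable can be set per run from a small script. This is a sketch under stated assumptions: the helper names are hypothetical, and the script is assumed to be run from the repository root.

```python3
import os
import subprocess
import sys

SUPPORTED_DEVICES = ("cpu", "cuda", "mps")


def unittest_command(module="test.audio_to_image_test"):
    """Build the same command line shown above, using the current interpreter."""
    return [sys.executable, "-m", "unittest", module]


def env_for_device(device, base_env=None):
    """Copy the environment and set RIFFUSION_TEST_DEVICE for one run."""
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["RIFFUSION_TEST_DEVICE"] = device
    return env


def run_on_devices(devices, module="test.audio_to_image_test"):
    """Run the module once per device and return {device: exit code}."""
    results = {}
    for device in devices:
        completed = subprocess.run(unittest_command(module), env=env_for_device(device))
        results[device] = completed.returncode
    return results
```

For example, `run_on_devices(["cpu", "mps"])` runs the module twice and maps each device to the exit code of its run, where `0` means the tests passed.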

## Development Guide
Install additional packages for dev with `python -m pip install -r dev_requirements.txt`.

* Linter: `ruff`
* Formatter: `black`
* Type checker: `mypy`

These are configured in `pyproject.toml`.

The results of `mypy .`, `black .`, and `ruff .` *must* be clean for a PR to be accepted.

CI is run through GitHub Actions from `.github/workflows/ci.yml`.

Contributions are welcome through pull requests.