# Riffusion
Riffusion is a technique for real-time music and audio generation with stable diffusion.
Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.
* Inference server: https://github.com/riffusion/riffusion
* Web app: https://github.com/riffusion/riffusion-app
* Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1
This repository contains the Python backend that does the model inference and audio processing, including:
* a diffusers pipeline that performs prompt interpolation combined with image conditioning
* a module for (approximately) converting between spectrograms and waveforms (see the sketch below)
* a Flask server that provides model inference via API to the Next.js app
* a model template, `baseten.py`, for deploying as a Truss
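
As a rough illustration of the spectrogram round trip, here is a minimal sketch using torchaudio's Griffin-Lim transform. This is not the repository's actual conversion module, and the STFT parameters are placeholders.

```
import torch
import torchaudio.transforms as T

# Hypothetical STFT parameters; the repository's actual settings may differ.
n_fft = 2048

to_spec = T.Spectrogram(n_fft=n_fft, power=2.0)
griffin_lim = T.GriffinLim(n_fft=n_fft, power=2.0, n_iter=32)

waveform = torch.randn(1, 44100)  # stand-in for one second of 44.1 kHz audio
spec = to_spec(waveform)          # waveform -> magnitude spectrogram
restored = griffin_lim(spec)      # approximate inverse: phase is estimated, not recovered
```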
## Install
Tested with Python 3.9 and diffusers 0.9.0.

To run this model you need a GPU with CUDA. To run it in real time, the GPU needs to be able to run Stable Diffusion with approximately 50 steps in under five seconds.

Make sure you have torch and torchaudio installed with CUDA support. See the [install guide](https://pytorch.org/get-started/locally/) or [stable wheels](https://download.pytorch.org/whl/torch_stable.html).
```
conda create --name riffusion-inference python=3.9
conda activate riffusion-inference
python -m pip install -r requirements.txt
```
If torchaudio has no audio backend, see [this issue](https://github.com/riffusion/riffusion/issues/12).

You can open and save WAV files with pure Python. For opening and saving non-WAV files like MP3, you'll need ffmpeg or libav.
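
As a quick sanity check of the environment after installing, you can run something like this illustrative snippet (not part of the repository):

```
import torch
import torchaudio

# CUDA must be available for fast generation
print("CUDA available:", torch.cuda.is_available())

# At least one audio backend is needed to read and write audio files
print("Audio backends:", torchaudio.list_audio_backends())
```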
Guides:
* [CUDA help](https://github.com/riffusion/riffusion/issues/3)
* [Windows Simple Instructions](https://www.reddit.com/r/riffusion/comments/zrubc9/installation_guide_for_riffusion_app_inference/)
## Run the model server
Start the Flask server:
```
python -m riffusion.server --host 127.0.0.1 --port 3013
```
You can specify `--checkpoint` with your own directory or a Hugging Face model ID in diffusers format.
The model endpoint is now available at `http://127.0.0.1:3013/run_inference` via POST request.
Example input (see [InferenceInput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L28) for the API):
```
{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",
  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },
  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
```
Example output (see [InferenceOutput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L54) for the API):
```
{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
```
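
As a sketch, you could call the endpoint from Python with `requests` along these lines. The `data:` URI stripping is an assumption about the response encoding; adjust as needed.

```
import base64
import requests

payload = {
    "alpha": 0.75,
    "num_inference_steps": 50,
    "seed_image_id": "og_beat",
    "start": {"prompt": "church bells on sunday", "seed": 42, "denoising": 0.75, "guidance": 7.0},
    "end": {"prompt": "jazz with piano", "seed": 123, "denoising": 0.75, "guidance": 7.0},
}

response = requests.post("http://127.0.0.1:3013/run_inference", json=payload)
response.raise_for_status()
result = response.json()

# Fields are base64 strings; strip a possible "data:...;base64," prefix.
audio_b64 = result["audio"].split(",")[-1]
with open("output.mp3", "wb") as f:
    f.write(base64.b64decode(audio_b64))
```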
Use the `--device` argument to specify the torch device to use:

* `cuda` is recommended.
* `cpu` works but is quite slow.
* `mps` is supported for inference, but some operations fall back to CPU. You may need to set `PYTORCH_ENABLE_MPS_FALLBACK=1`. In addition, it is not deterministic.
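
For example, to run the server on Apple silicon with the MPS backend:

```
PYTORCH_ENABLE_MPS_FALLBACK=1 python -m riffusion.server --host 127.0.0.1 --port 3013 --device mps
```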
## Test
Tests live in the `test/` directory and are implemented with `unittest`.
To run all tests:
```
python -m unittest test/*_test.py
```
To run a single test:
```
python -m unittest test.audio_to_image_test
```
To preserve temporary outputs for debugging, set `RIFFUSION_TEST_DEBUG`:
```
RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test
```
To run a single test case:
```
python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo
```
To run tests using a specific torch device, set `RIFFUSION_TEST_DEVICE`. Tests should pass with
`cpu`, `cuda`, and `mps` backends.
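
For example, to run the full suite on CUDA:

```
RIFFUSION_TEST_DEVICE=cuda python -m unittest test/*_test.py
```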
## Development
Install additional packages for dev with `pip install -r dev_requirements.txt`.
* Linter: `ruff`
* Formatter: `black`
* Type checker: `mypy`
These are configured in `pyproject.toml`.
The results of `mypy .`, `black .`, and `ruff .` *must* be clean to accept a PR.
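
A typical pre-PR check, combining the commands above:

```
python -m pip install -r dev_requirements.txt
mypy .
black .
ruff .
```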
## Citation
If you build on this work, please cite it as follows:
```
@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}
```