# :guitar: Riffusion

<a href="https://github.com/riffusion/riffusion/actions/workflows/ci.yml?query=branch%3Amain"><img alt="CI status" src="https://github.com/riffusion/riffusion/actions/workflows/ci.yml/badge.svg" /></a>
<img alt="Python 3.9 | 3.10" src="https://img.shields.io/badge/Python-3.9%20%7C%203.10-blue" />
<a href="https://github.com/riffusion/riffusion/tree/main/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/License-MIT-yellowgreen" /></a>

Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This is the core repository for riffusion image and audio processing code.

* Diffusion pipeline that performs prompt interpolation combined with image conditioning
* Conversions between spectrogram images and audio clips
* Command-line interface for common tasks
* Interactive app using streamlit
* Flask server to provide model inference via API
* Various third party integrations
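
Conceptually, the audio conversion inverts a magnitude spectrogram (decoded from the image pixels) back to a waveform using a phase-reconstruction method such as Griffin-Lim. A toy numpy sketch of that inversion step, for intuition only — this is not the repository's actual implementation, which uses torchaudio and its own spectrogram parameters:

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Windowed short-time Fourier transform using numpy's real FFT."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT via windowed overlap-add."""
    window = np.hanning(n_fft)
    out = np.zeros(hop * (len(spec) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(np.fft.irfft(spec, n=n_fft, axis=1)):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=32, n_fft=256, hop=64):
    """Estimate a waveform from STFT magnitudes by iteratively refining phase."""
    rng = np.random.default_rng(0)
    angles = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * angles, n_fft, hop)
        angles = np.exp(1j * np.angle(stft(audio, n_fft, hop)))
    return istft(magnitude * angles, n_fft, hop)

# Round trip: waveform -> magnitudes only -> reconstructed waveform.
tone = np.sin(2 * np.pi * 440 * np.arange(4096) / 22050)
magnitude = np.abs(stft(tone))
reconstructed = griffin_lim(magnitude)
```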

Related repositories:

* Web app: https://github.com/riffusion/riffusion-app
* Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1

## Citation

If you build on this work, please cite it as follows:

```
@article{Forsgren_Martiros_2022,
    author = {Forsgren, Seth* and Martiros, Hayk*},
    title = {{Riffusion - Stable diffusion for real-time music generation}},
    url = {https://riffusion.com/about},
    year = {2022}
}
```

## Install

Tested in CI with Python 3.9 and 3.10.

It's highly recommended to set up a virtual Python environment with `conda` or `virtualenv`:

```
conda create --name riffusion python=3.9
conda activate riffusion
```

Install Python dependencies:

```
python -m pip install -r requirements.txt
```

In order to use audio formats other than WAV, [ffmpeg](https://ffmpeg.org/download.html) is required.

```
sudo apt-get install ffmpeg  # linux
brew install ffmpeg  # mac
conda install -c conda-forge ffmpeg  # conda
```

If torchaudio has no backend, you may need to install `libsndfile`. See [this issue](https://github.com/riffusion/riffusion/issues/12).

If you have an issue, try upgrading [diffusers](https://github.com/huggingface/diffusers). Tested with 0.9 - 0.11.

Guides:

* [Simple Install Guide for Windows](https://www.reddit.com/r/riffusion/comments/zrubc9/installation_guide_for_riffusion_app_inference/)


## Backends

### CPU

`cpu` is supported but is quite slow.

### CUDA

`cuda` is the recommended and most performant backend.

To use with CUDA, make sure you have torch and torchaudio installed with CUDA support. See the
[install guide](https://pytorch.org/get-started/locally/) or
[stable wheels](https://download.pytorch.org/whl/torch_stable.html).


To generate audio in real-time, you need a GPU that can run stable diffusion with approximately 50
steps in under five seconds, such as a 3090 or A10G.

Test availability with:

```python3
import torch
torch.cuda.is_available()
```

### MPS

The `mps` backend on Apple Silicon is supported for inference but some operations fall back to CPU,
particularly for audio processing. You may need to set
`PYTORCH_ENABLE_MPS_FALLBACK=1`.
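
For example, set the fallback flag in your shell before launching any riffusion entry point (this only sets an environment variable; it does not change your torch install):

```shell
export PYTORCH_ENABLE_MPS_FALLBACK=1
```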
In addition, this backend is not deterministic.

Test availability with:

```python3
import torch
torch.backends.mps.is_available()
```

## Command-line interface

Riffusion comes with a command line interface for performing common tasks.

See available commands:

```
python -m riffusion.cli -h
```

Get help for a specific command:

```
python -m riffusion.cli image-to-audio -h
```

Execute:

```
python -m riffusion.cli image-to-audio --image spectrogram_image.png --audio clip.wav
```

## Riffusion Playground

Riffusion contains a [streamlit](https://streamlit.io/) app for interactive use and exploration.

Run with:

```
python -m streamlit run riffusion/streamlit/playground.py --browser.serverAddress 127.0.0.1 --browser.serverPort 8501
```

And access at http://127.0.0.1:8501/

<img alt="Riffusion Playground" style="width: 600px" src="https://i.imgur.com/OOMKBbT.png" />

## Run the model server

Riffusion can be run as a flask server that provides inference via API. This server enables the [web app](https://github.com/riffusion/riffusion-app) to run locally.

Run with:

```
python -m riffusion.server --host 127.0.0.1 --port 3013
```

You can specify `--checkpoint` with your own directory or huggingface ID in diffusers format.
Use the `--device` argument to specify the torch device to use.

The model endpoint is now available at `http://127.0.0.1:3013/run_inference` via POST request.
Example input (see [InferenceInput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L28) for the API):

```
{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",
  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },
  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
```

Example output (see [InferenceOutput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L54) for the API):

```
{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
```
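
As a sketch, the request above can be sent from Python with only the standard library. The URL and payload fields come from the examples above; the `run_inference` helper name and the (absent) error handling are this sketch's own:

```python
import base64
import json
import urllib.request

SERVER_URL = "http://127.0.0.1:3013/run_inference"

payload = {
    "alpha": 0.75,
    "num_inference_steps": 50,
    "seed_image_id": "og_beat",
    "start": {"prompt": "church bells on sunday", "seed": 42, "denoising": 0.75, "guidance": 7.0},
    "end": {"prompt": "jazz with piano", "seed": 123, "denoising": 0.75, "guidance": 7.0},
}

def run_inference(payload: dict) -> dict:
    """POST the payload to a locally running riffusion server and parse the JSON reply."""
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

# The response fields are base64 strings; decode them before writing to disk, e.g.:
#   output = run_inference(payload)
#   with open("clip.mp3", "wb") as f:
#       f.write(base64.b64decode(output["audio"]))
```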

## Tests

Tests live in the `test/` directory and are implemented with `unittest`.

To run all tests:

```
python -m unittest test/*_test.py
```

To run a single test:

```
python -m unittest test.audio_to_image_test
```

To preserve temporary outputs for debugging, set `RIFFUSION_TEST_DEBUG`:

```
RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test
```

To run a single test case within a test:

```
python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo
```

To run tests using a specific torch device, set `RIFFUSION_TEST_DEVICE`. Tests should pass with
`cpu`, `cuda`, and `mps` backends.
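
A test might read that variable like this (a hypothetical helper for illustration, not code from this repo):

```python
import os

def riffusion_test_device(default: str = "cpu") -> str:
    """Device name for tests, taken from RIFFUSION_TEST_DEVICE if set."""
    return os.environ.get("RIFFUSION_TEST_DEVICE", default)
```

For example, `RIFFUSION_TEST_DEVICE=cuda python -m unittest test.audio_to_image_test` would run that test on CUDA.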

## Development Guide

Install additional packages for dev with `python -m pip install -r dev_requirements.txt`.

* Linter: `ruff`
* Formatter: `black`
* Type checker: `mypy`

These are configured in `pyproject.toml`.

The results of `mypy .`, `black .`, and `ruff .` *must* be clean to accept a PR.

CI is run through GitHub Actions from `.github/workflows/ci.yml`.

Contributions are welcome through pull requests.