# Riffusion Inference Server
Riffusion is an app for real-time music generation with stable diffusion.
Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.
* Web app: https://github.com/hmartiro/riffusion-app
* Inference server: https://github.com/hmartiro/riffusion-inference
* Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1
This repository contains the Python backend that performs the model inference and audio processing, including:

* a diffusers pipeline that performs prompt interpolation combined with image conditioning
* a module for (approximately) converting between spectrograms and waveforms
* a Flask server that provides model inference via API to the Next.js app
* a model template (`baseten.py`) for deploying as a Truss
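The spectrogram-to-waveform conversion is approximate because a magnitude spectrogram discards phase, which must be estimated when going back to audio. As an illustrative sketch of that idea (not the repository's actual implementation, which has its own parameters and uses mel scaling), here is a minimal Griffin-Lim round trip with SciPy:

```python
import numpy as np
from scipy.signal import stft, istft

def waveform_to_spectrogram(wave: np.ndarray, nperseg: int = 256) -> np.ndarray:
    """Magnitude spectrogram via STFT; phase information is discarded."""
    _, _, Z = stft(wave, nperseg=nperseg)
    return np.abs(Z)

def spectrogram_to_waveform(mag: np.ndarray, nperseg: int = 256, n_iters: int = 32) -> np.ndarray:
    """Approximate inversion via Griffin-Lim: start from random phase and
    iteratively refine it so the STFT of the signal matches the magnitudes."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iters):
        _, wave = istft(mag * phase, nperseg=nperseg)
        _, _, Z = stft(wave, nperseg=nperseg)
        phase = np.exp(1j * np.angle(Z))
    _, wave = istft(mag * phase, nperseg=nperseg)
    return wave
```

The reconstruction is lossy by nature, which is why the conversion above is described as approximate.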
## Install
Tested with Python 3.9 and diffusers 0.9.0.
```
conda create --name riffusion-inference python=3.9
conda activate riffusion-inference
python -m pip install -r requirements.txt
```
## Run
Start the Flask server:
```
python -m riffusion.server --port 3013 --host 127.0.0.1
```
You can specify `--checkpoint` with your own directory or a Hugging Face model ID in diffusers format.

The model endpoint is now available at `http://127.0.0.1:3013/run_inference` via POST request.
Example input (see [InferenceInput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L28) for the API):
```
{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",
  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },
  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
```
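The `alpha` field controls the blend between the `start` and `end` prompts. As an illustrative sketch only (not the pipeline's actual code), spherical linear interpolation (slerp) is a common way to interpolate between two embedding vectors of this kind:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray) -> np.ndarray:
    """Spherical linear interpolation between two vectors, t in [0, 1]."""
    a = v0 / np.linalg.norm(v0)
    b = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    theta = np.arccos(dot)
    if np.isclose(theta, 0.0):
        # Vectors are (nearly) parallel; fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    return (np.sin((1.0 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)
```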
Example output (see [InferenceOutput](https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/datatypes.py#L54) for the API):
```
{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
```
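As a minimal client sketch using only the Python standard library (error handling omitted; the payload fields follow the example input above):

```python
import base64
import json
import urllib.request

def run_inference(payload: dict, url: str = "http://127.0.0.1:3013/run_inference") -> dict:
    """POST an InferenceInput payload and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def save_audio(output: dict, path: str = "clip.mp3") -> None:
    """Decode the base64-encoded MP3 clip from an InferenceOutput and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(output["audio"]))
```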
## Citation
If you build on this work, please cite it as follows:
```
@software{Forsgren_Martiros_2022,
author = {Forsgren, Seth* and Martiros, Hayk*},
title = {{Riffusion - Stable diffusion for real-time music generation}},
url = {https://riffusion.com/about},
year = {2022}
}
```