Stable diffusion for real-time music generation


Riffusion Inference Server

Riffusion is an app for real-time music generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

Web app: https://github.com/hmartiro/riffusion-app
Inference server: https://github.com/hmartiro/riffusion-inference
Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1

This repository contains the Python backend that does the model inference and audio processing, including:

a diffusers pipeline that performs prompt interpolation combined with image conditioning
a module for (approximately) converting between spectrograms and waveforms
a Flask server that provides model inference via an API to the Next.js app
a model template titled baseten.py for deploying as a Truss
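
The prompt interpolation mentioned in the first item can be pictured as blending two vectors with a weight between 0 and 1. The following is a minimal sketch, assuming embeddings or latents are plain lists of floats; `lerp` and `slerp` are illustrative names and are not taken from this repository, and the actual pipeline may blend its inputs differently.

```python
import math
from typing import List


def lerp(alpha: float, v0: List[float], v1: List[float]) -> List[float]:
    """Linear interpolation: alpha=0 returns v0, alpha=1 returns v1."""
    return [a + alpha * (b - a) for a, b in zip(v0, v1)]


def slerp(alpha: float, v0: List[float], v1: List[float],
          dot_threshold: float = 0.9995) -> List[float]:
    """Spherical interpolation between two non-zero vectors.

    Falls back to lerp when the vectors are nearly parallel, where the
    spherical formula becomes numerically unstable.
    """
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # Cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    if abs(dot) > dot_threshold:
        return lerp(alpha, v0, v1)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - alpha) * theta) / math.sin(theta)
    s1 = math.sin(alpha * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

Spherical interpolation is often preferred for Gaussian noise latents because it keeps the magnitude of the blended vector close to that of the endpoints, while linear interpolation can shrink it.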

Install

Tested with Python 3.9 and diffusers 0.9.0

conda create --name riffusion-inference python=3.9
conda activate riffusion-inference
python -m pip install -r requirements.txt

Run

Start the Flask server:

python -m riffusion.server --port 3013 --host 127.0.0.1

You can specify --checkpoint with your own directory or Hugging Face model ID in diffusers format.

The model endpoint is now available at http://127.0.0.1:3013/run_inference via POST request.

Example input (see InferenceInput for the API):

{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
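
A request like the one above could be sent from Python using only the standard library. This is a sketch, assuming the server was started with the host and port shown in the Run section and that it replies with a JSON body; `build_payload`, `make_request`, and `run_inference` are hypothetical helper names, not part of this repository.

```python
import json
import urllib.request

# Assumed URL: matches the --host and --port values from the Run section.
DEFAULT_URL = "http://127.0.0.1:3013/run_inference"


def build_payload(start_prompt: str, end_prompt: str, alpha: float = 0.75,
                  start_seed: int = 42, end_seed: int = 123) -> dict:
    """Build a request body with the same fields as the example input."""
    return {
        "alpha": alpha,
        "num_inference_steps": 50,
        "seed_image_id": "og_beat",
        "start": {"prompt": start_prompt, "seed": start_seed,
                  "denoising": 0.75, "guidance": 7.0},
        "end": {"prompt": end_prompt, "seed": end_seed,
                "denoising": 0.75, "guidance": 7.0},
    }


def make_request(payload: dict, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(payload: dict, url: str = DEFAULT_URL,
                  timeout: float = 120.0) -> dict:
    """Send the request and parse the JSON response (requires a running server)."""
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    result = run_inference(build_payload("church bells on sunday", "jazz with piano"))
    print(sorted(result.keys()))
```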

Example output (see InferenceOutput for the API):

{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
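
The two fields in the response can be decoded and written to disk roughly as follows. This is a sketch under assumptions: `decode_field` and `save_output` are hypothetical helpers, and stripping an optional `data:...;base64,` prefix is a defensive guess rather than documented server behavior.

```python
import base64
from pathlib import Path
from typing import Tuple


def decode_field(value: str) -> bytes:
    """Decode a base64 string, tolerating an optional data URI prefix."""
    # Assumption: the value may look like "data:image/jpeg;base64,<payload>".
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    return base64.b64decode(value)


def save_output(output: dict, stem: str = "clip",
                directory: str = ".") -> Tuple[Path, Path]:
    """Write the spectrogram image and the audio clip next to each other."""
    out_dir = Path(directory)
    image_path = out_dir / f"{stem}.jpg"
    audio_path = out_dir / f"{stem}.mp3"
    image_path.write_bytes(decode_field(output["image"]))
    audio_path.write_bytes(decode_field(output["audio"]))
    return image_path, audio_path
```

For example, `save_output(response, stem="bells_to_jazz")` would produce `bells_to_jazz.jpg` and `bells_to_jazz.mp3` in the current directory, where `response` is the parsed JSON returned by the endpoint.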

Citation

If you build on this work, please cite it as follows:

@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}