Stable diffusion for real-time music generation

Hayk Martiros 7f27705f81 save JPEG instead of PNG		2022-11-26 19:24:50 +00:00
riffusion	save JPEG instead of PNG	2022-11-26 19:24:50 +00:00
seed_images	Support masks	2022-11-26 06:48:52 +00:00
.gitignore	Create .gitignore	2022-11-25 13:20:10 -08:00
README.md	Merge branch 'main' of https://github.com/hmartiro/riffusion-inference into main	2022-11-26 06:49:52 +00:00
dev_requirements.txt	Support masks	2022-11-26 06:48:52 +00:00
requirements.txt	Add requirements	2022-11-26 00:13:12 +00:00

README.md

Riffusion

Python backend for the Riffusion app. It handles model inference and audio processing, and provides:

- a diffusers pipeline that performs prompt interpolation combined with image conditioning
- a module for (approximately) converting between spectrograms and waveforms
- a Flask server that exposes model inference via an API to the Next.js app

The web app lives at https://github.com/hmartiro/riffusion-app

Install

Tested with Python 3.9 and diffusers 0.9.0.

conda create --name riffusion-inference python=3.9
conda activate riffusion-inference
python -m pip install -r requirements.txt

Run

Start the Flask server:

python -m riffusion.server --port 3013 --host 127.0.0.1 --checkpoint /path/to/diffusers_checkpoint

The model endpoint is then available at http://127.0.0.1:3013/run_inference and accepts POST requests.

Example input (see InferenceInput for the API):

{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
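
The request above can be sent from any HTTP client. The following is a minimal sketch using only the Python standard library. The helper names (build_payload, make_request, run_inference) are hypothetical and not part of this repository, and it assumes the server accepts a JSON body and replies with JSON; check InferenceInput and the server code for the authoritative field list.

```python
import json
import urllib.request

# Endpoint from the Run section; adjust host and port to match your server flags.
ENDPOINT = "http://127.0.0.1:3013/run_inference"


def build_payload(alpha, start_prompt, end_prompt, seed_image_id="og_beat"):
    """Return a request body shaped like the example input above."""

    def side(prompt, seed):
        # Denoising and guidance values are copied from the example input.
        return {"prompt": prompt, "seed": seed, "denoising": 0.75, "guidance": 7.0}

    return {
        "alpha": alpha,
        "num_inference_steps": 50,
        "seed_image_id": seed_image_id,
        "start": side(start_prompt, 42),
        "end": side(end_prompt, 123),
    }


def make_request(payload, url=ENDPOINT):
    """Build (but do not send) a POST request carrying the payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the server expects a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(payload, url=ENDPOINT, timeout=120):
    """Send the request and return the parsed JSON response as a dict."""
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, calling run_inference(build_payload(0.75, "church bells on sunday", "jazz with piano")) should return a dict shaped like the example output. Generating several clips while stepping alpha is one way to explore the interpolation between the two prompts, though the exact meaning and valid range of alpha should be confirmed against InferenceInput.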

Example output (see InferenceOutput for the API):

{
  "image": "< base64 encoded PNG >",
  "audio": "< base64 encoded MP3 clip >"
}
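
Since both fields arrive as base64 text, a client has to decode them before use. Below is a small sketch of that step. The function names and output file names are hypothetical, and the optional "data:...;base64," prefix handling is a defensive assumption rather than something this README specifies.

```python
import base64
from pathlib import Path


def decode_field(value):
    """Decode a base64 string, tolerating an optional 'data:...;base64,' prefix."""
    if value.startswith("data:") and "," in value:
        # Keep only the part after the first comma (the base64 payload).
        value = value.split(",", 1)[1]
    return base64.b64decode(value)


def save_outputs(response, out_dir="."):
    """Write the image and audio fields of a response dict to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # File extensions follow the formats named in the example output (assumed).
    image_path = out / "spectrogram.png"
    audio_path = out / "clip.mp3"
    image_path.write_bytes(decode_field(response["image"]))
    audio_path.write_bytes(decode_field(response["audio"]))
    return image_path, audio_path
```

Passing the parsed response dict to save_outputs writes both files and returns their paths.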