<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Audio Diffusion

## Overview

[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.

Audio Diffusion leverages recent advances in diffusion-based image generation by converting audio samples to and from
mel spectrogram images, allowing an ordinary image diffusion model to generate audio.

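The conversion is handled by the `Mel` helper documented at the bottom of this page. A minimal sketch of the round
trip (the `x_res`/`y_res` arguments and the `load_audio` method are assumed from the original codebase, and
`test.wav` is a hypothetical local file):

```python
from diffusers import Mel

mel = Mel(x_res=256, y_res=256)  # resolution of the spectrogram images
mel.load_audio("test.wav")  # assumed helper; slices the waveform internally
image = mel.audio_slice_to_image(0)  # first slice -> grayscale PIL image
audio = mel.image_to_audio(image)  # back to a waveform (Griffin-Lim, lossy)
```
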
The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
training scripts and example notebooks.

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |

## Examples:

### Audio Diffusion

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])  # the generated mel spectrogram image
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))  # the decoded waveform
```
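
The pipeline returns both representations: `output.images` holds the mel spectrograms as PIL images, and
`output.audios` is a NumPy array of shape `(batch, channels, samples)`, which is why a single clip is indexed as
`output.audios[0]` here and a single channel as `output.audios[0, 0]` further below.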

### Latent Audio Diffusion

As with latent diffusion for images, the latent variant runs the denoising process in the latent space of a
pretrained autoencoder instead of on the full-resolution spectrogram, making each step cheaper:

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Audio Diffusion with DDIM (faster)

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
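
The speed-up comes from DDIM's deterministic sampler, which needs far fewer denoising steps than DDPM. It can be
pushed further by lowering the step count; a minimal sketch, assuming `__call__` forwards a `steps` argument as in
the original codebase:

```python
# Hypothetical: trade a little quality for speed by reducing the DDIM step count.
output = pipe(steps=25)  # the default is pipe.get_default_steps()
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```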

### Variations, in-painting, out-painting etc.
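
Passing previously generated audio back into the pipeline refines it rather than sampling from scratch: `start_step`
resumes the reverse diffusion partway through the schedule so the output stays close to the input, while
`mask_start_secs` and `mask_end_secs` keep the first and last second of the original audio fixed, in-painting only
what lies in between: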
```python
output = pipe(
    raw_audio=output.audios[0, 0],
    start_step=int(pipe.get_default_steps() / 2),
    mask_start_secs=1,
    mask_end_secs=1,
)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
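
The pipeline can also interpolate between two sounds by encoding them back to noise and spherically interpolating
the noise tensors. A minimal sketch using the `encode` and `slerp` methods documented below (the `load_audio`
helper, the choice of the DDIM checkpoint, and the file names `one.wav`/`two.wav` are assumptions from the original
codebase):

```python
# encode() inverts spectrogram images back to noise via the deterministic DDIM
# process, so this assumes the DDIM pipeline loaded in the example above.
pipe.mel.load_audio("one.wav")  # assumed helper
image1 = pipe.mel.audio_slice_to_image(0)
pipe.mel.load_audio("two.wav")
image2 = pipe.mel.audio_slice_to_image(0)

# slerp() interpolates on the hypersphere; alpha=0.5 lands halfway between the two sounds.
noise = pipe.slerp(pipe.encode([image1]), pipe.encode([image2]), 0.5)
output = pipe(noise=noise)
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```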

## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
	- __call__
	- encode
	- slerp

## Mel
[[autodoc]] Mel
	- audio_slice_to_image
	- image_to_audio