Improve about page

Hayk Martiros 2022-12-12 17:28:59 -08:00
parent e7a33dda1f
commit f645306655
3 changed files with 50 additions and 25 deletions


@@ -1,4 +1,5 @@
import Image from "next/image";
import Link from "next/link";
import PageHead from "../components/PageHead";
import CaptionedImage from "../components/about/CaptionedImage";
@@ -7,6 +8,8 @@ import { BsInfoCircleFill } from "react-icons/bs";
import handDrawnSpectrogramImg from "../public/about/hand_drawn_spectrogram.png";
import fourierTransformImg from "../public/about/fourier_transform.png";
import img2imgExample from "../public/about/img2img_example.png";
import latentSpaceInterpolation from "../public/about/latent_space_interpolation.png";
import spectrogramLabelImg from "../public/about/spectrogram_label.png";
import webAppScreenshot from "../public/about/web_app_screenshot.png";
@@ -22,9 +25,9 @@ export default function Home() {
<main className="bg-white flex flex-row text-black place-content-center">
<div className="w-3/4 md:w-2/3 lg:w-1/2 text-lg pb-20">
<h1 className="pt-16 pb-1 text-4xl font-bold">
<a href="/" className="no-underline">
<Link href="/" className="no-underline">
[ RIFFUSION ]
</a>
</Link>
</h1>
<h3 className="font-medium italic text-xl pb-6">
(noun): riff + diffusion
@@ -35,9 +38,9 @@ export default function Home() {
<BsInfoCircleFill className="h-6 w-6 flex-shrink-0" />
<div>
<div className="mt-0.5 text-sm font-semibold">
<a href="/" className="no-underline">
<Link href="/" className="no-underline">
Riffusion
</a>{" "}
</Link>{" "}
was created by{" "}
<a
className="no-underline text-slate-800"
@@ -159,7 +162,11 @@ export default function Home() {
alt={"hand drawn spectrogram"}
/>
<div className="m-5 ml-24">
<audio controls src="/about/hand_drawn.mp3" className="w-2/3 md:w-1/2">
<audio
controls
src="/about/hand_drawn.mp3"
className="w-2/3 md:w-1/2"
>
Your browser does not support audio.
</audio>
</div>
@@ -185,8 +192,8 @@ export default function Home() {
denoising strength parameter.
</p>
<p className="mt-3">
For example, here is that funky sax riff on the left, and on the
right is a modification of it to crank up the piano:
For example, here is that funky sax riff again, followed by a
modification to crank up the piano:
</p>
<div className="grid grid-cols-1 md:grid-cols-2 gap-3">
<div>
@@ -270,6 +277,11 @@ export default function Home() {
them loop-able, we also create initial images that are an exact
number of measures.
</p>
<Image
className="ml-4 md:ml-8 m-5 w-5/6 md:w-4/5"
src={img2imgExample}
alt={"img2img generation example"}
/>
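
The img2img technique described here can be sketched with the public diffusers image-to-image pipeline. This is a minimal illustration of the approach, not Riffusion's actual code; the checkpoint name, file paths, and parameter values are placeholders:

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load a Stable Diffusion img2img pipeline (placeholder checkpoint).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from an existing spectrogram image (illustrative path).
init_spectrogram = Image.open("funky_sax_spectrogram.png").convert("RGB")

# Lower strength preserves more of the original riff; higher strength
# lets the new prompt take over.
result = pipe(
    prompt="piano funk",
    image=init_spectrogram,
    strength=0.55,
    guidance_scale=7.0,
).images[0]
result.save("piano_funk_spectrogram.png")
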
<p className="mt-3">
However, even with this approach it's still too abrupt to transition
between clips. Multiple interpretations of the same prompt with the
@@ -288,8 +300,8 @@ export default function Home() {
decodes to a viable output.
</p>
<p className="mt-3">
The key is that we can continuously sample the latent space between
a prompt with two different seeds, or two different prompts with the
The key is that it's possible to sample the latent space between a
prompt with two different seeds, or two different prompts with the
same seed. Here is an example with the visual model:
</p>
<CaptionedImage
@@ -297,12 +309,23 @@ export default function Home() {
caption={""}
/>
<p className="mt-3">
We can do the same thing with our model, which often results in
We can do the same thing with our model, which often produces
buttery smooth transitions, even between starkly different prompts.
This is vastly more interesting than interpolating the raw audio,
This is much more interesting than interpolating the raw audio,
because in the latent space all in-between points still sound like
plausible clips.
plausible clips. The figure below is colorized to show the latent
space interpolation between two seeds of the same prompt. Played as
a sequence, it sounds much smoother than jumping between the two
endpoints, and the interpolated clips are often diverse, with riffs
and motifs of their own that come and go.
</p>
<Image
className="ml-2 md:ml-4 m-5 w-11/12"
src={latentSpaceInterpolation}
alt={"Latent space interpolation example"}
/>
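
To make that concrete, here is a hedged sketch of the usual way to interpolate between two seeds: spherical interpolation (slerp) of the initial noise latents. The shapes and seed values are chosen purely for illustration; two prompts with the same seed work analogously, interpolating the text embeddings instead of the noise:

import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor) -> torch.Tensor:
    # Spherical interpolation walks the great circle between the two
    # tensors, keeping interpolants at a plausible norm for Gaussian
    # noise (unlike straight linear interpolation).
    a, b = v0.flatten(), v1.flatten()
    omega = torch.acos(torch.clamp(
        torch.dot(a / a.norm(), b / b.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * v0 + \
           (torch.sin(t * omega) / so) * v1

# Initial noise for two seeds of the same prompt (SD v1 latent shape
# for a 512x512 image; seeds are arbitrary).
shape = (1, 4, 64, 64)
latents_a = torch.randn(shape, generator=torch.Generator().manual_seed(42))
latents_b = torch.randn(shape, generator=torch.Generator().manual_seed(137))

# Denoising each of these with the same prompt yields a smooth sequence.
steps = [slerp(i / 19, latents_a, latents_b) for i in range(20)]
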
{/* TODO(hayk): Move one of these examples to the bottom. */}
<p className="mt-3">
Here is one of our favorites, a beautiful 20-step interpolation from
@@ -339,11 +362,10 @@ export default function Home() {
The huggingface{" "}
<a href="https://github.com/huggingface/diffusers">diffusers</a>{" "}
library implements a wide range of pipelines including
image-to-image and prompt interpolation, but we did not find an
implementation that was able to do prompt interpolation combined
with image-to-image conditioning. We implemented this pipeline,
along with support for masking to limit generation to only parts of
an image. Code{" "}
image-to-image and prompt interpolation, but we needed an
implementation for interpolation combined with image-to-image
conditioning. We implemented this pipeline, along with support for
masking to limit generation to only parts of an image. Code{" "}
<a href="https://github.com/hmartiro/riffusion-inference/blob/main/riffusion/riffusion_pipeline.py">
here
</a>
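
The masking idea can be summarized as a per-step blend in latent space: after each denoising step, regions outside the mask are pinned back to the conditioning image's latents, noised to the current timestep. The helper below is a sketch of that general technique under this assumption, not the implementation linked above:

import torch

def apply_latent_mask(
    denoised: torch.Tensor,        # latents after the current scheduler step
    init_latents_t: torch.Tensor,  # conditioning-image latents, noised to this timestep
    mask: torch.Tensor,            # 1.0 where generation may change the image, 0.0 where frozen
) -> torch.Tensor:
    # Keep newly generated content only inside the mask; everything
    # else stays consistent with the conditioning image.
    return mask * denoised + (1.0 - mask) * init_latents_t
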
@@ -358,7 +380,9 @@ export default function Home() {
<p className="mt-3">
As the user types in new prompts, the audio smoothly transitions to
the new prompt. If there is no new prompt, the app will interpolate
between different seeds of the same prompt.
between different seeds of the same prompt. Spectrograms are
visualized as 3D height maps along a timeline with a translucent
playhead.
</p>
<Image
className="ml-8 md:ml-16 m-5 w-3/4 md:w-2/3"
@@ -369,15 +393,15 @@ export default function Home() {
The app is built using <a href="https://nextjs.org/">Next.js</a>,{" "}
<a href="https://reactjs.org/">React</a>,{" "}
<a href="https://www.typescriptlang.org/">Typescript</a>,{" "}
<a href="https://threejs.org/">three.js</a>, and{" "}
<a href="https://tailwindcss.com/">Tailwind</a>, and deployed with{" "}
<a href="https://threejs.org/">three.js</a>,{" "}
<a href="https://tailwindcss.com/">Tailwind</a>, and{" "}
<a href="https://vercel.com/">Vercel</a>.
</p>
<p className="mt-3">
It communicates with the sever over an API to run the inference calls
on a GPU. We built a flask server for initial testing, and deployed
the production model on <a href="https://www.baseten.co/">Baseten</a>{" "}
for autoscaling and load balancing.
The app communicates over an API to run the inference calls on a GPU
server. We built a Flask server for testing, and deployed to
production on <a href="https://www.baseten.co/">Baseten</a> for
autoscaling and load balancing.
</p>
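
For a flavor of what such a testing server might look like, here is a hypothetical Flask endpoint; the route name and JSON fields are invented for illustration and are not Riffusion's actual API:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical endpoint: accepts a prompt and seed, returns audio.
@app.route("/run_inference", methods=["POST"])
def run_inference():
    params = request.get_json()
    prompt = params["prompt"]
    seed = int(params.get("seed", 0))
    # ... run the diffusion pipeline on the GPU and encode the clip ...
    return jsonify({"prompt": prompt, "seed": seed, "audio": "<base64 mp3>"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
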
<p className="mt-3">
The web app code is at{" "}
@@ -394,7 +418,8 @@ export default function Home() {
.
</p>
<p className="mt-3">
If you have a powerful GPU, you can run the experience locally.
If you have a powerful GPU that can generate Stable Diffusion
results in under five seconds, you can run the experience locally.
</p>
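
A rough way to check that five-second bar is to time a single generation with the public diffusers API; the checkpoint here is a placeholder, and the warm-up call excludes one-time loading overhead:

import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe("warm-up", num_inference_steps=5)  # exclude first-call overhead
start = time.time()
pipe("funky sax riff", num_inference_steps=50)
print(f"One 512x512 generation took {time.time() - start:.1f} s")
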
<h2 className="pt-10 pb-5 text-3xl font-bold">Samples</h2>
<p>Some of our favorite prompts and results.</p>

Binary file not shown (added; 995 KiB)

Binary file not shown (added; 2.2 MiB)