Have you recently been impressed by the wonderful art generated by diffusion models such as DALL-E and Imagen? Do you yearn to understand how these generative models work? Calvin Luo from Google Research's Brain Team recently published a detailed, openly available guide to how diffusion models work.
The paper is "Understanding Diffusion Models: A Unified Perspective".
He derives the state of the art in generative modeling from first principles, starting from Variational Autoencoders (VAEs) and the Evidence Lower Bound (ELBO). More importantly, he provides a unified view of several modern generative models. Even though the paper is well written, working your way through it is not a trivial task: you need a working grasp of the mathematics used in theoretical machine learning, such as calculus and probability theory, and in particular you should be comfortable calculating expectations.
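For orientation, the ELBO at the heart of these derivations can be sketched as follows; this is the standard variational formulation, and the paper's exact notation may differ slightly:

```latex
% The evidence (log-likelihood of data x) is lower-bounded by an
% expectation under the approximate posterior q_phi(z | x):
\log p(\boldsymbol{x})
  \;\geq\;
  \mathbb{E}_{q_{\boldsymbol{\phi}}(\boldsymbol{z}\mid\boldsymbol{x})}
  \left[
    \log \frac{p(\boldsymbol{x}, \boldsymbol{z})}{q_{\boldsymbol{\phi}}(\boldsymbol{z}\mid\boldsymbol{x})}
  \right]
```

Much of the paper consists of rewriting and interpreting this bound, first for VAEs and then for hierarchical and diffusion-based models, which is why fluency with expectations is so important for following it.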
If you want to understand how text-to-image generative models work, this paper is certainly a good guide.