Pixel recurrent neural networks
Conditional image generation with PixelCNN decoders
The concrete distribution: A continuous relaxation of discrete random variables
------
https://arxiv.org/pdf/2102.12092.pdf
Zero-shot text-to-image generation
Hierarchical text-conditional image generation with CLIP latents
GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models
Photorealistic text-to-image diffusion models with deep language understanding
Vector-quantized image modeling with improved VQGAN
Scaling autoregressive models for content-rich text-to-image generation
High-resolution image synthesis with latent diffusion models
Neural discrete representation learning
BERT: Pre-training of deep bidirectional transformers for language understanding
Exploring the limits of transfer learning with a unified text-to-text transformer
Masked autoencoders are scalable vision learners
MaskGIT: Masked generative image transformer
LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs
LAION-5B: An open large-scale dataset for training next-generation image-text models