- See Also
- Links
- “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Tian et al 2024
- “Image Captioners Are Scalable Vision Learners Too”, Tschannen et al 2023
- “Artificial Intelligence and Art: Identifying the Esthetic Judgment Factors That Distinguish Human & Machine-Generated Artwork”, Samo & Highhouse 2023
- “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Wang et al 2023
- “Retrieval-Augmented Multimodal Language Modeling”, Yasunaga et al 2022
- “Draft-And-Revise: Effective Image Generation With Contextual RQ-Transformer”, Lee et al 2022
- “CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”, Hong et al 2022
- “CogView2: Faster and Better Text-To-Image Generation via Hierarchical Transformers”, Ding et al 2022
- “MaskGIT: Masked Generative Image Transformer”, Chang et al 2022
- “CM3: A Causal Masked Multimodal Model of the Internet”, Aghajanyan et al 2022
- “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Zhang et al 2021
- “Emojich—Zero-Shot Emoji Generation Using Russian Language: a Technical Report”, Shonenkov et al 2021
- “LAFITE: Towards Language-Free Training for Text-To-Image Generation”, Zhou et al 2021
- “NÜWA: Visual Synthesis Pre-Training for Neural VisUal World CreAtion”, Wu et al 2021
- “L-Verse: Bidirectional Generation Between Image and Text”, Kim et al 2021
- “Telling Creative Stories Using Generative Visual Aids”, Ali & Parikh 2021
- “Unifying Multimodal Transformer for Bi-Directional Image and Text Generation”, Huang et al 2021
- “Illiterate DALL·E Learns to Compose”, Singh et al 2021
- “What Users Want? WARHOL: A Generative Model for Recommendation”, Samaran et al 2021
- “ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation”, Zhu et al 2021
- “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Du 2021
- “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhang et al 2021
- “CogView: Mastering Text-To-Image Generation via Transformers”, Ding et al 2021
- “GODIVA: Generating Open-DomaIn Videos from NAtural Descriptions”, Wu et al 2021
- “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021
- “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, Synced 2021
- “Paint by Word”, Bau et al 2021
- “Generating Images With Sparse Representations”, Nash et al 2021
- “M6: A Chinese Multimodal Pretrainer”, Lin et al 2021
- “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”, Ramesh et al 2021
- “Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020
- “Text-To-Image Generation Grounded by Fine-Grained User Attention”, Koh et al 2020
- “X-LXMERT: Paint, Caption and Answer Questions With Multi-Modal Transformers”, Cho et al 2020
- “IGPT: Generative Pretraining from Pixels”, Chen et al 2020
- “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Chen et al 2020
- “The Messy, Secretive Reality behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, Hao 2020
- “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”, Sharma et al 2018
- “Image Transformer”, Parmar et al 2018
- “VQ-VAE: Neural Discrete Representation Learning”, Oord et al 2017
- “Categorical Reparameterization With Gumbel-Softmax”, Jang et al 2016
- “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, Maddison et al 2016
- “The Little Red Boat Story (Make-A-Scene): Our Own Model Was Used to Generate All the Images in the Story, by Providing a Text and Simple Sketch Input”
- Sort By Magic
- Miscellaneous
- Link Bibliography
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
- synthesis
- crossmodal
- generative-models
Miscellaneous
Link Bibliography
- 2023-samo.pdf: “Artificial Intelligence and Art: Identifying the Esthetic Judgment Factors That Distinguish Human & Machine-Generated Artwork”, Andrew Samo, Scott Highhouse
- https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”
- https://arxiv.org/abs/2211.12561#facebook: “Retrieval-Augmented Multimodal Language Modeling”
- https://arxiv.org/abs/2205.15868: “CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”, Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
- https://arxiv.org/abs/2204.14217#baai: “CogView2: Faster and Better Text-To-Image Generation via Hierarchical Transformers”, Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
- https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”
- https://en.pingwest.com/a/8693#baai: “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Chen Du
- https://arxiv.org/abs/2105.14211#alibaba: “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
- https://arxiv.org/abs/2105.13290#baai: “CogView: Mastering Text-To-Image Generation via Transformers”
- https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
- https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/#baai: “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, Synced
- https://openai.com/research/dall-e: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”
- 2020-chen-2.pdf#openai: “IGPT: Generative Pretraining from Pixels”, Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
- https://openai.com/research/image-gpt: “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Mark Chen, Alec Radford, Ilya Sutskever
- https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/: “The Messy, Secretive Reality behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, Karen Hao
- 2018-sharma.pdf#google: “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”, Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut