- See Also
- Links
- “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Tian et al 2024
- “Image Captioners Are Scalable Vision Learners Too”, Tschannen et al 2023
- “Artificial Intelligence and Art: Identifying the Esthetic Judgment Factors That Distinguish Human & Machine-Generated Artwork”, Samo & Highhouse 2023
- “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Wang et al 2023
- “Retrieval-Augmented Multimodal Language Modeling”, Yasunaga et al 2022
- “Draft-And-Revise: Effective Image Generation With Contextual RQ-Transformer”, Lee et al 2022
- “CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”, Hong et al 2022
- “CogView2: Faster and Better Text-To-Image Generation via Hierarchical Transformers”, Ding et al 2022
- “MaskGIT: Masked Generative Image Transformer”, Chang et al 2022
- “CM3: A Causal Masked Multimodal Model of the Internet”, Aghajanyan et al 2022
- “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Zhang et al 2021
- “Emojich—Zero-Shot Emoji Generation Using Russian Language: a Technical Report”, Shonenkov et al 2021
- “LAFITE: Towards Language-Free Training for Text-To-Image Generation”, Zhou et al 2021
- “NÜWA: Visual Synthesis Pre-Training for Neural VisUal World CreAtion”, Wu et al 2021
- “L-Verse: Bidirectional Generation Between Image and Text”, Kim et al 2021
- “Telling Creative Stories Using Generative Visual Aids”, Ali & Parikh 2021
- “Unifying Multimodal Transformer for Bi-Directional Image and Text Generation”, Huang et al 2021
- “Illiterate DALL·E Learns to Compose”, Singh et al 2021
- “What Users Want? WARHOL: A Generative Model for Recommendation”, Samaran et al 2021
- “ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation”, Zhu et al 2021
- “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Du 2021
- “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhang et al 2021
- “CogView: Mastering Text-To-Image Generation via Transformers”, Ding et al 2021
- “GODIVA: Generating Open-DomaIn Videos from NAtural Descriptions”, Wu et al 2021
- “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021
- “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, Synced 2021
- “Paint by Word”, Bau et al 2021
- “Generating Images With Sparse Representations”, Nash et al 2021
- “M6: A Chinese Multimodal Pretrainer”, Lin et al 2021
- “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”, Ramesh et al 2021
- “Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020
- “Text-To-Image Generation Grounded by Fine-Grained User Attention”, Koh et al 2020
- “X-LXMERT: Paint, Caption and Answer Questions With Multi-Modal Transformers”, Cho et al 2020
- “IGPT: Generative Pretraining from Pixels”, Chen et al 2020
- “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Chen et al 2020
- “The Messy, Secretive Reality behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, Hao 2020
- “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”, Sharma et al 2018
- “Image Transformer”, Parmar et al 2018
- “VQ-VAE: Neural Discrete Representation Learning”, Oord et al 2017
- “Categorical Reparameterization With Gumbel-Softmax”, Jang et al 2016
- “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, Maddison et al 2016
- “The Little Red Boat Story (Make-A-Scene): Our Own Model Was Used to Generate All the Images in the Story, by Providing a Text and Simple Sketch Input”
- Sort By Magic
- Miscellaneous
- Link Bibliography
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
- synthesis
- crossmodal
- generative-models
Miscellaneous
Link Bibliography
- 2023-samo.pdf: “Artificial Intelligence and Art: Identifying the Esthetic Judgment Factors That Distinguish Human & Machine-Generated Artwork”, Andrew Samo, Scott Highhouse
- https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”
- https://arxiv.org/abs/2211.12561#facebook: “Retrieval-Augmented Multimodal Language Modeling”
- https://arxiv.org/abs/2205.15868: “CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”, Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
- https://arxiv.org/abs/2204.14217#baai: “CogView2: Faster and Better Text-To-Image Generation via Hierarchical Transformers”, Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
- https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”
- https://en.pingwest.com/a/8693#baai: “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Chen Du
- https://arxiv.org/abs/2105.14211#alibaba: “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
- https://arxiv.org/abs/2105.13290#baai: “CogView: Mastering Text-To-Image Generation via Transformers”
- https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
- https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/#baai: “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, Synced
- https://openai.com/research/dall-e: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”
- 2020-chen-2.pdf#openai: “IGPT: Generative Pretraining from Pixels”, Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
- https://openai.com/research/image-gpt: “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Mark Chen, Alec Radford, Ilya Sutskever
- https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/: “The Messy, Secretive Reality behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, Karen Hao
- 2018-sharma.pdf#google: “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”, Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut