See Also
Gwern
“GPT-3 Creative Fiction”, Gwern 2020
“GPT-3 Nonfiction”, Gwern 2020
Links
“Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
“Tasks That Language Models Don’t Learn”, Lee & Lim 2024
“A Long-context Language Model for the Generation of Bacteriophage Genomes”, Shao 2023
“TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering”, Chen et al 2023
“Positional Description Matters for Transformers Arithmetic”, Shen et al 2023
“AnyText: Multilingual Visual Text Generation And Editing”, Tuo et al 2023
“EELBERT: Tiny Models through Dynamic Embeddings”, Cohn et al 2023
“ChipNeMo: Domain-Adapted LLMs for Chip Design”, Liu et al 2023
“Tokenizer Choice For LLM Training: Negligible or Crucial?”, Ali et al 2023
“XVal: A Continuous Number Encoding for Large Language Models”, Golkar et al 2023
“Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023
“Subwords As Skills: Tokenization for Sparse-Reward Reinforcement Learning”, Yunis et al 2023
“PASTA: Pretrained Action-State Transformer Agents”, Boige et al 2023
“In-context Autoencoder for Context Compression in a Large Language Model”, Ge et al 2023
“Teaching Arithmetic to Small Transformers”, Lee et al 2023
“ChatGPT Is Fun, but It Is Not Funny! Humor Is Still Challenging Large Language Models”, Jentzsch & Kersting 2023
“Bytes Are All You Need: Transformers Operating Directly On File Bytes”, Horton et al 2023
“FERMAT: An Alternative to Accuracy for Numerical Reasoning”, Sivakumar & Moosavi 2023
“MEGABYTE: Predicting Million-byte Sequences With Multiscale Transformers”, Yu et al 2023
“Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, Muffo et al 2023
“What’s AGI, and Why Are AI Experts Skeptical? ChatGPT and Other Bots Have Revived Conversations on Artificial General Intelligence. Scientists Say Algorithms Won’t Surpass You Any Time Soon”, Rogers 2023
“BloombergGPT: A Large Language Model for Finance”, Wu et al 2023
“How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
“LLaMa-1: Open and Efficient Foundation Language Models”, Touvron et al 2023
“Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
“XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models”, Liang et al 2023
“Character-Aware Models Improve Visual Text Rendering”, Liu et al 2022
“NPM: Nonparametric Masked Language Modeling”, Min et al 2022
“Fast Inference from Transformers via Speculative Decoding”, Leviathan et al 2022
“Efficient Transformers With Dynamic Token Pooling”, Nawrot et al 2022
“Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, Tjandra et al 2022
“LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
“n-gram Is Back: Residual Learning of Neural Text Generation With n-gram Language Model”, Li et al 2022
“Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Chakrabarty et al 2022
“Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Roush et al 2022
“Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints”, Jawahar et al 2022
“AudioLM: a Language Modeling Approach to Audio Generation”, Borsos et al 2022
“PIXEL: Language Modelling With Pixels”, Rust et al 2022
“N-Grammer: Augmenting Transformers With Latent n-grams”, Roy et al 2022
“Forecasting Future World Events With Neural Networks”, Zou et al 2022
“SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Liu et al 2022
“FLOTA: An Embarrassingly Simple Method to Mitigate Und-es-ira-ble Properties of Pretrained Language Model Tokenizers”, Hofmann et al 2022
“DALL·E 2: Hierarchical Text-Conditional Image Generation With CLIP Latents § 7. Limitations and Risks”, Ramesh et al 2022 (page 16 org openai)
“ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Zhu et al 2022
“Make-A-Scene: Scene-Based Text-to-Image Generation With Human Priors”, Gafni et al 2022
“Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words”, Feng et al 2022
“Between Words and Characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP”, Mielke et al 2021
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
“OCR-free Document Understanding Transformer”, Kim et al 2021
“What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Kim et al 2021
“Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens”, Itzhak & Levy 2021
“Perceiver IO: A General Architecture for Structured Inputs & Outputs”, Jaegle et al 2021
“Charformer: Fast Character Transformers via Gradient-based Subword Tokenization”, Tay et al 2021
“ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
“Robust Open-Vocabulary Translation from Visual Text Representations”, Salesky et al 2021
“GPT-3 vs Water Cooler Trivia Participants: A Human vs Robot Showdown”, Waldoch 2021
“CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation”, Clark et al 2021
“There Once Was a Really Bad Poet, It Was Automated but You Didn’t Know It”, Wang et al 2021
“Perceiver: General Perception With Iterative Attention”, Jaegle et al 2021
“Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
“Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words”, Hofmann et al 2021
“Fast WordPiece Tokenization”, Song et al 2020
“CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters”, Boukkouri et al 2020
“Towards End-to-End In-Image Neural Machine Translation”, Mansimov et al 2020
“Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
“Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11 org openai)
“OTEANN: Estimating the Transparency of Orthographies With an Artificial Neural Network”, Marjou 2019
“GPT-2 Folk Music”, Branwen & Presser 2019
“BPE-Dropout: Simple and Effective Subword Regularization”, Provilkov et al 2019
“BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance”, Schick & Schütze 2019
“Do NLP Models Know Numbers? Probing Numeracy in Embeddings”, Wallace et al 2019
“Generating Text With Recurrent Neural Networks”, Sutskever et al 2019
“SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing”, Kudo & Richardson 2018
“Character-Level Language Modeling With Deeper Self-Attention”, Al-Rfou et al 2018
“Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme”, Lau et al 2018
“GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
“One Big Net For Everything”, Schmidhuber 2018
“DeepTingle”, Khalifa et al 2017
“Multiplicative LSTM for Sequence Modelling”, Krause et al 2016
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
“BPEs: Neural Machine Translation of Rare Words With Subword Units”, Sennrich et al 2015
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher § Table A40: Conversations Can Create the Illusion of Creativity”
“Commas vs Integers”
“The Bouba/Kiki Effect And Sound Symbolism In CLIP”
“BPE Blues”
“BPE Blues+”
NineOfNein
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
byte-transformer
generation
language-processing
token-strategy
byte-model
Wikipedia
Miscellaneous
- /doc/ai/nn/tokenization/2024-01-10-gwern-gpt4-usingipasoftwaretotrytounderstandatomatopun.png
- /doc/ai/nn/tokenization/2023-lee-figure20-naivebpetokenizationbadlydamagesgpt2arithmetictraining.png
- /doc/ai/nn/tokenization/2021-liu-table1-spellingtestforbyt5vst5vspalmshowsbyt5spellsmuchbetter.png
- https://blog.research.google/2021/12/a-fast-wordpiece-tokenization-system.html
- https://blog.scottlogic.com/2021/08/31/a-primer-on-the-openai-api-1.html
- https://denyslinkov.medium.com/why-is-gpt-3-15-77x-more-expensive-for-certain-languages-2b19a4adc4bc
- https://gist.github.com/moyix/ca4091f16f0b5011bfa8f3f97f705a0d
- https://github.com/alasdairforsythe/tokenmonster/blob/main/benchmark/pretrain.md
- https://twitter.com/MichaelTrazzi/status/1635743595989970945
- https://twitter.com/arankomatsuzaki/status/1619548480795734016
- https://twitter.com/tomgoldsteincs/status/1601113497592795136
- https://twitter.com/tomgoldsteincs/status/1601113501803552768
- https://twitter.com/tomgoldsteincs/status/1601113505998204928
- https://www.beren.io/2023-02-04-Integer-tokenization-is-insane/
- https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
- https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
- https://www.lesswrong.com/posts/jkY6QdCfAXHJk3kea/the-petertodd-phenomenon
- https://www.reddit.com/r/ChatGPT/comments/12xai7j/spamming_the_word_stop_2300_times_or_probably_any/
- https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_is_running_quantized/jnst1t8/
Link Bibliography
- https://arxiv.org/abs/2402.14903 : “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Aaditya K. Singh, D. J. Strouse
- https://arxiv.org/abs/2402.11349 : “Tasks That Language Models Don’t Learn”, Bruce W. Lee, JaeHyuk Lim
- https://arxiv.org/abs/2311.16465 : “TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering”, Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
- https://arxiv.org/abs/2307.03381 : “Teaching Arithmetic to Small Transformers”, Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos
- https://arxiv.org/abs/2306.00238#apple : “Bytes Are All You Need: Transformers Operating Directly On File Bytes”, Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari
- https://www.wired.com/story/what-is-artificial-general-intelligence-agi-explained/ : “What’s AGI, and Why Are AI Experts Skeptical? ChatGPT and Other Bots Have Revived Conversations on Artificial General Intelligence. Scientists Say Algorithms Won’t Surpass You Any Time Soon”, Reece Rogers
- https://arxiv.org/abs/2304.02015#alibaba : “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang
- https://arxiv.org/abs/2212.10562#google : “Character-Aware Models Improve Visual Text Rendering”
- https://arxiv.org/abs/2212.01349#facebook : “NPM: Nonparametric Masked Language Modeling”, Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
- https://arxiv.org/abs/2210.13669 : “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Tuhin Chakrabarty, Vishakh Padmakumar, He He
- https://aclanthology.org/2022.cai-1.2.pdf : “Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy
- https://arxiv.org/abs/2207.06991 : “PIXEL: Language Modelling With Pixels”, Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
- https://arxiv.org/abs/2206.15474 : “Forecasting Future World Events With Neural Networks”
- https://aclanthology.org/2022.acl-short.43.pdf : “FLOTA: An Embarrassingly Simple Method to Mitigate Und-es-ira-ble Properties of Pretrained Language Model Tokenizers”, Valentin Hofmann, Hinrich Schuetze, Janet Pierrehumbert
- https://arxiv.org/pdf/2204.06125.pdf#page=16&org=openai : “DALL·E 2: Hierarchical Text-Conditional Image Generation With CLIP Latents § 7. Limitations and Risks”, Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
- https://arxiv.org/abs/2204.03067 : “ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Jian Zhu, Cong Zhang, David Jurgens
- https://arxiv.org/abs/2203.13131#facebook : “Make-A-Scene: Scene-Based Text-to-Image Generation With Human Priors”, Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, Yaniv Taigman
- https://arxiv.org/abs/2108.11193 : “Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens”, Itay Itzhak, Omer Levy
- https://arxiv.org/abs/2107.14795#deepmind : “Perceiver IO: A General Architecture for Structured Inputs & Outputs”
- https://arxiv.org/abs/2106.12672#google : “Charformer: Fast Character Transformers via Gradient-based Subword Tokenization”
- https://arxiv.org/abs/2105.13626#google : “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
- https://arxiv.org/abs/2103.03206#deepmind : “Perceiver: General Perception With Iterative Attention”, Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
- https://arxiv.org/abs/2012.15524#google : “Fast WordPiece Tokenization”, Xinying Song, Alex Salcianu, Yang Song, Dave Dopson, Denny Zhou
- gpt-2-music : “GPT-2 Folk Music”, Gwern Branwen, Shawn Presser
- https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5 : “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever