RNN tag

See Also
Gwern
- “Absolute Unit NNs: Regression-Based MLPs for Everything”, Gwern 2023
- “RNN Metadata for Mimicking Author Style”, Gwern 2015
Links
Miscellaneous
Link Bibliography

[Warning: JavaScript Disabled!]

[For support of key website features (link annotation popups/popovers & transclusions, collapsible sections, backlinks, tablesorting, image zooming, sidenotes etc), you must enable JavaScript.]

See Also

Gwern

“Absolute Unit NNs: Regression-Based MLPs for Everything”, Gwern 2023

Absolute Unit NNs: Regression-Based MLPs for Everything

“RNN Metadata for Mimicking Author Style”, Gwern 2015

RNN Metadata for Mimicking Author Style

Links

“ZigMa: Zigzag Mamba Diffusion Model”, Hu et al 2024

ZigMa: Zigzag Mamba Diffusion Model

“RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval”, Wen et al 2024

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

“MambaByte: Token-Free Selective State Space Model”, Wang et al 2024

MambaByte: Token-free Selective State Space Model

“MoE-Mamba: Efficient Selective State Space Models With Mixture of Experts”, Pióro et al 2024

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

“Evolving Reservoirs for Meta Reinforcement Learning”, Léger et al 2023

Evolving Reservoirs for Meta Reinforcement Learning

“Zoology: Measuring and Improving Recall in Efficient Language Models”, Arora et al 2023

Zoology: Measuring and Improving Recall in Efficient Language Models

“Mamba: Linear-Time Sequence Modeling With Selective State Spaces”, Gu & Dao 2023

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

“Diffusion Models Without Attention”, Yan et al 2023

Diffusion Models Without Attention

“Learning Few-Shot Imitation As Cultural Transmission”, Bhoopchand et al 2023

Learning few-shot imitation as cultural transmission

“HGRN: Hierarchically Gated Recurrent Neural Network for Sequence Modeling”, Qin et al 2023

HGRN: Hierarchically Gated Recurrent Neural Network for Sequence Modeling

“On Prefrontal Working Memory and Hippocampal Episodic Memory: Unifying Memories Stored in Weights and Activation Slots”, Whittington et al 2023

On prefrontal working memory and hippocampal episodic memory: Unifying memories stored in weights and activation slots

“GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling”, Katsch 2023

GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling

“ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models”, Luo et al 2023

ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models

“Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study With Linear Models”, Fu et al 2023

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

“Generalization in Sensorimotor Networks Configured With Natural Language Instructions”, Riveland & Pouget 2023

Generalization in Sensorimotor Networks Configured with Natural Language Instructions

“Parallelizing Non-Linear Sequential Models over the Sequence Length”, Lim et al 2023

Parallelizing non-linear sequential models over the sequence length

“A High-Performance Neuroprosthesis for Speech Decoding and Avatar Control”, Metzger et al 2023

A high-performance neuroprosthesis for speech decoding and avatar control

“Retentive Network: A Successor to Transformer for Large Language Models”, Sun et al 2023

Retentive Network: A Successor to Transformer for Large Language Models

“Using Sequences of Life-Events to Predict Human Lives”, Savcisens et al 2023

Using Sequences of Life-events to Predict Human Lives

“Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023

Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

“RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023

RWKV: Reinventing RNNs for the Transformer Era

“Model Scale versus Domain Knowledge in Statistical Forecasting of Chaotic Systems”, Gilpin 2023

Model scale versus domain knowledge in statistical forecasting of chaotic systems

“Resurrecting Recurrent Neural Networks for Long Sequences”, Orvieto et al 2023

Resurrecting Recurrent Neural Networks for Long Sequences

“SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, Zhu et al 2023

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

“Organic Reaction Mechanism Classification Using Machine Learning”, Burés & Larrosa 2023

Organic reaction mechanism classification using machine learning

“A High-Performance Speech Neuroprosthesis”, Willett et al 2023

A high-performance speech neuroprosthesis

“Hungry Hungry Hippos: Towards Language Modeling With State Space Models”, Fu et al 2022

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

“A 64-Core Mixed-Signal In-Memory Compute Chip Based on Phase-Change Memory for Deep Neural Network Inference”, Gallo et al 2022

A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

“Melting Pot 2.0”, Agapiou et al 2022

Melting Pot 2.0

“VeLO: Training Versatile Learned Optimizers by Scaling Up”, Metz et al 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up

“Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Agarwal et al 2022

Legged Locomotion in Challenging Terrains using Egocentric Vision

“Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, Tjandra et al 2022

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

“Perfectly Secure Steganography Using Minimum Entropy Coupling”, Witt et al 2022

Perfectly Secure Steganography Using Minimum Entropy Coupling

“Transformers Learn Shortcuts to Automata”, Liu et al 2022

Transformers Learn Shortcuts to Automata

“Omnigrok: Grokking Beyond Algorithmic Data”, Liu et al 2022

Omnigrok: Grokking Beyond Algorithmic Data

“Semantic Scene Descriptions As an Objective of Human Vision”, Doerig et al 2022

Semantic scene descriptions as an objective of human vision

“Benchmarking Compositionality With Formal Languages”, Valvoda et al 2022

Benchmarking Compositionality with Formal Languages

“Learning to Generalize With Object-Centric Agents in the Open World Survival Game Crafter”, Stanić et al 2022

Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

“PI-ARS: Accelerating Evolution-Learned Visual-Locomotion With Predictive Information Representations”, Lee et al 2022

PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

“Spatial Representation by Ramping Activity of Neurons in the Retrohippocampal Cortex”, Tennant et al 2022

Spatial representation by ramping activity of neurons in the retrohippocampal cortex

“Neural Networks and the Chomsky Hierarchy”, Delétang et al 2022

Neural Networks and the Chomsky Hierarchy

“BYOL-Explore: Exploration by Bootstrapped Prediction”, Guo et al 2022

BYOL-Explore: Exploration by Bootstrapped Prediction

“AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos”, Wu et al 2022

AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos

“Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)”, Caccia et al 2022

Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)

“Simple Recurrence Improves Masked Language Models”, Lei et al 2022

Simple Recurrence Improves Masked Language Models

“Sequencer: Deep LSTM for Image Classification”, Tatsunami & Taki 2022

Sequencer: Deep LSTM for Image Classification

“Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers”, Chan et al 2022

Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers

“Semantic Projection Recovers Rich Human Knowledge of Multiple Object Features from Word Embeddings”, Grand et al 2022

Semantic projection recovers rich human knowledge of multiple object features from word embeddings

“Block-Recurrent Transformers”, Hutchins et al 2022

Block-Recurrent Transformers

“All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL”, Arulkumaran et al 2022

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

“Retrieval-Augmented Reinforcement Learning”, Goyal et al 2022

Retrieval-Augmented Reinforcement Learning

“Learning by Directional Gradient Descent”, Silver et al 2022

Learning by Directional Gradient Descent

“General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR”, Hawthorne et al 2022

General-purpose, long-context autoregressive modeling with Perceiver AR

“End-To-End Algorithm Synthesis With Recurrent Networks: Logical Extrapolation Without Overthinking”, Bansal et al 2022

End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022

Data Scaling Laws in NMT: The Effect of Noise and Architecture

“Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies”, Gklezakos & Rao 2022

Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies

“Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild”, Miki et al 2022

Learning robust perceptive locomotion for quadrupedal robots in the wild

“Inducing Causal Structure for Interpretable Neural Networks (IIT)”, Geiger et al 2021

Inducing Causal Structure for Interpretable Neural Networks (IIT)

“Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021

Evaluating Distributional Distortion in Neural Language Modeling

“Gradients Are Not All You Need”, Metz et al 2021

Gradients are Not All You Need

“An Explanation of In-Context Learning As Implicit Bayesian Inference”, Xie et al 2021

An Explanation of In-context Learning as Implicit Bayesian Inference

“S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Gu et al 2021

S4: Efficiently Modeling Long Sequences with Structured State Spaces

“Minimum Description Length Recurrent Neural Networks”, Lan et al 2021

Minimum Description Length Recurrent Neural Networks

“LSSL: Combining Recurrent, Convolutional, and Continuous-Time Models With Linear State-Space Layers”, Gu et al 2021

LSSL: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

“A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-Dependent Action Selection”, Hulse et al 2021

A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection

“Recurrent Model-Free RL Is a Strong Baseline for Many POMDPs”, Ni et al 2021

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

“Photos Are All You Need for Reciprocal Recommendation in Online Dating”, Neve & McConville 2021

Photos Are All You Need for Reciprocal Recommendation in Online Dating

“Perceiver IO: A General Architecture for Structured Inputs & Outputs”, Jaegle et al 2021

Perceiver IO: A General Architecture for Structured Inputs & Outputs

“PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Vicol et al 2021

PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

“Shelley: A Crowd-Sourced Collaborative Horror Writer”, Delul et al 2021

Shelley: A Crowd-sourced Collaborative Horror Writer

“Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021

Ten Lessons From Three Generations Shaped Google’s TPUv4i

“RASP: Thinking Like Transformers”, Weiss et al 2021

RASP: Thinking Like Transformers

“Scaling Laws for Acoustic Models”, Droppo & Elibol 2021

Scaling Laws for Acoustic Models

“Scaling End-To-End Models for Large-Scale Multilingual ASR”, Li et al 2021

Scaling End-to-End Models for Large-Scale Multilingual ASR

“Sensitivity As a Complexity Measure for Sequence Classification Tasks”, Hahn et al 2021

Sensitivity as a Complexity Measure for Sequence Classification Tasks

“ALD: Efficient Transformers in Reinforcement Learning Using Actor-Learner Distillation”, Parisotto & Salakhutdinov 2021

ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

“Finetuning Pretrained Transformers into RNNs”, Kasai et al 2021

Finetuning Pretrained Transformers into RNNs

“Pretrained Transformers As Universal Computation Engines”, Lu et al 2021

Pretrained Transformers as Universal Computation Engines

“Perceiver: General Perception With Iterative Attention”, Jaegle et al 2021

Perceiver: General Perception with Iterative Attention

“When Attention Meets Fast Recurrence: Training SRU++ Language Models With Reduced Compute”, Lei 2021

When Attention Meets Fast Recurrence: Training SRU++ Language Models with Reduced Compute

“Generative Speech Coding With Predictive Variance Regularization”, Kleijn et al 2021

Generative Speech Coding with Predictive Variance Regularization

“Predictive Coding Is a Consequence of Energy Efficiency in Recurrent Neural Networks”, Ali et al 2021

Predictive coding is a consequence of energy efficiency in recurrent neural networks

“Deep Residual Learning in Spiking Neural Networks”, Fang et al 2021

Deep Residual Learning in Spiking Neural Networks

“Distilling Large Language Models into Tiny and Effective Students Using PQRNN”, Kaliamoorthi et al 2021

Distilling Large Language Models into Tiny and Effective Students using pQRNN

“Meta Learning Backpropagation And Improving It”, Kirsch & Schmidhuber 2020

Meta Learning Backpropagation And Improving It

“On the Binding Problem in Artificial Neural Networks”, Greff et al 2020

On the Binding Problem in Artificial Neural Networks

“A Recurrent Vision-And-Language BERT for Navigation”, Hong et al 2020

A Recurrent Vision-and-Language BERT for Navigation

“Towards Playing Full MOBA Games With Deep Reinforcement Learning”, Ye et al 2020

Towards Playing Full MOBA Games with Deep Reinforcement Learning

“Multimodal Dynamics Modeling for Off-Road Autonomous Vehicles”, Tremblay et al 2020

Multimodal dynamics modeling for off-road autonomous vehicles

“Adversarial Vulnerabilities of Human Decision-Making”, Dezfouli et al 2020

Adversarial vulnerabilities of human decision-making

“Learning to Summarize Long Texts With Memory Compression and Transfer”, Park et al 2020

Learning to Summarize Long Texts with Memory Compression and Transfer

“Human-Centric Dialog Training via Offline Reinforcement Learning”, Jaques et al 2020

Human-centric Dialog Training via Offline Reinforcement Learning

“AFT: An Attention Free Transformer”, Anonymous 2020

AFT: An Attention Free Transformer

“Deep Reinforcement Learning for Closed-Loop Blood Glucose Control”, Fox et al 2020

Deep Reinforcement Learning for Closed-Loop Blood Glucose Control

“HiPPO: Recurrent Memory With Optimal Polynomial Projections”, Gu et al 2020

HiPPO: Recurrent Memory with Optimal Polynomial Projections

“Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, Yoshida et al 2020

Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

“Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Scholl 2020

Matt Botvinick on the spontaneous emergence of learning algorithms

“Cultural Influences on Word Meanings Revealed through Large-Scale Semantic Alignment”, Thompson et al 2020

Cultural influences on word meanings revealed through large-scale semantic alignment

“DeepSinger: Singing Voice Synthesis With Data Mined From the Web”, Ren et al 2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

“High-Performance Brain-To-Text Communication via Imagined Handwriting”, Willett et al 2020

High-performance brain-to-text communication via imagined handwriting

“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

“The Recurrent Neural Tangent Kernel”, Alemohammad et al 2020

The Recurrent Neural Tangent Kernel

“Untangling Tradeoffs between Recurrence and Self-Attention in Neural Networks”, Kerg et al 2020

Untangling tradeoffs between recurrence and self-attention in neural networks

“Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing”, Dai et al 2020

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

“Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models”, Papadimitriou & Jurafsky 2020

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

“Syntactic Structure from Deep Learning”, Linzen & Baroni 2020

Syntactic Structure from Deep Learning

“Agent57: Outperforming the Human Atari Benchmark”, Puigdomènech et al 2020

Agent57: Outperforming the human Atari benchmark

“Machine Translation of Cortical Activity to Text With an Encoder-Decoder Framework”, Makin et al 2020

Machine translation of cortical activity to text with an encoder-decoder framework

“Learning-Based Memory Allocation for C++ Server Workloads”, Maas et al 2020

Learning-based Memory Allocation for C++ Server Workloads

“Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving”, Song et al 2020

Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving

“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020

Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

“Scaling Laws for Neural Language Models”, Kaplan et al 2020

Scaling Laws for Neural Language Models

“Estimating the Deep Replicability of Scientific Findings Using Human and Artificial Intelligence”, Yang et al 2020

Estimating the deep replicability of scientific findings using human and artificial intelligence

“Placing Language in an Integrated Understanding System: Next Steps toward Human-Level Performance in Neural Language Models”, McClelland et al 2020

Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models

“Measuring Compositional Generalization: A Comprehensive Method on Realistic Data”, Keysers et al 2019

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

“SimpleBooks: Long-Term Dependency Book Dataset With Simplified English Vocabulary for Word-Level Language Modeling”, Nguyen 2019

SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling

“Single Headed Attention RNN: Stop Thinking With Your Head”, Merity 2019

Single Headed Attention RNN: Stop Thinking With Your Head

“Excavate”, Lynch 2019

“MuZero: Mastering Atari, Go, Chess and Shogi by Planning With a Learned Model”, Schrittwieser et al 2019

MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019

CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

“High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks”, Villegas et al 2019

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

“Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks”, Voelker et al 2019

Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

“SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Espeholt et al 2019

SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

“Mixed-Signal Neuromorphic Processors: Quo Vadis?”, Bavandpour et al 2019

Mixed-Signal Neuromorphic Processors: Quo vadis?

“Restoring Ancient Text Using Deep Learning (Pythia): a Case Study on Greek Epigraphy”, Assael et al 2019

Restoring ancient text using deep learning (Pythia): a case study on Greek epigraphy

“Mogrifier LSTM”, Melis et al 2019

Mogrifier LSTM

“R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems”, Paine et al 2019

R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

“Language Modelling State-Of-The-Art Leaderboards”, paperswithcode.com 2019

Language Modelling State-of-the-art leaderboards

“Metalearned Neural Memory”, Munkhdalai et al 2019

Metalearned Neural Memory

“Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”, Socher et al 2019

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

“Generating Text With Recurrent Neural Networks”, Sutskever et al 2019

Generating Text with Recurrent Neural Networks

“XLNet: Generalized Autoregressive Pretraining for Language Understanding”, Yang et al 2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding

“Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP”, Yu et al 2019

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

“MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalizing Flows”, Henter et al 2019

MoGlow: Probabilistic and controllable motion synthesis using normalizing flows

“Reinforcement Learning, Fast and Slow”, Botvinick et al 2019

Reinforcement Learning, Fast and Slow

“Meta-Learners’ Learning Dynamics Are unlike Learners’”, Rabinowitz 2019

Meta-learners’ learning dynamics are unlike learners’

“Speech Synthesis from Neural Decoding of Spoken Sentences”, Anumanchipalli et al 2019

Speech synthesis from neural decoding of spoken sentences

“Good News, Everyone! Context Driven Entity-Aware Captioning for News Images”, Biten et al 2019

Good News, Everyone! Context driven entity-aware captioning for news images

“Surrogate Gradient Learning in Spiking Neural Networks”, Neftci et al 2019

Surrogate Gradient Learning in Spiking Neural Networks

“On the Turing Completeness of Modern Neural Network Architectures”, Pérez et al 2019

On the Turing Completeness of Modern Neural Network Architectures

“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

“Natural Questions: A Benchmark for Question Answering Research”, Kwiatkowski et al 2019

Natural Questions: A Benchmark for Question Answering Research

“High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks: Videos”, Villegas et al 2019

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks: Videos

“Bayesian Layers: A Module for Neural Network Uncertainty”, Tran et al 2018

Bayesian Layers: A Module for Neural Network Uncertainty

“Meta-Learning: Learning to Learn Fast”, Weng 2018

Meta-Learning: Learning to Learn Fast

“Piano Genie”, Donahue et al 2018

Piano Genie

“Learning Recurrent Binary/Ternary Weights”, Ardakani et al 2018

Learning Recurrent Binary/Ternary Weights

“R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Kapturowski et al 2018

R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning

“HotpotQA: A Dataset for Diverse, Explainable Multi-Hop Question Answering”, Yang et al 2018

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

“Adversarial Reprogramming of Text Classification Neural Networks”, Neekhara et al 2018

Adversarial Reprogramming of Text Classification Neural Networks

“Object Hallucination in Image Captioning”, Rohrbach et al 2018

Object Hallucination in Image Captioning

“This Time With Feeling: Learning Expressive Musical Performance”, Oore et al 2018

This Time with Feeling: Learning Expressive Musical Performance

“Character-Level Language Modeling With Deeper Self-Attention”, Al-Rfou et al 2018

Character-Level Language Modeling with Deeper Self-Attention

“General Value Function Networks”, Schlegel et al 2018

General Value Function Networks

“Deep-Speare: A Joint Neural Model of Poetic Language, Meter and Rhyme”, Lau et al 2018

Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme

“Universal Transformers”, Dehghani et al 2018

Universal Transformers

“Accurate Uncertainties for Deep Learning Using Calibrated Regression”, Kuleshov et al 2018

Accurate Uncertainties for Deep Learning Using Calibrated Regression

“The Natural Language Decathlon: Multitask Learning As Question Answering”, McCann et al 2018

The Natural Language Decathlon: Multitask Learning as Question Answering

“Neural Ordinary Differential Equations”, Chen et al 2018

Neural Ordinary Differential Equations

“Know What You Don’t Know: Unanswerable Questions for SQuAD”, Rajpurkar et al 2018

Know What You Don’t Know: Unanswerable Questions for SQuAD

“DVRL: Deep Variational Reinforcement Learning for POMDPs”, Igl et al 2018

DVRL: Deep Variational Reinforcement Learning for POMDPs

“Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data”, Yang et al 2018

Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

“Hierarchical Neural Story Generation”, Fan et al 2018

Hierarchical Neural Story Generation

“Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context”, Khandelwal et al 2018

Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

“Newsroom: A Dataset of 1.3 Million Summaries With Diverse Extractive Strategies”, Grusky et al 2018

Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

“A Tree Search Algorithm for Sequence Labeling”, Lao et al 2018

A Tree Search Algorithm for Sequence Labeling

“An Analysis of Neural Language Modeling at Multiple Scales”, Merity et al 2018

An Analysis of Neural Language Modeling at Multiple Scales

“Reviving and Improving Recurrent Back-Propagation”, Liao et al 2018

Reviving and Improving Recurrent Back-Propagation

“Learning Memory Access Patterns”, Hashemi et al 2018

Learning Memory Access Patterns

“Learning Longer-Term Dependencies in RNNs With Auxiliary Losses”, Trinh et al 2018

Learning Longer-term Dependencies in RNNs with Auxiliary Losses

“One Big Net For Everything”, Schmidhuber 2018

One Big Net For Everything

“Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018

Efficient Neural Audio Synthesis

“Deep Contextualized Word Representations”, Peters et al 2018

Deep contextualized word representations

“M-Walk: Learning to Walk over Graphs Using Monte Carlo Tree Search”, Shen et al 2018

M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

“Overcoming the Vanishing Gradient Problem in Plain Recurrent Networks”, Hu et al 2018

Overcoming the vanishing gradient problem in plain recurrent networks

“ULMFiT: Universal Language Model Fine-Tuning for Text Classification”, Howard & Ruder 2018

ULMFiT: Universal Language Model Fine-tuning for Text Classification

“Large-Scale Comparison of Machine Learning Methods for Drug Target Prediction on ChEMBL”, Mayr et al 2018

Large-scale comparison of machine learning methods for drug target prediction on ChEMBL

“A Flexible Approach to Automated RNN Architecture Generation”, Schrimpf et al 2017

A Flexible Approach to Automated RNN Architecture Generation

“The NarrativeQA Reading Comprehension Challenge”, Kočiský et al 2017

The NarrativeQA Reading Comprehension Challenge

“Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, Ye et al 2017

Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

“Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent”, Yang et al 2017

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

“Evaluating Prose Style Transfer With the Bible”, Carlson et al 2017

Evaluating prose style transfer with the Bible

“Breaking the Softmax Bottleneck: A High-Rank RNN Language Model”, Yang et al 2017

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

“Neural Speed Reading via Skim-RNN”, Seo et al 2017

Neural Speed Reading via Skim-RNN

“Unsupervised Machine Translation Using Monolingual Corpora Only”, Lample et al 2017

Unsupervised Machine Translation Using Monolingual Corpora Only

“Generalization without Systematicity: On the Compositional Skills of Sequence-To-Sequence Recurrent Networks”, Lake & Baroni 2017

Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks

“Mixed Precision Training”, Micikevicius et al 2017

Mixed Precision Training

“To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression”, Zhu & Gupta 2017

To prune, or not to prune: exploring the efficacy of pruning for model compression

“Dynamic Evaluation of Neural Sequence Models”, Krause et al 2017

Dynamic Evaluation of Neural Sequence Models

“Online Learning of a Memory for Learning Rates”, Meier et al 2017

Online Learning of a Memory for Learning Rates

“Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification”, Shim et al 2017

Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification

“N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning”, Ashok et al 2017

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

“SRU: Simple Recurrent Units for Highly Parallelizable Recurrence”, Lei et al 2017

SRU: Simple Recurrent Units for Highly Parallelizable Recurrence

“Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks”, Jayaraman & Grauman 2017

Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

“Twin Networks: Matching the Future for Sequence Generation”, Serdyuk et al 2017

Twin Networks: Matching the Future for Sequence Generation

“Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, Campos et al 2017

Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

“Revisiting Activation Regularization for Language RNNs”, Merity et al 2017

Revisiting Activation Regularization for Language RNNs

“Bayesian Sparsification of Recurrent Neural Networks”, Lobacheva et al 2017

Bayesian Sparsification of Recurrent Neural Networks

“On the State-Of-The-Art of Evaluation in Neural Language Models”, Melis et al 2017

On the State-of-the-Art of Evaluation in Neural Language Models

“Controlling Linguistic Style Aspects in Neural Language Generation”, Ficler & Goldberg 2017

Controlling Linguistic Style Aspects in Neural Language Generation

“Device Placement Optimization With Reinforcement Learning”, Mirhoseini et al 2017

Device Placement Optimization with Reinforcement Learning

“Towards Synthesizing Complex Programs from Input-Output Examples”, Chen et al 2017

Towards Synthesizing Complex Programs from Input-Output Examples

“Language Generation With Recurrent Generative Adversarial Networks without Pre-Training”, Press et al 2017

Language Generation with Recurrent Generative Adversarial Networks without Pre-training

“Biased Importance Sampling for Deep Neural Network Training”, Katharopoulos & Fleuret 2017

Biased Importance Sampling for Deep Neural Network Training

“Deriving Neural Architectures from Sequence and Graph Kernels”, Lei et al 2017

Deriving Neural Architectures from Sequence and Graph Kernels

“A Deep Reinforced Model for Abstractive Summarization”, Paulus et al 2017

A Deep Reinforced Model for Abstractive Summarization

“TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension”, Joshi et al 2017

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

“DeepTingle”, Khalifa et al 2017

“A Neural Network System for Transformation of Regional Cuisine Style”, Kazama et al 2017

A neural network system for transformation of regional cuisine style

“Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU”, Devlin 2017

Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

“Adversarial Neural Machine Translation”, Wu et al 2017

Adversarial Neural Machine Translation

“SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Dunn et al 2017

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

“Learning to Reason: End-To-End Module Networks for Visual Question Answering”, Hu et al 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering

“Exploring Sparsity in Recurrent Neural Networks”, Narang et al 2017

Exploring Sparsity in Recurrent Neural Networks

“Get To The Point: Summarization With Pointer-Generator Networks”, See et al 2017

Get To The Point: Summarization with Pointer-Generator Networks

“DeepAR: Probabilistic Forecasting With Autoregressive Recurrent Networks”, Salinas et al 2017

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

“Bayesian Recurrent Neural Networks”, Fortunato et al 2017

Bayesian Recurrent Neural Networks

“Recurrent Environment Simulators”, Chiappa et al 2017

Recurrent Environment Simulators

“Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017

Learning to Generate Reviews and Discovering Sentiment

“Learning Simpler Language Models With the Differential State Framework”, II et al 2017

Learning Simpler Language Models with the Differential State Framework

“I2T2I: Learning Text to Image Synthesis With Textual Data Augmentation”, Dong et al 2017

I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation

“Improving Neural Machine Translation With Conditional Sequence Generative Adversarial Nets”, Yang et al 2017

Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

“Learned Optimizers That Scale and Generalize”, Wichrowska et al 2017

Learned Optimizers that Scale and Generalize

“Parallel Multiscale Autoregressive Density Estimation”, Reed et al 2017

Parallel Multiscale Autoregressive Density Estimation

“Tracking the World State With Recurrent Entity Networks”, Henaff et al 2017

Tracking the World State with Recurrent Entity Networks

“Optimization As a Model for Few-Shot Learning”, Ravi & Larochelle 2017

Optimization as a Model for Few-Shot Learning

“Neural Combinatorial Optimization With Reinforcement Learning”, Bello et al 2017

Neural Combinatorial Optimization with Reinforcement Learning

“Frustratingly Short Attention Spans in Neural Language Modeling”, Daniluk et al 2017

Frustratingly Short Attention Spans in Neural Language Modeling

“Tuning Recurrent Neural Networks With Reinforcement Learning”, Jaques et al 2017

Tuning Recurrent Neural Networks with Reinforcement Learning

“Outrageously Large Neural Networks: The Sparsely-Gated Mixture-Of-Experts Layer”, Shazeer et al 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

“Neural Data Filter for Bootstrapping Stochastic Gradient Descent”, Fan et al 2017

Neural Data Filter for Bootstrapping Stochastic Gradient Descent

“Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization”, Paulus 2017

Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization

“SampleRNN: An Unconditional End-To-End Neural Audio Generation Model”, Mehri et al 2016

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

“Improving Neural Language Models With a Continuous Cache”, Grave et al 2016

Improving Neural Language Models with a Continuous Cache

“NewsQA: A Machine Comprehension Dataset”, Trischler et al 2016

NewsQA: A Machine Comprehension Dataset

“Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation”, Johnson et al 2016

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

“Learning to Learn without Gradient Descent by Gradient Descent”, Chen et al 2016

Learning to Learn without Gradient Descent by Gradient Descent

“RL²: Fast Reinforcement Learning via Slow Reinforcement Learning”, Duan et al 2016

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

“DeepCoder: Learning to Write Programs”, Balog et al 2016

DeepCoder: Learning to Write Programs

“QRNNs: Quasi-Recurrent Neural Networks”, Bradbury et al 2016

QRNNs: Quasi-Recurrent Neural Networks

“Neural Architecture Search With Reinforcement Learning”, Zoph & Le 2016

Neural Architecture Search with Reinforcement Learning

“Bidirectional Attention Flow for Machine Comprehension”, Seo et al 2016

Bidirectional Attention Flow for Machine Comprehension

“Hybrid Computing Using a Neural Network With Dynamic External Memory”, Graves et al 2016

Hybrid computing using a neural network with dynamic external memory

“Scaling Memory-Augmented Neural Networks With Sparse Reads and Writes”, Rae et al 2016

Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

“Using Fast Weights to Attend to the Recent Past”, Ba et al 2016

Using Fast Weights to Attend to the Recent Past

“Achieving Human Parity in Conversational Speech Recognition”, Xiong et al 2016

Achieving Human Parity in Conversational Speech Recognition

“VPN: Video Pixel Networks”, Kalchbrenner et al 2016

VPN: Video Pixel Networks

“HyperNetworks”, Ha et al 2016

HyperNetworks

“Pointer Sentinel Mixture Models”, Merity et al 2016

Pointer Sentinel Mixture Models

“Multiplicative LSTM for Sequence Modelling”, Krause et al 2016

Multiplicative LSTM for sequence modelling

“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016

Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

“Image-To-Markup Generation With Coarse-To-Fine Attention”, Deng et al 2016

Image-to-Markup Generation with Coarse-to-Fine Attention

“Hierarchical Multiscale Recurrent Neural Networks”, Chung et al 2016

Hierarchical Multiscale Recurrent Neural Networks

“Deep Learning Human Mind for Automated Visual Classification”, Spampinato et al 2016

Deep Learning Human Mind for Automated Visual Classification

“Using the Output Embedding to Improve Language Models”, Press & Wolf 2016

Using the Output Embedding to Improve Language Models

“Full Resolution Image Compression With Recurrent Neural Networks”, Toderici et al 2016

Full Resolution Image Compression with Recurrent Neural Networks

“Decoupled Neural Interfaces Using Synthetic Gradients”, Jaderberg et al 2016

Decoupled Neural Interfaces using Synthetic Gradients

“Clockwork Convnets for Video Semantic Segmentation”, Shelhamer et al 2016

Clockwork Convnets for Video Semantic Segmentation

“Layer Normalization”, Ba et al 2016

Layer Normalization

“LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks”, Strobelt et al 2016

LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

“Learning to Learn by Gradient Descent by Gradient Descent”, Andrychowicz et al 2016

Learning to learn by gradient descent by gradient descent

“Iterative Alternating Neural Attention for Machine Reading”, Sordoni et al 2016

Iterative Alternating Neural Attention for Machine Reading

“Deep Reinforcement Learning for Dialogue Generation”, Li et al 2016

Deep Reinforcement Learning for Dialogue Generation

“Programming With a Differentiable Forth Interpreter”, Bošnjak et al 2016

Programming with a Differentiable Forth Interpreter

“Training Deep Nets With Sublinear Memory Cost”, Chen et al 2016

Training Deep Nets with Sublinear Memory Cost

“Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex”, Liao & Poggio 2016

Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

“Improving Sentence Compression by Learning to Predict Gaze”, Klerke et al 2016

Improving sentence compression by learning to predict gaze

“Adaptive Computation Time for Recurrent Neural Networks”, Graves 2016

Adaptive Computation Time for Recurrent Neural Networks

“Dynamic Memory Networks for Visual and Textual Question Answering”, Xiong et al 2016

Dynamic Memory Networks for Visual and Textual Question Answering

“PlaNet—Photo Geolocation With Convolutional Neural Networks”, Weyand et al 2016

PlaNet—Photo Geolocation with Convolutional Neural Networks

“Learning Distributed Representations of Sentences from Unlabeled Data”, Hill et al 2016

Learning Distributed Representations of Sentences from Unlabeled Data

“Exploring the Limits of Language Modeling”, Jozefowicz et al 2016

Exploring the Limits of Language Modeling

“PixelRNN: Pixel Recurrent Neural Networks”, Oord et al 2016

PixelRNN: Pixel Recurrent Neural Networks

“Persistent RNNs: Stashing Recurrent Weights On-Chip”, Diamos et al 2016

Persistent RNNs: Stashing Recurrent Weights On-Chip

“Deep-Spying: Spying Using Smartwatch and Deep Learning”, Beltramelli & Risi 2015

Deep-Spying: Spying using Smartwatch and Deep Learning

“On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”, Schmidhuber 2015

On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

“Neural GPUs Learn Algorithms”, Kaiser & Sutskever 2015

Neural GPUs Learn Algorithms

“Sequence Level Training With Recurrent Neural Networks”, Ranzato et al 2015

Sequence Level Training with Recurrent Neural Networks

“Neural Programmer-Interpreters”, Reed & Freitas 2015

Neural Programmer-Interpreters

“Generating Sentences from a Continuous Space”, Bowman et al 2015

Generating Sentences from a Continuous Space

“Generative Concatenative Nets Jointly Learn to Write and Classify Reviews”, Lipton et al 2015

Generative Concatenative Nets Jointly Learn to Write and Classify Reviews

“Generating Images from Captions With Attention”, Mansimov et al 2015

Generating Images from Captions with Attention

“Semi-Supervised Sequence Learning”, Dai & Le 2015

Semi-supervised Sequence Learning

“BPEs: Neural Machine Translation of Rare Words With Subword Units”, Sennrich et al 2015

BPEs: Neural Machine Translation of Rare Words with Subword Units

“Training Recurrent Networks Online without Backtracking”, Ollivier et al 2015

Training recurrent networks online without backtracking

“Deep Recurrent Q-Learning for Partially Observable MDPs”, Hausknecht & Stone 2015

Deep Recurrent Q-Learning for Partially Observable MDPs

“Teaching Machines to Read and Comprehend”, Hermann et al 2015

Teaching Machines to Read and Comprehend

“Scheduled Sampling for Sequence Prediction With Recurrent Neural Networks”, Bengio et al 2015

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

“Visualizing and Understanding Recurrent Networks”, Karpathy et al 2015

Visualizing and Understanding Recurrent Networks

“The Unreasonable Effectiveness of Recurrent Neural Networks”, Karpathy 2015

The Unreasonable Effectiveness of Recurrent Neural Networks

“Deep Neural Networks for Large Vocabulary Handwritten Text Recognition”, Bluche 2015

Deep Neural Networks for Large Vocabulary Handwritten Text Recognition

“Reinforcement Learning Neural Turing Machines—Revised”, Zaremba & Sutskever 2015

Reinforcement Learning Neural Turing Machines—Revised

“End-To-End Memory Networks”, Sukhbaatar et al 2015

End-To-End Memory Networks

“Inferring Algorithmic Patterns With Stack-Augmented Recurrent Nets”, Arm et al 2015

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

“DRAW: A Recurrent Neural Network For Image Generation”, Gregor et al 2015

DRAW: A Recurrent Neural Network For Image Generation

“Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews”, Mesnil et al 2014

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

“Neural Turing Machines”, Graves et al 2014

Neural Turing Machines

“Learning to Execute”, Zaremba & Sutskever 2014

Learning to Execute

“Neural Machine Translation by Jointly Learning to Align and Translate”, Bahdanau et al 2014

Neural Machine Translation by Jointly Learning to Align and Translate

“`doc2vec`: Distributed Representations of Sentences and Documents”, Le & Mikolov 2014

doc2vec: Distributed Representations of Sentences and Documents

“A Clockwork RNN”, Koutník et al 2014

A Clockwork RNN

“One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling”, Chelba et al 2013

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

“Generating Sequences With Recurrent Neural Networks”, Graves 2013

Generating Sequences With Recurrent Neural Networks

“On the Difficulty of Training Recurrent Neural Networks”, Pascanu et al 2012

On the difficulty of training Recurrent Neural Networks

“Recurrent Neural Network Based Language Model”, Mikolov et al 2010

Recurrent Neural Network Based Language Model

“Large Language Models in Machine Translation”, Brants et al 2007

Large Language Models in Machine Translation

“Learning to Learn Using Gradient Descent”, Hochreiter et al 2001

Learning to Learn Using Gradient Descent

“Long Short-Term Memory”, Hochreiter & Schmidhuber 1997

Long Short-Term Memory

“Flat Minima”, Hochreiter & Schmidhuber 1997

Flat Minima

“Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity”, Williams & Zipser 1995

Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity

“A Focused Backpropagation Algorithm for Temporal Pattern Recognition”, Mozer 1995

A Focused Backpropagation Algorithm for Temporal Pattern Recognition

“Learning Complex, Extended Sequences Using the Principle of History Compression”, Schmidhuber 1992

Learning Complex, Extended Sequences Using the Principle of History Compression

“Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks”, Schmidhuber 1992

Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks

“Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks]”, Hochreiter 1991

Untersuchungen zu dynamischen neuronalen Netzen [Studies of dynamic neural networks]

“Finding Structure In Time”, Elman 1990

Finding Structure In Time

“Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical Report CU-CS–495–90]”, Mozer 1990

Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical report CU-CS–495–90]

“A Learning Algorithm for Continually Running Fully Recurrent Neural Networks”, Williams & Zipser 1989b

A Learning Algorithm for Continually Running Fully Recurrent Neural Networks

“Recurrent Backpropagation and Hopfield Networks”, Almeida & Neto 1989b

Recurrent Backpropagation and Hopfield Networks

“Backpropagation in Perceptrons With Feedback”, Almeida 1989

Backpropagation in Perceptrons with Feedback

“Experimental Analysis of the Real-Time Recurrent Learning Algorithm”, Williams & Zipser 1989

Experimental Analysis of the Real-time Recurrent Learning Algorithm

“A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks”, Schmidhuber 1989

A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks

“A Sticky-Bit Approach for Learning to Represent State”, Bachrach 1988

A Sticky-Bit Approach for Learning to Represent State

“Generalization of Backpropagation With Application to a Recurrent Gas Market Model”, Werbos 1988

Generalization of backpropagation with application to a recurrent gas market model

“Generalization of Back-Propagation to Recurrent Neural Networks”, Pineda 1987

Generalization of back-propagation to recurrent neural networks

“The Utility Driven Dynamic Error Propagation Network (RTRL)”, Robinson & Fallside 1987

The Utility Driven Dynamic Error Propagation Network (RTRL)

“A Self-Optimizing, Non-Symmetrical Neural Net for Content Addressable Memory and Pattern Recognition”, Lapedes & Farber 1986

A self-optimizing, non-symmetrical neural net for content addressable memory and pattern recognition

“Programming a Massively Parallel, Computation Universal System: Static Behavior”, Lapedes & Farber 1986b

Programming a massively parallel, computation universal system: Static behavior

“Serial Order: A Parallel Distributed Processing Approach”, Jordan 1986

Serial Order: A Parallel Distributed Processing Approach

“Attention and Augmented Recurrent Neural Networks”

Attention and Augmented Recurrent Neural Networks

“Deep Learning for Assisting the Process of Music Composition (part 3)”

Deep learning for assisting the process of music composition (part 3)

Wikipedia

Hopfield network
Jürgen Schmidhuber
Long short-term memory
Recurrent neural network⁠:

View External Link:

https://en.wikipedia.org/wiki/Recurrent_neural_network

Miscellaneous

Link Bibliography

https://arxiv.org/abs/2403.13802: “ZigMa: Zigzag Mamba Diffusion Model”, Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Bjorn Ommer

link-bibliography
https://arxiv.org/abs/2312.04927: “Zoology: Measuring and Improving Recall in Efficient Language Models”, Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

link-bibliography
https://arxiv.org/abs/2312.00752: “Mamba: Linear-Time Sequence Modeling With Selective State Spaces”, Albert Gu, Tri Dao

link-bibliography
https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission”, Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett

link-bibliography
https://arxiv.org/abs/2306.00323: “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Shengran Hu, Jeff Clune

link-bibliography
https://arxiv.org/abs/2305.13048: “RWKV: Reinventing RNNs for the Transformer Era”, Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung

link-bibliography
https://arxiv.org/abs/2303.06349#deepmind: “Resurrecting Recurrent Neural Networks for Long Sequences”, Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

link-bibliography
https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian

link-bibliography
2023-bures.pdf: “Organic Reaction Mechanism Classification Using Machine Learning”, Jordi Burés, Igor Larrosa

link-bibliography
https://arxiv.org/abs/2212.14052: “Hungry Hungry Hippos: Towards Language Modeling With State Space Models”, Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

link-bibliography
https://arxiv.org/abs/2211.07638: “Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak

link-bibliography
https://arxiv.org/abs/2209.11737: “Semantic Scene Descriptions As an Objective of Human Vision”, Adrien Doerig, Tim C. Kietzmann, Emily Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, Ian Charest

link-bibliography
https://arxiv.org/abs/2205.01972: “Sequencer: Deep LSTM for Image Classification”, Yuki Tatsunami, Masato Taki

link-bibliography
2022-grand.pdf: “Semantic Projection Recovers Rich Human Knowledge of Multiple Object Features from Word Embeddings”, Gabriel Grand, Idan Asher Blank, Francisco Pereira, Evelina Fedorenko

link-bibliography
https://arxiv.org/abs/2203.07852: “Block-Recurrent Transformers”, DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

link-bibliography
https://arxiv.org/abs/2202.07765#deepmind: “General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR”, Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski

link-bibliography
2022-miki.pdf: “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild”, Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter

link-bibliography
https://arxiv.org/abs/2111.00396: “S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Albert Gu, Karan Goel, Christopher Ré

link-bibliography
https://elifesciences.org/articles/66039: “A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-Dependent Action Selection”, Brad K. Hulse, Hannah Haberkern, Romain Franconville, Daniel B. Turner-Evans, Shin-ya Takemura, Tanya Wolff

link-bibliography
https://arxiv.org/abs/2107.14795#deepmind: “Perceiver IO: A General Architecture for Structured Inputs & Outputs”, Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula

link-bibliography
https://proceedings.mlr.press/v139/vicol21a.html: “PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Paul Vicol, Luke Metz, Jascha Sohl-Dickstein

link-bibliography
2021-delul.pdf: “Shelley: A Crowd-Sourced Collaborative Horror Writer”, Pinar Yanardag Delul, Manuel Cebrian, Iyad Rahwan

link-bibliography
2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon

link-bibliography
https://arxiv.org/abs/2106.06981: “RASP: Thinking Like Transformers”, Gail Weiss, Yoav Goldberg, Eran Yahav

link-bibliography
https://arxiv.org/abs/2106.09488#amazon: “Scaling Laws for Acoustic Models”, Jasha Droppo, Oguz Elibol

link-bibliography
https://arxiv.org/abs/2103.03206#deepmind: “Perceiver: General Perception With Iterative Attention”, Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira

link-bibliography
https://arxiv.org/abs/2102.04159: “Deep Residual Learning in Spiking Neural Networks”, Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian

link-bibliography
https://arxiv.org/abs/2011.12692#tencent: “Towards Playing Full MOBA Games With Deep Reinforcement Learning”, Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu

link-bibliography
https://arxiv.org/abs/2008.07669: “HiPPO: Recurrent Memory With Optimal Polynomial Projections”, Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Re

link-bibliography
https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Adam Scholl

link-bibliography
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/: “Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell

link-bibliography
https://arxiv.org/abs/2002.03629: “Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving”, Yang Song, Chenlin Meng, Renjie Liao, Stefano Ermon

link-bibliography
https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models”, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford

link-bibliography
https://arxiv.org/abs/1911.11423: “Single Headed Attention RNN: Stop Thinking With Your Head”, Stephen Merity

link-bibliography
https://openreview.net/forum?id=HyxlRHBlUB: “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks”, Aaron R. Voelker, Ivana Kajić, Chris Eliasmith

link-bibliography
https://arxiv.org/abs/1910.06591#deepmind: “SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski

link-bibliography
https://arxiv.org/abs/1909.01792#deepmind: “Mogrifier LSTM”, Gábor Melis, Tomáš Kočiský, Phil Blunsom

link-bibliography
https://arxiv.org/abs/1905.01320#deepmind: “Meta-Learners’ Learning Dynamics Are unlike Learners’”, Neil C. Rabinowitz

link-bibliography
https://openreview.net/forum?id=r1lyTjAqYX#deepmind: “R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney

link-bibliography
https://arxiv.org/abs/1801.06146: “ULMFiT: Universal Language Model Fine-Tuning for Text Classification”, Jeremy Howard, Sebastian Ruder

link-bibliography
https://arxiv.org/abs/1709.02755: “SRU: Simple Recurrent Units for Highly Parallelizable Recurrence”, Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi

link-bibliography
https://arxiv.org/abs/1704.05179: “SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho

link-bibliography
https://arxiv.org/abs/1704.05526: “Learning to Reason: End-To-End Module Networks for Visual Question Answering”, Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko

link-bibliography
https://arxiv.org/abs/1608.03609: “Clockwork Convnets for Video Semantic Segmentation”, Evan Shelhamer, Kate Rakelly, Judy Hoffman, Trevor Darrell

link-bibliography
2010-mikolov.pdf: “Recurrent Neural Network Based Language Model”, Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur

link-bibliography
1991-hochreiter.pdf: “Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks]”, Sepp Hochreiter

link-bibliography
1989-williams.pdf: “Experimental Analysis of the Real-Time Recurrent Learning Algorithm”, Ronald J. Williams, David Zipser

link-bibliography
1988-werbos.pdf: “Generalization of Backpropagation With Application to a Recurrent Gas Market Model”, Paul Joseph Werbos

link-bibliography
1987-pineda.pdf: “Generalization of Back-Propagation to Recurrent Neural Networks”, Fernando J. Pineda

link-bibliography

[Quote Of The Day]

[Site Of The Day]

[Annotation Of The Day]

[adblock public service announcement]