Tracking Readers’ Eye Movements Can Help Computers Learn

As we read, our eyes reveal which words go together and which are the most important. Researchers are applying that data to help neural networks understand language.

For our eyes, reading is hardly a smooth ride. They stutter across the page, lingering over words that surprise or confuse, hopping over those that seem obvious in context (you can blame that for your typos), pupils widening when a word sparks a potent emotion. All this commotion is barely noticeable, occurring in milliseconds. But for psychologists who study how our minds process language, our unsteady eyes are a window into the black box of our brains.

Nora Hollenstein, a graduate student at ETH Zurich, thinks a reader’s gaze could be useful for another task: helping computers learn to read. Researchers are often looking for ways to make artificial neural networks more brainlike, but brain waves are noisy and poorly understood. So Hollenstein looks to gaze as a proxy. Last year she developed a dataset that combines eye tracking with brain signals gathered from EEG scans, hoping to discover patterns that can improve how neural networks understand language. “We wondered if giving it a bit more humanness would give us better results,” Hollenstein says.

Neural networks have produced immense improvements in how machines understand language, but in many cases they rely on large amounts of meticulously labeled data. That requires time and human labor; it also produces machines that are black boxes, and often seem to lack common sense. So researchers look for ways to give neural networks a nudge in the right direction by encoding rules and intuitions. In this case, Hollenstein tested whether data gleaned from the physical act of reading could help a neural network work better.

Last fall, Hollenstein and collaborators at the University of Copenhagen used her dataset to guide a neural network toward the most important parts of a sentence it was trying to understand. In deep learning, researchers often rely on so-called attention mechanisms to do this, but they require large amounts of data to work well. By adding data on how long our eyes linger on each word, the researchers helped the network focus on critical parts of a sentence the way a human would. Gaze, they found, was useful for a range of tasks, including identifying hate speech and detecting grammatical errors. In subsequent work, Hollenstein found that adding more information about gaze, such as when eyes flit between words to confirm a relationship, helped a neural network better identify named entities, such as places and people.
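One common way to fold gaze into training, roughly in the spirit of the work described above, is to treat readers’ fixation times as a soft target for a model’s attention weights. The sketch below is illustrative only: it assumes PyTorch, a toy attention-based classifier, and per-token fixation durations already aligned to tokens, and the names (GazeRegularizedClassifier, gaze_weight, and so on) are hypothetical rather than taken from the researchers’ code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeRegularizedClassifier(nn.Module):
    """Toy sentence classifier whose attention can be nudged toward human gaze."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)    # one score per token
        self.classify = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))                # (batch, seq, 2*hidden)
        attn = F.softmax(self.attn_score(states).squeeze(-1), dim=-1)  # (batch, seq)
        pooled = torch.bmm(attn.unsqueeze(1), states).squeeze(1)       # attention-weighted sum
        return self.classify(pooled), attn


def loss_with_gaze(logits, labels, attn, fixation_ms, gaze_weight=0.1):
    """Task loss plus a term nudging attention toward where readers lingered."""
    # Normalize per-token fixation durations into a distribution comparable to attn.
    gaze = fixation_ms / fixation_ms.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    task_loss = F.cross_entropy(logits, labels)
    gaze_loss = F.mse_loss(attn, gaze)
    return task_loss + gaze_weight * gaze_loss


# Example usage with random stand-in data (real fixation times would come
# from an eye-tracking corpus aligned to the tokenized sentences).
model = GazeRegularizedClassifier(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (8, 20))   # batch of 8 sentences, 20 tokens each
labels = torch.randint(0, 2, (8,))
fixations = torch.rand(8, 20)                # stand-in for measured fixation durations
logits, attn = model(tokens)
loss = loss_with_gaze(logits, labels, attn, fixations)
loss.backward()
```

Because the gaze term acts only as a regularizer during training, a model like this needs no eye-tracking data at test time.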

The hope, Hollenstein says, is that gaze data could help reduce the manual labeling required to use machine learning in rare languages, and in tasks where labeled data is especially limited, like generating summaries of text. Ideally, she adds, gaze would be just the starting point, eventually complemented by the EEG data she gathered as researchers find more relevant signals in the noise of brain activity.

“The fact that the signals are there is I think clear to everyone,” says Dan Roth, a professor of computer science at the University of Pennsylvania. The trend in AI of using ever-increasing quantities of labeled data isn’t sustainable, he argues, and human signals like gaze are an intriguing way to make machines a little more intuitive.

Still, eye tracking is unlikely to change how computer scientists build their algorithms, says Jacob Andreas, a researcher at Microsoft-owned Semantic Machines. Most of the manual text labeling that researchers depend on can be done quickly and cheaply via crowdsourcing platforms like Amazon’s Mechanical Turk. Gaze data is difficult to gather, requiring specialized lab equipment that needs constant recalibration, and EEGs are messier still, involving sticky probes that must be re-wetted every 30 minutes. (Even with all that effort, the signal is still fuzzy; it would be far clearer if the probes could be placed under the skull.) But Hollenstein sees improvements on the horizon: better webcams and smartphone cameras, for example, could passively collect eye-tracking data as participants read in the comfort of their own homes.

In any case, some of what researchers learn by improving machines might help us understand that other black box, our brains. As Andreas notes, researchers constantly scour neural networks for signs that they make use of humanlike intuitions rather than relying solely on pattern matching over reams of data. By observing which aspects of eye tracking and EEG signals improve a neural network’s performance, researchers might begin to shed light on what our brain signals mean. A neural network could become a kind of model organism for the human mind.

