- Peter Dayan – Director of the Gatsby Computational Neuroscience Unit, University College London
- Ray Dolan – Director of the Max Planck Centre for Computational Psychiatry and Ageing
- Wolfram Schultz – Professor of Neuroscience and Wellcome Trust Principal Research Fellow at the University of Cambridge
Collectively, their work examines the ability of humans and animals to link rewards to events and actions. This capacity has been a foundation of our survival, but it can also be at the root of many neurological and psychiatric disorders, such as addiction, compulsive behaviour and schizophrenia. For a species to survive and reproduce, an animal must be able to make decisions that avoid danger and bring benefits (such as food and shelter). This decision-making requires predicting outcomes from environmental cues and previously learned responses. For instance, certain smells may indicate that an animal should prepare to chase prey, or that it should avoid a particular fruit. The brain plays a key role in this decision-making and learning, and at the centre of this is the neurotransmitter dopamine.
In the 1980s, Professor Wolfram Schultz developed a way of recording the activity of neurons in the brain that use dopamine to transmit information. He found that the dopamine neurons would respond whenever a monkey was given a fruit juice reward. Schultz then showed the animals different visual patterns; whenever a certain pattern was shown, the monkey would receive a reward. After a time, the dopamine neurons began to respond to the visual pattern rather than to the juice reward (the response to the juice reward itself declined over time). Conversely, when the correct pattern was shown but no reward followed, dopamine neuron activity decreased below normal levels. If the reward was given at another time or was bigger than expected, dopamine neuron activity would spike (1). This was the first clear demonstration of the neurological basis of one cornerstone of learning theory in Comparative and Behavioural Psychology: Pavlovian conditioning (2).
Building on Schultz’s work, Peter Dayan found that the pattern of dopamine neuron activity described by Schultz resembled a ‘reward prediction error’. This signal is the difference between the predicted and the actual reward resulting from an action or event, and it is continuously updated as new events and outcomes occur. Dayan would go on to work with Schultz to create computational models investigating how the brain uses information to make predictions and how these predictions are updated when new or contrasting information is presented.
Schultz explains the reward prediction error and resulting learning in the following analogy:
I am standing in front of a drink-dispensing machine in Japan that seems to allow me to buy six different types of drinks, but I cannot read the words. I have a low expectation that pressing a particular button will deliver my preferred blackcurrant juice (a chance of one in six). So I just press the second button from the right, and then a blue can appears with a familiar logo that happens to be exactly the drink I want. That is a pleasant surprise, better than expected. What would I do the next time I want the same blackcurrant juice from the machine? Of course, press the second button from the right. Thus, my surprise directs my behavior to a specific button. I have learned something, and I will keep pressing the same button as long as the same can comes out. However, a couple of weeks later, I press that same button again, but another, less preferred can appears. Unpleasant surprise, somebody must have filled the dispenser differently. Where is my preferred can? I press another couple of buttons until my blue can comes out. And of course I will press that button again the next time I want that blackcurrant juice, and hopefully all will go well.
What happened? The first button press delivered my preferred can. This pleasant surprise is what we call a positive reward prediction error. “Error” refers to the difference between the can that came out and the low expectation of getting exactly that one, irrespective of whether I made an error or something else went wrong. “Reward” is any object or stimulus that I like and of which I want more. “Reward prediction error” then means the difference between the reward I get and the reward that was predicted. Numerically, the prediction error on my first press was 1 minus 1/6, the difference between what I got and what I reasonably expected. Once I get the same can again and again for the same button press, I get no more surprises; there is no prediction error, I don’t change my behavior, and thus I learn nothing more about these buttons. But what about the wrong can coming out 2 weeks later? I had the firm expectation of my preferred blackcurrant juice but, unpleasant surprise, the can that came out was not the one I preferred. I experienced a negative prediction error, the difference between the nonpreferred, lower valued can and the expected preferred can. At the end of the exercise, I have learned where to get my preferred blackcurrant juice, and the prediction errors helped me to learn where to find it.
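The arithmetic in the analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model used by Schultz or Dayan: the preferred can is coded as a reward of 1 and any other can as 0, the expectation is nudged towards each outcome by a simple delta rule, and the function name `simulate_presses` and the learning rate of 0.5 are hypothetical choices made for this example.

```python
def simulate_presses(outcomes, initial_expectation, learning_rate=0.5):
    """Return the prediction error for each button press.

    outcomes: rewards received on successive presses of the same button
              (assumed coding: 1.0 = preferred can, 0.0 = any other can).
    initial_expectation: predicted reward before the first press.
    learning_rate: assumed fraction of each error used to update the prediction.
    """
    expectation = initial_expectation
    errors = []
    for received in outcomes:
        # Reward prediction error: reward received minus reward predicted.
        error = received - expectation
        errors.append(error)
        # Delta-rule update: move the prediction part of the way towards the outcome.
        expectation += learning_rate * error
    return errors


# Ten presses that deliver the preferred can, then one that does not.
outcomes = [1.0] * 10 + [0.0]
errors = simulate_presses(outcomes, initial_expectation=1 / 6)

for press, error in enumerate(errors, start=1):
    print(f"press {press:2d}: prediction error = {error:+.3f}")
```

Under these assumptions the first press gives an error of 1 minus 1/6 (about +0.833), matching the figure in the analogy. The error then shrinks towards zero as the same can keeps appearing, and it becomes strongly negative on the final press, when the firmly expected can fails to appear.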
Professor Ray Dolan’s work has involved imaging the human brain in order to understand the mechanisms for learning and decision-making. Advancing the work of Schultz and Dayan, he showed that the reward prediction error can account for how humans learn, and the role that dopamine plays within it. He has collaborated with Dayan for the past decade to investigate human motivation, variations in happiness, and human gambling behaviour.
Schultz continues to study both animals and humans, using neuroimaging to study changes in neuron signals in Parkinson’s patients, smokers and drug addicts. The more we understand the process which leads people to take certain actions, the better positioned we are to intervene.
Professor Sir Colin Blakemore (University of London), chairman of the Brain Prize selection committee, said:
“The judges concluded that the discoveries made by Wolfram Schultz, Peter Dayan and Ray Dolan were crucial for understanding how the brain detects reward and uses this information to guide behaviour. This work is a wonderful example of the creative power of interdisciplinary research, bringing together computational explanations of the role of activity in the monkey brain with advanced brain imaging in human beings to illuminate the way in which we use reward to regulate our choices and actions. The implications of these discoveries are extremely wide-ranging, in fields as diverse as economics, social science, drug addiction and psychiatry”.
Speaking of Research
- Schultz, W., 2015, Neuronal Reward and Decision Signals: From Theories to Data, Physiological Reviews 95(3)
- Schultz, W. et al., 1993, Responses of Monkey Dopamine Neurons to Reward and Conditioned Stimuli during Successive Steps of Learning a Delayed Response Task, Journal of Neuroscience 13(3)