Developments in artificial intelligence often draw inspiration from how people think, but now AI has turned the tables and is teaching us about how brains learn.
Will Dabney of technology company DeepMind in London and his colleagues have discovered that a recent development in machine learning, called distributional reinforcement learning, also provides a new explanation of how the reward pathways in the brain work. These pathways govern our response to pleasant events and are mediated by neurons that release the chemical dopamine in the brain.
“Dopamine in the brain is a kind of surprise signal,” Dabney says. “If things go better than expected, more dopamine is released.”
It was previously thought that these dopamine neurons all reacted identically. “Kind of a choir, but where everyone sings exactly the same note,” Dabney says.
But the team found that individual dopamine neurons actually seem to vary – each is tuned to a different level of optimism or pessimism.
“They all give signals at different levels of surprise,” Dabney says. “More like a choir that sings all different notes and harmonises together.”
The finding was inspired by a technique known as distributional reinforcement learning, which is one of the methods AI systems have used to master games such as Go and StarCraft II.
At its simplest, reinforcement learning is the idea that a reward reinforces the behavior that led to it. It requires an understanding of how a current action leads to a future reward. For example, a dog can learn the “sit” command because it is rewarded with a treat when it sits.
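The core idea of a reward nudging a prediction can be sketched in a few lines. This is a minimal illustration, assuming a simple running-average update rule that the article does not name; the function names `update_value` and `learn_expected_reward` and the learning rate of 0.1 are hypothetical choices made for the example.

```python
def update_value(value, reward, learning_rate=0.1):
    """Move the predicted reward a small step toward the observed reward."""
    # Surprise is positive when the outcome beats the prediction,
    # negative when it falls short (hypothetical naming).
    surprise = reward - value
    return value + learning_rate * surprise


def learn_expected_reward(rewards, learning_rate=0.1, initial_value=0.0):
    """Apply the update once per observed reward and return the final prediction."""
    value = initial_value
    for reward in rewards:
        value = update_value(value, reward, learning_rate)
    return value


# Hypothetical scenario: the dog receives one treat (reward 1.0) each time it sits.
sit_value = learn_expected_reward([1.0] * 50)
print(f"Predicted reward for sitting after 50 trials: {sit_value:.3f}")
```

After enough rewarded trials the prediction approaches the size of the treat, and the surprise shrinks toward zero.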
Previously, models of reinforcement learning in both AI and neuroscience focused on learning to predict an “average” future reward. “But this does not reflect reality as we experience it,” says Dabney.
“For example, if someone plays the lottery, they expect to either win or lose, but they don't expect a halfway, average outcome that doesn't necessarily occur,” he says.
If the future is uncertain, the possible outcomes can be represented as a probability distribution: some are positive, others negative. AIs that use distributional reinforcement learning algorithms are able to predict the full spectrum of potential rewards.
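The difference between predicting an average and predicting a distribution can be shown with a toy lottery. This is only a sketch: the 50 per cent win probability, the prize of 10 and the function names are assumptions made for illustration, not figures from the study.

```python
import random
from collections import Counter


def simulate_lottery(n_draws, win_probability=0.5, prize=10.0, seed=0):
    """Return a list of outcomes: the prize on a win, 0.0 on a loss."""
    rng = random.Random(seed)
    return [prize if rng.random() < win_probability else 0.0
            for _ in range(n_draws)]


def mean_prediction(outcomes):
    """Summarise the future as a single average reward."""
    return sum(outcomes) / len(outcomes)


def distribution_prediction(outcomes):
    """Summarise the future as the observed frequency of each distinct reward."""
    counts = Counter(outcomes)
    return {outcome: count / len(outcomes) for outcome, count in counts.items()}


outcomes = simulate_lottery(1000)
print("Average prediction:", mean_prediction(outcomes))
print("Distribution prediction:", distribution_prediction(outcomes))
```

In this toy setting the average lands between the two possible results and never actually occurs, while the distribution keeps both the win and the loss along with how often each happens.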
To test whether the dopamine reward pathways in the brain also work through a distribution, the team recorded responses from individual dopamine neurons in mice. The mice were trained to perform a task and received rewards of varying and unpredictable sizes.
The researchers found that different dopamine cells reliably showed different levels of surprise.
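One way to picture a population tuned to different levels of optimism or pessimism is to let each unit weigh pleasant and unpleasant surprises differently. The sketch below is an illustrative assumption, not a description of the team's analysis: the `optimism` parameter, the learning rate and the reward sizes are all hypothetical.

```python
import random


def train_unit(rewards, optimism, base_rate=0.02, initial_value=0.0):
    """Learn a reward prediction with asymmetric responses to surprise.

    optimism near 1.0: learns mostly from better-than-expected outcomes.
    optimism near 0.0: learns mostly from worse-than-expected outcomes.
    """
    value = initial_value
    for reward in rewards:
        surprise = reward - value
        # Scale the step by optimism for positive surprises,
        # and by (1 - optimism) for negative ones.
        weight = optimism if surprise > 0 else 1.0 - optimism
        value += base_rate * weight * surprise
    return value


def train_population(rewards, optimism_levels):
    """Train one unit per optimism level on the same stream of rewards."""
    return [train_unit(rewards, level) for level in optimism_levels]


# Hypothetical rewards of varying, unpredictable size.
rng = random.Random(0)
rewards = [rng.choice([1.0, 2.0, 5.0, 10.0]) for _ in range(20000)]

optimism_levels = [0.1, 0.5, 0.9]
learned = train_population(rewards, optimism_levels)
for level, value in zip(optimism_levels, learned):
    print(f"optimism {level:.1f} -> predicted reward {value:.2f}")
```

In this sketch the pessimistic unit settles on a low prediction and the optimistic unit on a high one, so together the units cover the range of possible rewards rather than a single average.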
“Linking rewards to specific stimuli or actions is crucial for survival,” says Raul Vicente of the University of Tartu, Estonia. “The brain cannot afford to throw away valuable information about rewards.”
“The study is broadly in line with the current view that in order to work efficiently, the brain must represent not only the average value of a variable, but also how often that variable takes different values,” says Vicente. “It is a good example of how computational algorithms can guide us in what to look for in neural responses.”
However, Vicente adds, more research is needed to show whether the results apply to other species or other regions of the brain.
Source: https://www.newscientist.com/article/2230327-deepmind-found-an-ai-learning-technique-also-works-in-human-brains/#ixzz6BG8VXx9n