Max Planck Institute for Dynamics and Self-Organization -- Department for Nonlinear Dynamics and Network Dynamics Group
BCCN AG-Seminar

Tuesday, 01.02.2011 17 c.t.

Learning from dopamine: generating and exploiting a biological temporal-difference error signal

by Prof. Dr. Abigail Morrison
from Bernstein Center Freiburg

Contact person: Fred Wolf

Location

Seminar room, House 2, 4th floor (Bunsenstr.)

Abstract

An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavioral adaptation in complex tasks. In this talk, I present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal as a third factor to modulate synaptic plasticity. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine and to pre- and postsynaptic activity. The neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired relative to the traditional algorithm on a task with both positive and negative rewards, and it breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal modulating excitatory synapses enables TD learning when learning is driven by positive rewards but not when it is driven by negative rewards.
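
For readers unfamiliar with the classical discrete-time TD algorithm that the abstract uses as its baseline, the following is a minimal sketch of a tabular TD(0) value update. It is not the spiking model presented in the talk. The function names, the parameter values, and in particular the clipping `floor` used to mimic an asymmetric, dopamine-like signal are assumptions made purely for illustration.

```python
def td_error(reward, v_current, v_next, gamma=0.9):
    """Classical TD(0) error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current


def asymmetric_signal(delta, floor=-0.2):
    """Hypothetical dopamine-like signal (an assumption, not the talk's model).

    Positive errors pass through unchanged; negative errors are clipped at
    `floor`, loosely mimicking a firing rate that cannot drop far below a
    low baseline.
    """
    return max(delta, floor)


def td_update(values, state, next_state, reward,
              alpha=0.1, gamma=0.9, signal=lambda d: d):
    """Update values[state] in place and return the raw TD error.

    `signal` transforms the TD error before it drives the update; the
    default identity gives the classical algorithm.
    """
    delta = td_error(reward, values[state], values[next_state], gamma)
    values[state] += alpha * signal(delta)
    return delta


# Classical update driven by a positive reward.
v_pos = [0.0, 0.0]
td_update(v_pos, state=0, next_state=1, reward=1.0)

# Same step with a negative reward, once classical and once clipped.
v_neg_classical = [0.0, 0.0]
td_update(v_neg_classical, state=0, next_state=1, reward=-1.0)

v_neg_clipped = [0.0, 0.0]
td_update(v_neg_clipped, state=0, next_state=1, reward=-1.0,
          signal=asymmetric_signal)
```

In this toy setting the clipped update moves the value estimate much less after a negative reward than the classical update does, which gives some intuition for why an asymmetric error signal might hamper learning from negative rewards.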

Remarks