Max Planck Institute for Dynamics and Self-Organization -- Department for Nonlinear Dynamics and Network Dynamics Group


Tuesday, 05.02.2013, 17:00 c.t.

Is there "value" in reinforcement learning?

by Dr. Yonatan Loewenstein
from Department of Neurobiology, The Hebrew University, Jerusalem, Israel

Contact person: Fred Wolf


Ludwig Prandtl lecture hall


Behaviors that are followed by a reward are more likely to be repeated in the future, a phenomenon known as the “law of effect”. I will discuss two quantitative computational accounts of this law of behavior. The first assumes that the agent maintains a set of estimates of the expected accumulated future rewards associated with the different states of the world or the different state-action pairs, and that the decision at each state depends on these values. Learning in this framework results from an on-line update of the values according to the actions and their consequences. I will show that in a repeated-choice setting, this framework provides a good quantitative description of human behavior if we assume that the first experience resets the initial value estimates of the actions. In particular, I will focus on primacy and risk aversion. The second account of the law of effect posits that covariance-based synaptic plasticity underlies operant learning. I will show that in a free-operant setting, covariance-based synaptic plasticity provides a good quantitative description of animal behavior and is consistent with the fast adaptation to matching behavior. I will conclude by contrasting the two approaches.
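
For readers unfamiliar with the first account, the following is a minimal illustrative sketch of value-based learning in a repeated-choice task. It is not the speaker's model: the delta-rule update, the softmax choice rule, the binary rewards, and all names and parameters (`update_values`, `softmax_choice`, `simulate`, `alpha`, `beta`) are assumptions chosen for illustration. The only element taken from the abstract is the idea that the first experience with an action resets its initial value estimate.

```python
import math
import random


def update_values(values, seen, action, reward, alpha=0.1):
    """Update the value estimate of the chosen action in place.

    Assumption (illustrative): the first observed outcome of an action
    overwrites its initial estimate; later outcomes use a delta rule.
    """
    if not seen[action]:
        values[action] = reward  # first experience resets the estimate
        seen[action] = True
    else:
        values[action] += alpha * (reward - values[action])
    return values


def softmax_choice(values, beta=3.0, rng=random):
    """Pick an action with probability proportional to exp(beta * value)."""
    weights = [math.exp(beta * v) for v in values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for action, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return action
    return len(values) - 1  # guard against floating-point round-off


def simulate(reward_probs, n_trials=200, alpha=0.1, beta=3.0, seed=0):
    """Run a hypothetical repeated-choice task with binary rewards."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    seen = [False] * len(reward_probs)
    choices = []
    for _ in range(n_trials):
        action = softmax_choice(values, beta, rng)
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        update_values(values, seen, action, reward, alpha)
        choices.append(action)
    return values, choices
```

Calling `simulate([0.3, 0.7])` returns the final value estimates and the sequence of choices. Because the first outcome replaces the initial estimate instead of nudging it, early experience carries extra weight in this sketch, which is one simple way a primacy effect could arise.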
