Brian Skyrms
- Published in print:
- 2010
- Published Online:
- May 2010
- ISBN:
- 9780199580828
- eISBN:
- 9780191722769
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199580828.003.0008
- Subject:
- Philosophy, Philosophy of Language, Philosophy of Science
This chapter argues that investigation of reinforcement learning is a complement to the study of belief learning, rather than being a ‘dangerous antagonist’. It begins at the low end of the scale, to see how far simple reinforcement learning can get us, and then moves up. Exactly how does degree of reinforcement affect the strengthening of the bond between stimulus and response? Different answers are possible, and these yield alternative theories of the law of effect.
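How degree of reinforcement strengthens the stimulus-response bond can be made concrete. The sketch below is a generic Roth-Erev-style propensity model, not Skyrms's own formalism; it contrasts two candidate update rules, each of which amounts to a different theory of the law of effect.

```python
import random

def choose(weights):
    """Sample an action with probability proportional to its weight."""
    r = random.uniform(0, sum(weights))
    cum = 0.0
    for action, w in enumerate(weights):
        cum += w
        if r < cum:
            return action
    return len(weights) - 1

def run(update, trials=5000):
    # Two-armed bandit: arm 1 pays more on average.
    weights = [1.0, 1.0]               # initial propensities
    for _ in range(trials):
        a = choose(weights)
        payoff = random.random() * (1.5 if a == 1 else 1.0)
        update(weights, a, payoff)
    total = sum(weights)
    return [round(w / total, 3) for w in weights]

def basic_law_of_effect(weights, a, payoff):
    weights[a] += payoff               # raw payoff strengthens the bond

def discounted_law_of_effect(weights, a, payoff, rho=0.99):
    for i in range(len(weights)):      # old reinforcements decay first,
        weights[i] *= rho              # so recent payoffs dominate
    weights[a] += payoff

print("basic:     ", run(basic_law_of_effect))
print("discounted:", run(discounted_law_of_effect))
```

Both rules come to favor the better arm, but they differ in how strongly and how fast, which is exactly the kind of divergence between theories of the law of effect at issue here.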
John R. Anderson
- Published in print:
- 2007
- Published Online:
- September 2007
- ISBN:
- 9780195324259
- eISBN:
- 9780199786671
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195324259.003.0004
- Subject:
- Psychology, Cognitive Models and Architectures
The entire cerebral cortex projects down to the basal ganglia, which play an important role in coordinating cognition. These structures serve as the repository for our procedural knowledge. They have the ability to recognize appropriate cortical patterns and take actions directly without further deliberation. In contrast to declarative memory, procedural memory is a slow-learning system in which new capacities only gradually emerge. This chapter describes how new production rules are acquired in procedural memory and how reinforcement learning mechanisms serve to select among alternative productions. Three examples are described, focusing on procedural learning in language acquisition, learning from instructions, and brain imaging changes that occur with procedural learning.
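The selection mechanism described here resembles ACT-R-style utility learning: each production carries a utility that is nudged toward obtained rewards, and productions compete noisily. The sketch below is a generic version of that idea with hypothetical rule names, not the chapter's actual model.

```python
import math, random

# Hypothetical production rules competing to fire in the same situation.
utilities = {"rule_A": 0.0, "rule_B": 0.0}
ALPHA = 0.2   # learning rate
TEMP = 0.5    # noise temperature for selection

def select_production(utils):
    # Softmax selection: higher-utility productions fire more often,
    # but noise leaves room for exploration.
    names = list(utils)
    exps = [math.exp(utils[n] / TEMP) for n in names]
    r = random.uniform(0, sum(exps))
    cum = 0.0
    for name, e in zip(names, exps):
        cum += e
        if r < cum:
            return name
    return names[-1]

def update_utility(utils, fired, reward):
    # Error-driven update: move utility toward the obtained reward.
    utils[fired] += ALPHA * (reward - utils[fired])

for _ in range(1000):
    fired = select_production(utilities)
    reward = 1.0 if fired == "rule_B" else 0.2  # rule_B is the better habit here
    update_utility(utilities, fired, reward)
print(utilities)  # rule_B's utility approaches 1.0; it comes to dominate
```

The small learning rate is what makes this a slow-learning system: capacities emerge gradually, as the abstract describes.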
Nathaniel D. Daw
- Published in print:
- 2012
- Published Online:
- May 2016
- ISBN:
- 9780262018098
- eISBN:
- 9780262306003
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262018098.003.0012
- Subject:
- Sociology, Social Psychology and Interaction
One oft-envisioned function of search is planning actions (e.g., by exploring routes through a cognitive map). Yet, among the most prominent and quantitatively successful neuroscientific theories of the brain’s systems for action choice is the temporal-difference account of the phasic dopamine response. Surprisingly, this theory envisions that action sequences are learned without any search at all, but instead wholly through a process of reinforcement and chaining. This chapter considers recent proposals that a related family of algorithms, called model-based reinforcement learning, may provide a similarly quantitative account of action choice by cognitive search. It reviews behavioral phenomena demonstrating the insufficiency of temporal-difference-like mechanisms alone, then details the many questions that arise in considering how model-based action valuation might be implemented in the brain and in what respects it differs from other ideas about search for planning.
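The contrast at issue can be stated in a few lines of code. The toy example below is illustrative, not from the chapter: a model-free learner caches values through reward prediction errors and chaining, while a model-based learner searches a learned world model at choice time.

```python
# A hypothetical one-decision world: state -> action -> next state -> reward.
T = {("s0", "left"): "s1", ("s0", "right"): "s2"}   # learned transition model
R = {"s1": 1.0, "s2": 0.0}                          # learned reward model

# Model-free (TD-style): cache values via reward prediction errors; no search.
alpha = 0.1
Q = {("s0", "left"): 0.0, ("s0", "right"): 0.0}
for _ in range(100):
    for action in ("left", "right"):
        s_next = T[("s0", action)]
        delta = R[s_next] - Q[("s0", action)]   # reward prediction error
        Q[("s0", action)] += alpha * delta      # chaining, no search

# Model-based: no cache; search the model at choice time.
def model_based_value(state, action):
    return R[T[(state, action)]]

print(Q[("s0", "left")], model_based_value("s0", "left"))

# If the reward model changes (e.g., s1 is devalued), the model-based value
# is correct immediately, while the cached Q must relearn from experience --
# the kind of behavioral dissociation used to show TD alone is insufficient.
R["s1"] = 0.0
print(Q[("s0", "left")], model_based_value("s0", "left"))  # stale vs. updated
```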
Daniel Durstewitz
- Published in print:
- 2009
- Published Online:
- February 2010
- ISBN:
- 9780195373035
- eISBN:
- 9780199865543
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195373035.003.0018
- Subject:
- Neuroscience, Molecular and Cellular Systems, History of Neuroscience
Current computational models of dopamine (DA) modulation have worked either at a more abstract neuroalgorithmic level, starting with specific assumptions about DA's computational role and then working out its implications at a higher cognitive level, or at a more biophysical/physiological level, using detailed implementations to unravel the dynamic and functional consequences of DA's effects on voltage-gated and synaptic ion channels. This chapter focuses on the latter, and in addition specifically reviews models of DA-innervated target regions rather than models of ventral tegmental area/substantia nigra (VTA/SN) DA neurons themselves. It begins with a brief discussion of how DA may change the input/output functions of single striatal and cortical neurons. It considers the network level and the potential computational role of DA in higher cognitive functions, and then reviews DA-based models of reinforcement learning.
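As a stylized picture of the biophysical level the chapter works at, one common reading of D1-receptor activation is a change in the gain of a neuron's input/output function. The toy f-I curve below is an illustrative assumption, not a model from the chapter.

```python
import math

def firing_rate(current, gain=1.0, threshold=1.0):
    """Toy sigmoidal f-I curve: input current -> firing rate (Hz)."""
    return 100.0 / (1.0 + math.exp(-gain * (current - threshold)))

# Sketch of DA as gain modulation: a steeper input/output function
# sharpens the contrast between strong and weak inputs.
for current in (0.5, 1.0, 1.5):
    low_da = firing_rate(current, gain=2.0)
    high_da = firing_rate(current, gain=6.0)  # hypothetical DA-boosted gain
    print(f"I={current}: low-DA {low_da:5.1f} Hz, high-DA {high_da:5.1f} Hz")
```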
John P O'Doherty
- Published in print:
- 2009
- Published Online:
- March 2012
- ISBN:
- 9780199217298
- eISBN:
- 9780191696077
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199217298.003.0004
- Subject:
- Psychology, Cognitive Models and Architectures
This chapter discusses evidence for the applicability of reinforcement learning models to reward-learning and reward-based action selection in humans, with a particular emphasis on data derived from functional magnetic resonance imaging (fMRI) studies. It begins with a description of basic mechanisms by which predictions of future reward as well as punishment can be learned, and their associated neural bases. It then considers the mechanisms underlying action selection for reward, as well as in learning to avoid punishers. It also reviews instances where findings from functional neuroimaging studies are inconsistent with a simple reinforcement learning framework, and discusses the implications of these results for the development of a more complete theory of human choice.
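The basic prediction-learning mechanism referred to here is typically a delta rule (Rescorla-Wagner, or temporal-difference learning in its sequential form), whose trial-by-trial prediction errors are what fMRI studies regress against neural signals. A minimal sketch:

```python
alpha = 0.2  # learning rate
V = 0.0      # predicted reward for a cue

# Rescorla-Wagner / delta-rule learning of a reward prediction:
# the prediction moves toward the outcome by a fraction of the error.
history = []
for trial in range(20):
    reward = 1.0                 # cue is always followed by reward here
    delta = reward - V           # prediction error
    V += alpha * delta
    history.append(round(V, 3))
print(history)  # V climbs toward 1.0 as the prediction error shrinks
```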
Paul W. Glimcher
- Published in print:
- 2010
- Published Online:
- January 2011
- ISBN:
- 9780199744251
- eISBN:
- 9780199863433
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199744251.003.0013
- Subject:
- Psychology, Neuropsychology
This chapter reviews the anatomy of the neural circuits associated with dopamine and valuation; presents theories of reinforcement learning from psychology and computer science that predate dopaminergic studies of valuation; examines experimental work that forged a relationship between dopamine and these pre-existing computational theories; and reductively follows these insights all the way down to the behavior of ion channels. It shows that a neuroeconomic theory of valuation sweeps effectively from cell membranes to utility theory, and that constraints derived from each interlocking level of analysis provide useful tools for better understanding how we decide.
H. Peyton Young
- Published in print:
- 2004
- Published Online:
- October 2011
- ISBN:
- 9780199269181
- eISBN:
- 9780191699375
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199269181.003.0002
- Subject:
- Economics and Finance, Econometrics
Reinforcement is an empirical principle which states that the higher the payoff from taking an action in the past, the more likely it will be taken in the future. This approach has long occupied a prominent place in psychology; more recently it has begun to migrate into experimental economics and game theory. This chapter sketches some of the better-known models and what is known about their asymptotic behaviour in particular situations.
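A minimal sketch of one of the better-known models, a cumulative Roth-Erev-style scheme in which choice probabilities are proportional to accumulated payoffs (illustrative parameters, not the chapter's formal analysis):

```python
import random

# Cumulative reinforcement: each action's propensity is the sum of payoffs
# it has earned; choice probability is proportional to propensity.
propensity = [1.0, 1.0]                 # small positive priors
mean_payoff = [0.3, 0.7]                # action 1 is better on average

def choose():
    total = sum(propensity)
    return 0 if random.random() < propensity[0] / total else 1

for _ in range(10000):
    a = choose()
    payoff = random.random() < mean_payoff[a]   # Bernoulli payoff
    propensity[a] += float(payoff)

total = sum(propensity)
print([round(p / total, 3) for p in propensity])
# The process tends toward the higher-payoff action, though convergence
# can be slow -- the kind of asymptotic behavior the chapter analyzes.
```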
Jörg Rieskamp and Philipp E. Otto
- Published in print:
- 2011
- Published Online:
- May 2011
- ISBN:
- 9780199744282
- eISBN:
- 9780199894727
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199744282.003.0011
- Subject:
- Psychology, Cognitive Psychology, Human-Technology Interaction
The assumption that people possess a repertoire of strategies to solve the inference problems they face has been raised repeatedly. However, a computational model specifying how people select strategies from their repertoire is still lacking. The proposed strategy selection learning (SSL) theory predicts a strategy selection process on the basis of reinforcement learning. The theory assumes that individuals develop subjective expectations for the strategies they have and select strategies proportional to their expectations, which are then updated on the basis of subsequent experience. The learning assumption was supported in four experimental studies. Participants substantially improved their inferences through feedback. In all four studies, the best-performing strategy from the participants' repertoires most accurately predicted the inferences after sufficient learning opportunities. When testing SSL against three models representing extensions of SSL and against an exemplar model assuming a memory-based inference process, the authors found that SSL predicted the inferences most accurately.
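Stripped of details such as initial preferences and strategy application costs, the core SSL loop can be sketched as follows (the payoff structure is hypothetical; the strategy names are standard examples from the inference literature):

```python
import random

# SSL in miniature: keep an expectancy per strategy, select strategies
# with probability proportional to expectancy, update from feedback.
expectancy = {"take_the_best": 5.0, "weighted_additive": 5.0}

def select_strategy():
    r = random.uniform(0, sum(expectancy.values()))
    cum = 0.0
    for name, w in expectancy.items():
        cum += w
        if r < cum:
            return name
    return name

def update(name, payoff):
    expectancy[name] += payoff   # reinforcement from feedback

for _ in range(2000):
    s = select_strategy()
    # Hypothetical environment where take_the_best is right more often:
    correct = random.random() < (0.8 if s == "take_the_best" else 0.6)
    update(s, 1.0 if correct else 0.0)

total = sum(expectancy.values())
print({k: round(v / total, 2) for k, v in expectancy.items()})
# Selection probability shifts toward the better-performing strategy,
# mirroring the improvement through feedback reported in the studies.
```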
Edmund T. Rolls
- Published in print:
- 2007
- Published Online:
- September 2009
- ISBN:
- 9780199232703
- eISBN:
- 9780191724046
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199232703.003.0010
- Subject:
- Neuroscience, Behavioral Neuroscience
This chapter looks at the selection of mainly autonomic responses and their classical conditioning. The selection of approach or withdrawal, and their classical conditioning, are also mentioned. It then goes on to describe the selection of fixed stimulus-response habits, and the selection of arbitrary behaviours to obtain goals, action-outcome learning, and emotional learning. The roles of the prefrontal cortex in decision-making and attention are described. The chapter then discusses neuroeconomics, reward magnitude, expected value, and expected utility; delay of reward, emotional choice, and rational choice; reward prediction error, temporal difference error, and choice; reciprocal altruism, strong reciprocity, generosity, and altruistic punishment; and dual routes to action and decision-making.
Alex Kacelnik
- Published in print:
- 2012
- Published Online:
- May 2016
- ISBN:
- 9780262018081
- eISBN:
- 9780262306027
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262018081.003.0002
- Subject:
- Psychology, Social Psychology
This chapter contrasts two approaches to the study of mechanism and function in decision making: rules of thumb (or heuristics) and the contributions of experimental psychology and psychophysics. The first approach is most frequently used by behavioral ecologists. It implements a behavioral gambit by which researchers address hypothetical decision problems without reference to independently known cognitive processes. The second approach shares an interest in the functional consequences of behavior, but shows greater subordination to empirical research on behavioral and cognitive mechanisms. Here natural selection is seen to act on processes that tune behavior to the environment across broad domains. Associative learning and Weber’s Law are two putative evolutionary responses to such challenges. In the second approach these independently known traits, rather than ad hoc rules or heuristics, are considered as candidates for effecting decisions, and this can often lead to asking about the functional problem a posteriori, querying what selective pressures might have led to the presence of the trait. Based on examples from foraging research, it is argued that for a majority of decision problems investigated across vertebrates, the second approach is preferable. It is also recognized that dedicated rules are preferable when the relevant information acts across generations and involves little learning.
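Weber's Law, one of the independently known traits the chapter appeals to, can be pictured as scalar noise: perceptual variability grows in proportion to magnitude, so equal ratios are equally discriminable. A toy sketch (the Weber fraction is an arbitrary illustrative value):

```python
import random

WEBER_FRACTION = 0.15  # hypothetical k; noise scales with magnitude

def perceived(amount):
    # Weber's Law as scalar noise: the standard deviation of the
    # percept grows in proportion to the magnitude perceived.
    return random.gauss(amount, WEBER_FRACTION * amount)

def prefers_larger(a, b, trials=10000):
    wins = sum(perceived(b) > perceived(a) for _ in range(trials))
    return wins / trials

# Equal ratios give equal discriminability, regardless of absolute size:
print(prefers_larger(10, 12))    # roughly the same proportion ...
print(prefers_larger(100, 120))  # ... as here
```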
Peter Dayan
- Published in print:
- 2008
- Published Online:
- May 2016
- ISBN:
- 9780262195805
- eISBN:
- 9780262272353
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262195805.003.0003
- Subject:
- Psychology, Social Psychology
Values, rewards, and costs play a central role in economic, statistical, and psychological notions of decision making. They also have surprisingly direct neural realizations. This chapter discusses the ways in which different value systems interact with different decision-making systems to fashion and shape affectively appropriate behavior in complex environments. Forms of deliberative and automatic decision making are interpreted as sharing a common purpose rather than serving different or non-normative goals.
Daeyeol Lee and Hyojung Seo
- Published in print:
- 2011
- Published Online:
- September 2011
- ISBN:
- 9780195393798
- eISBN:
- 9780199897049
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780195393798.003.0012
- Subject:
- Neuroscience, Behavioral Neuroscience, Development
Behavioral performance of humans and other animals, and the neural activity related to various behaviors, commonly display substantial variability, but the relationship between these two different types of variability remains poorly understood. This chapter studies the dynamics of choice behaviors during a computer-simulated competitive game and characterizes the variability in the activity of individual neurons recorded in the parietal, cingulate, and prefrontal areas of the primate cortex. It was found that the monkeys approached the optimal strategy during the competitive game using a reinforcement learning algorithm. In addition, neurons in multiple cortical areas commonly modulated their activity according to the variables necessary to update the animal’s decision-making strategies using the reinforcement learning algorithm. In particular, neurons in the dorsolateral prefrontal cortex and lateral intraparietal area encoded signals related to the value functions of the two alternative targets in addition to the animal’s choice and outcome history. Nevertheless, the residual variability in the activity of neurons in the cortical areas tested in this study that could not be accounted for by task-related variables was still substantial, and higher than expected from additive Poisson noise. Moreover, the amount of this residual variability was particularly high for the anterior cingulate cortex. The functional significance of this regional difference in the variability of neural activity remains largely unknown.
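The kind of reinforcement learning algorithm typically fit to such two-target choice data keeps a value function per target, updates the chosen target's value from the outcome, and chooses via softmax. The sketch below is illustrative (toy random opponent, arbitrary parameters), not the study's fitted model:

```python
import math, random

alpha, beta = 0.2, 3.0            # learning rate, softmax inverse temperature
value = {"left": 0.0, "right": 0.0}

def choose():
    p_left = 1.0 / (1.0 + math.exp(-beta * (value["left"] - value["right"])))
    return "left" if random.random() < p_left else "right"

def opponent():
    # Stand-in for the computer opponent in a matching-pennies-style game.
    return random.choice(["left", "right"])

for _ in range(1000):
    a = choose()
    reward = 1.0 if a == opponent() else 0.0
    value[a] += alpha * (reward - value[a])   # update chosen target's value
print({k: round(v, 2) for k, v in value.items()})
# Against this opponent both values settle near 0.5, so choice approaches
# the 50/50 equilibrium -- the optimal strategy in this game.
```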
C. Eliasmith, X. Choo, and T. Bekolay
- Published in print:
- 2013
- Published Online:
- January 2014
- ISBN:
- 9780199794546
- eISBN:
- 9780199345236
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199794546.003.0006
- Subject:
- Neuroscience, Behavioral Neuroscience, Techniques
This chapter is devoted to consideration of adaptive processes in neural processing. This adaptation includes both weight changes (what is usually meant by ‘learning’) and the dynamic stability of network activity states (what is usually taken to be a model of ‘memory’). A serial working memory model that exploits the syntactic representations of the SPA is described in detail. In addition, a biologically detailed spike-based learning rule is presented and applied to learning arbitrary vector functions, reinforcement learning, and explaining STDP effects. Finally, this rule is applied to a network that explains the results of the Wason card selection task. Tutorial: Learning in Nengo.
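The rule itself is spike-based, but its rate-level logic is a delta rule: readout weights move in proportion to an error signal times presynaptic activity. The following is a simplified rate-based sketch under that assumption, with toy tuning curves; it is not the chapter's spiking implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, kappa = 50, 0.05

encoders = rng.normal(size=n_neurons)   # fixed random tuning
decoders = np.zeros(n_neurons)          # learned readout weights

def activity(x):
    # Toy rectified-linear tuning curves for a scalar input x.
    return np.maximum(0.0, encoders * x + 0.5)

def target(x):
    return x ** 2  # an arbitrary function to learn

for _ in range(5000):
    x = rng.uniform(-1, 1)
    a = activity(x)
    error = target(x) - decoders @ a
    decoders += (kappa / n_neurons) * error * a  # error times presynaptic activity

for x in (-0.8, 0.0, 0.8):
    print(x, round(float(decoders @ activity(x)), 3), round(target(x), 3))
```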
Jeffrey R. Stevens
- Published in print:
- 2008
- Published Online:
- May 2016
- ISBN:
- 9780262195805
- eISBN:
- 9780262272353
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262195805.003.0013
- Subject:
- Psychology, Social Psychology
Evolutionary and psychological approaches to decision making remain largely separate endeavors. Each offers necessary techniques and perspectives which, when integrated, will aid the study of decision making in both humans and nonhuman animals. The evolutionary focus on selection pressures highlights the goals of decisions and the conditions under which different selection processes likely influence decision making. An evolutionary view also suggests that fully rational decision processes do not likely exist in nature. The psychological view proposes that cognition is hierarchically built on lower-level processes. Evolutionary approaches to decision making have not considered the cognitive building blocks necessary to implement decision strategies, thereby making most evolutionary models of behavior psychologically implausible. The synthesis of evolutionary and psychological constraints will generate more plausible models of decision making.
Jeffrey Cockburn and Michael Frank
- Published in print:
- 2011
- Published Online:
- August 2013
- ISBN:
- 9780262016438
- eISBN:
- 9780262298490
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262016438.003.0017
- Subject:
- Neuroscience, Behavioral Neuroscience
This chapter proposes a model in which the activity of the anterior cingulate cortex (ACC) is modulated in part by reinforcement learning processes in the basal ganglia. It describes how this relationship may lead to a better understanding of the error-related negativity (ERN). The conflict monitoring theory of the ERN is also discussed. Finally, the chapter describes the core basal ganglia model and its role in reinforcement learning and action selection.
José J. F. Ribas-Fernandes, Yael Niv, and Matthew M. Botvinick
- Published in print:
- 2011
- Published Online:
- August 2013
- ISBN:
- 9780262016438
- eISBN:
- 9780262298490
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262016438.003.0016
- Subject:
- Neuroscience, Behavioral Neuroscience
This chapter discusses the relevance of reinforcement learning (RL) to hierarchically structured behavior. It first reviews the fundamentals of RL, with a focus on temporal-difference learning in actor-critic models. Next, it discusses the scaling problem and the computational issues that stimulated the development of hierarchical reinforcement learning (HRL). The potential neuroscientific correlates of HRL are also described. The chapter also presents the results of some initial empirical tests and ends with directions for further research.
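The actor-critic scheme reviewed here can be sketched in tabular form: a critic learns state values and emits temporal-difference errors, and the same errors train the actor's action preferences. The toy one-decision world below is purely illustrative:

```python
import math, random

alpha_v, alpha_p, gamma = 0.1, 0.1, 0.9
V = {"s0": 0.0, "s1": 0.0}                      # critic: state values
pref = {("s0", "left"): 0.0, ("s0", "right"): 0.0}  # actor: preferences

def policy(state):
    exps = {a: math.exp(p) for (s, a), p in pref.items() if s == state}
    r = random.uniform(0, sum(exps.values()))
    cum = 0.0
    for a, e in exps.items():
        cum += e
        if r < cum:
            return a
    return a

for _ in range(2000):
    a = policy("s0")
    # "right" leads to reward in this toy world; both actions end in s1.
    reward = 1.0 if a == "right" else 0.0
    delta = reward + gamma * V["s1"] - V["s0"]   # TD error (s1 is terminal)
    V["s0"] += alpha_v * delta                   # critic update
    pref[("s0", a)] += alpha_p * delta           # actor update, same signal
print(V, {k: round(v, 2) for k, v in pref.items()})
```

HRL extends exactly this picture with temporally extended actions (options), which is where the scaling problem discussed in the chapter enters.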
Dana H. Ballard
- Published in print:
- 2015
- Published Online:
- September 2015
- ISBN:
- 9780262028615
- eISBN:
- 9780262323819
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262028615.003.0005
- Subject:
- Neuroscience, Research and Theory
The learning of cognitive programs faces many technical difficulties, but the most important is the valuation of programs that have delayed rewards. The reinforcement learning algorithms that the brain uses to tackle this head-on are logically situated in the basal ganglia, which represent the abstract sequential components of motor and cognitive plans. Such sequences are evaluated in terms of their expected reward and risk, which in turn are coded by the neurotransmitters dopamine and serotonin respectively, serving as a common evaluative currency. Reinforcement learning algorithms learn by adjusting for deviations from expected reward, a signal that can also be used to program the cortex's memory representations. Tesauro's use of reinforcement to learn the game of backgammon provides a superb example of the putative integration of this process between the two forebrain subsystems.
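The delayed-reward problem, and the TD(λ) mechanism behind Tesauro's TD-Gammon, can be illustrated with eligibility traces on a toy chain of states where reward arrives only at the end. This is a sketch of the credit-assignment idea, not TD-Gammon itself:

```python
# Toy 5-state chain, reward only at the final step. Eligibility traces
# let a single prediction error reach back and credit earlier states.
alpha, gamma, lam = 0.1, 1.0, 0.8
n_states = 5
V = [0.0] * n_states

for _ in range(200):
    trace = [0.0] * n_states
    for s in range(n_states):
        is_last = (s == n_states - 1)
        reward = 1.0 if is_last else 0.0
        v_next = 0.0 if is_last else V[s + 1]
        delta = reward + gamma * v_next - V[s]   # TD error
        trace[s] += 1.0                          # mark current state eligible
        for i in range(n_states):
            V[i] += alpha * delta * trace[i]     # credit flows backward
            trace[i] *= gamma * lam              # traces decay
print([round(v, 2) for v in V])  # every state learns the delayed reward's value
```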
Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch
- Published in print:
- 2020
- Published Online:
- October 2020
- ISBN:
- 9780190905033
- eISBN:
- 9780190905071
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/oso/9780190905033.003.0013
- Subject:
- Philosophy, Moral Philosophy
This chapter surveys eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? The chapter focuses on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers. The questions surveyed include the following: How can we train reinforcement learners to take actions that are more amenable to meaningful assessment by intelligent overseers? What kinds of objective functions incentivize a system to “not have an overly large impact” or “not have many side effects”? The chapter discusses these questions, related work, and potential directions for future research, with the goal of highlighting relevant research topics in machine learning that appear tractable today.
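One way to make the "overly large impact" question concrete is an impact-regularized objective: task reward minus a penalty on deviation from a baseline state. The sketch below is schematic, with a hypothetical distance function; it is not a proposal from the chapter, and choosing that function well is precisely the open problem.

```python
def penalized_reward(task_reward, state, baseline_state, weight=1.0):
    """Schematic 'low impact' objective: subtract a penalty proportional
    to how far the agent has pushed the world away from a baseline.
    The distance measure d() is the hard part -- naive choices can be
    gamed, or can block useful behavior entirely."""
    return task_reward - weight * d(state, baseline_state)

def d(state, baseline_state):
    # Hypothetical stand-in: count features of the world that changed.
    return sum(1 for a, b in zip(state, baseline_state) if a != b)

# An agent maximizing penalized_reward is pushed toward plans that
# achieve the task while changing little else.
print(penalized_reward(10.0, state=(1, 0, 1), baseline_state=(0, 0, 1)))  # 9.0
```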
Elisabeth A. Murray, Steven P. Wise, and Kim S. Graham
- Published in print:
- 2016
- Published Online:
- January 2017
- ISBN:
- 9780199686438
- eISBN:
- 9780191766312
- Item type:
- chapter
- Publisher:
- Oxford University Press
- DOI:
- 10.1093/acprof:oso/9780199686438.003.0003
- Subject:
- Neuroscience, Behavioral Neuroscience, Development
The reinforcement memory systems evolved early in the history of animals. Through these ancient mechanisms, animals can remember which actions in which circumstances produced benefits or avoided costs (instrumental memories), as well as which contexts, objects, and places were associated with costs or benefits (Pavlovian memories). Surface similarities support a common set of terms and formalisms to describe reinforcement learning of many kinds, but the fact that they depend on a variety of unrelated brain structures and can be established by brainless animals, such as sea anemones, shows that they do not compose a single system or mechanism. As new memory systems emerged during vertebrate evolution, reinforcement learning persisted, but it cannot account for derived aspects of human cognition such as analogical, metaphorical, or relational reasoning, abstract problem-solving strategies, mental time travel, scenario construction, mental trial and error behavior, autobiographical narratives, language, a theory of mind, or explicit (declarative) memory.
Clay B. Holroyd and Nick Yeung
- Published in print:
- 2011
- Published Online:
- August 2013
- ISBN:
- 9780262016438
- eISBN:
- 9780262298490
- Item type:
- chapter
- Publisher:
- The MIT Press
- DOI:
- 10.7551/mitpress/9780262016438.003.0018
- Subject:
- Neuroscience, Behavioral Neuroscience
This chapter discusses a new integrative theory of anterior cingulate cortex (ACC) function, which proposes that the dorsal ACC supports the selection and execution of coherent behaviors over extended periods. It first presents the current theories of ACC function and its role in four key aspects of behavior: performance monitoring, action, reinforcement learning, and motivation. The chapter then outlines a new theory proposing that the ACC contributes to the hierarchical reinforcement learning of high-level, temporally extended behaviors.