Book contents
- Frontmatter
- Contents
- Contributors
- Preface
- Part I Introduction
- Part II Neuromorphic robots: biologically and neurally inspired designs
- Part III Brain-based robots: architectures and approaches
- 5 The RatSLAM project: robot spatial navigation
- 6 Evolution of rewards and learning mechanisms in Cyber Rodents
- 7 A neuromorphically inspired architecture for cognitive robots
- 8 Autonomous visuomotor development for neuromorphic robots
- 9 Brain-inspired robots for autistic training and care
- Part IV Philosophical and theoretical considerations
- Part V Ethical considerations
- Index
- References
6 - Evolution of rewards and learning mechanisms in Cyber Rodents
from Part III - Brain-based robots: architectures and approaches
Published online by Cambridge University Press: 05 February 2012
Summary
Finding design principles for reward functions is a major challenge in both artificial intelligence and neuroscience. Successful acquisition of a task usually requires rewards to be given not only for goals but also for intermediate states, to promote effective exploration. We propose a method for designing “intrinsic” rewards for autonomous robots by combining constrained policy gradient reinforcement learning with embodied evolution. To validate the method, we use the Cyber Rodent robots, for which collision avoidance, recharging from battery packs, and “mating” by software reproduction are the three major “extrinsic” rewards. We show in hardware experiments that the robots can find appropriate intrinsic rewards for the visual properties of battery packs and potential mating partners, promoting approach behaviors.
Introduction
In applications of reinforcement learning (Sutton and Barto, 1998) to real-world problems, the design of the reward function is critical for successful achievement of the task. Designing appropriate reward functions is a nontrivial, time-consuming process in practice. Although it appears straightforward to assign positive rewards to desired goal states and negative rewards to states to be avoided, finding a good balance between multiple rewards often requires careful tuning. Furthermore, if rewards are given only at isolated goal states, blind exploration of the state space takes a long time. Rewards at intermediate subgoals, or even along the trajectories leading to the goal, promote focused exploration, but appropriate design of such additional rewards usually requires prior knowledge of the task or trial and error by the experimenter.
Type: Chapter
Book: Neuromorphic and Brain-Based Robots, pp. 109-128
Publisher: Cambridge University Press
Print publication year: 2011