
6 - Evolution of rewards and learning mechanisms in Cyber Rodents

from Part III - Brain-based robots: architectures and approaches

Published online by Cambridge University Press: 05 February 2012

Edited by Jeffrey L. Krichmar, University of California, Irvine, and Hiroaki Wagatsuma, Kyushu Institute of Technology (KYUTECH), Japan

Summary

Finding design principles for reward functions is a major challenge in both artificial intelligence and neuroscience. Successful acquisition of a task usually requires rewards to be given not only at goal states but also at intermediate states, to promote effective exploration. We propose a method for designing “intrinsic” rewards for autonomous robots that combines constrained policy gradient reinforcement learning with embodied evolution. To validate the method, we use the Cyber Rodent robots, for which collision avoidance, recharging from battery packs, and “mating” by software reproduction are the three major “extrinsic” rewards. We show in hardware experiments that the robots can find appropriate intrinsic rewards for the visual properties of battery packs and potential mating partners, promoting approach behaviors.

Introduction

In applying reinforcement learning (Sutton and Barto, 1998) to real-world problems, the design of the reward function is critical to successful achievement of the task. Designing appropriate reward functions is a nontrivial, time-consuming process in practical applications. Although it appears straightforward to assign positive rewards to desired goal states and negative rewards to states to be avoided, finding a good balance between multiple rewards often requires careful tuning. Furthermore, if rewards are given only at isolated goal states, blind exploration of the state space takes a long time. Rewards at intermediate subgoals, or even along the trajectories leading to the goal, promote focused exploration, but designing such additional rewards appropriately usually requires prior knowledge of the task or trial and error by the experimenter.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2011


References

Barto, A. G., Singh, S., and Chentanez, N. (2004). Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of the 3rd International Conference on Developmental Learning (ICDL), 112.
Baxter, J. and Bartlett, P. L. (2001). Infinite-horizon gradient-based policy search. Journal of Artificial Intelligence Research 15, 319.
Doya, K. and Uchibe, E. (2005). The Cyber Rodent project: exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior 13, 149.
Elfwing, S. (2007). Embodied Evolution of Learning Ability. PhD thesis, KTH School of Computer Science and Communication, Stockholm, Sweden.
Elfwing, S., Uchibe, E., Doya, K., and Christensen, H. I. (2008). In Sendhoff, B., Körner, E., Sporns, O., Ritter, H., and Doya, K. (eds.), Creating Brain-Like Intelligence. Berlin: Springer, 278.
Elfwing, S., Uchibe, E., Doya, K., and Christensen, H. I. (2008). Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adaptive Behavior 16, 400.
Eshelman, L. J. and Schaffer, J. D. (1993). Real-coded genetic algorithms and interval-schemata. In Foundations of Genetic Algorithms. San Francisco, CA: Morgan Kaufmann, 187.
Främling, K. (2007). Guiding exploration by pre-existing knowledge without modifying reward. Neural Networks 20, 736.
Konda, V. R. and Tsitsiklis, J. N. (2003). Actor-critic algorithms. SIAM Journal on Control and Optimization 42, 1143.
Meeden, L. A., Marshall, J. B., and Blank, D. (2004). American Association for Artificial Intelligence.
Ng, A. Y., Harada, D., and Russell, S. (1999). Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning (ICML), 278.
Oudeyer, P.-Y. and Kaplan, F. (2004). In Proceedings of the 4th International Workshop on Epigenetic Robotics. Lund, Sweden: Lund University, 127.
Oudeyer, P.-Y. and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics 1.
Oudeyer, P.-Y., Kaplan, F., and Hafner, V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation 11, 265.
Sato, T., Uchibe, E., and Doya, K. (2008). Learning how, what, and whether to communicate: emergence of protocommunication in reinforcement learning agents. Journal of Artificial Life and Robotics 12, 70.
Singh, S., Barto, A. G., and Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Saul, L. K., Weiss, Y., and Bottou, L. (eds.), Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press, 1281.
Singh, S., Lewis, R., and Barto, A. G. (2009). Where do rewards come from? In Proceedings of the 31st Annual Conference of the Cognitive Science Society, 2601.
Stout, A., Konidaris, G. D., and Barto, A. G. (2005). Intrinsically motivated reinforcement learning: a promising framework for developmental robot learning. In Proceedings of the AAAI Spring Symposium on Developmental Robotics.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press/Bradford Books.
Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 181.
Takeuchi, J., Shouno, O., and Tsujino, H. (2006), 54.
Takeuchi, J., Shouno, O., and Tsujino, H. (2007), 1151.
Uchibe, E. and Doya, K. (2004). In Schaal, S., Ijspeert, A., and Billard, A. (eds.), Proceedings of the Eighth International Conference on Simulation of Adaptive Behavior: From Animals to Animats 8. Cambridge, MA: MIT Press, 287.
Uchibe, E. and Doya, K. (2007). Constrained reinforcement learning from intrinsic and extrinsic rewards. In Proceedings of the IEEE International Conference on Development and Learning (ICDL).
Uchibe, E. and Doya, K. (2008). Finding intrinsic rewards by embodied evolution and constrained reinforcement learning. Neural Networks 21, 1447.
Usui, Y. and Arita, T. (2003). Situated and embodied evolution in collective evolutionary robotics. In Proceedings of the 8th International Symposium on Artificial Life and Robotics, 212.
Watson, R. A., Ficici, S. G., and Pollack, J. B. (2002). Embodied evolution: distributing an evolutionary algorithm in a population of robots. Robotics and Autonomous Systems 39, 1.
Wiewiora, E. (2003). Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research 19, 205.
Wischmann, S., Stamm, K., and Wörgötter, F. (2007). In Advances in Artificial Life: 9th European Conference on Artificial Life. Berlin: Springer, 284.
