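The 16th Annual Conference of Japanese Society for Artificial Intelligence, 2002
2A1-4

Reinforcement Learning and Computational Neuroscience: Possible Functions of Neuromodulators in Metalearning

Kenji Doya*1,*2
*1 ATR Human Information Science Laboratories
*2 CREST, Japan Science and Technology Corporation
Contact: Kenji Doya, ATR Human Information Science Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan. Fax: 0774-95-1259, E-mail: [email protected], http://www.atr.co.jp/his/~doya

The framework of reinforcement learning captures an essential function of the nervous system: realizing behaviors that acquire reward. Thus the architectures and algorithms of reinforcement learning can provide important clues to the organization and functions of the nervous system. Here I report three such examples: 1) a model of the basal ganglia as a circuit for reinforcement learning; 2) an account of the specialization and collaboration of the cerebellum, the basal ganglia, and the cerebral cortex; 3) working hypotheses about the roles of neuromodulators in regulating the metaparameters of reinforcement learning. The concept of reinforcement learning can provide a common ground for interdisciplinary studies.

Reinforcement learning, in which an agent learns by trial and error the actions that obtain rewards such as food, captures a fundamental aspect of behavioral learning in humans and animals. Indeed, the word "reinforcement" itself was borrowed from psychology and the study of animal learning. Conversely, the algorithms and architectures developed in the theory of reinforcement learning can give important clues for understanding the brain mechanisms of behavioral learning in humans and animals.

1. The basal ganglia as a reinforcement learning circuit

The basal ganglia, located between the cerebral cortex and the brainstem, have long been known to take part in motor control, because their disorders cause motor diseases such as Parkinson's disease and Huntington's disease; their computational function, however, remained unclear. An important clue came from recordings of dopamine neurons, which project from the midbrain to the basal ganglia and the frontal cortex [1]. When a monkey was trained to press a button upon the lighting of an LED to obtain juice, dopamine neurons responded to the juice reward at the beginning of learning; after learning, they responded to the lighting of the LED and no longer to the juice. This closely matches the behavior of the temporal difference (TD) error, which in reinforcement learning theory represents the deviation from reward prediction:

$$\delta_t = r_t + \gamma V(s_t) - V(s_{t-1})$$

where r_t is the reward at time t, V(s) is the predicted sum of future rewards from state s, and γ (0 ≤ γ ≤ 1) is the discount factor. Because the TD error serves as the learning signal for both reward prediction and action learning, models have been proposed that explain the functions of the basal ganglia and the dopamine neurons within the framework of reinforcement learning (Fig. 1) [1,2]. It has long been known that drugs acting on the dopamine system reinforce behavior, and it has been shown experimentally that synaptic plasticity in the basal ganglia is modulated by dopamine.

[Figure 1 diagram; labels: cerebral cortex (state s_t, sensory input); striatum: striosome (state value V(s)) and matrix (action value Q(s,a)); SNc dopamine neurons (TD error δ_t, reward r_t); SNr, GP; thalamus (action a_t, motor output)]
Fig. 1: A reinforcement learning model of the basal ganglia. From the state representation s_t in the cerebral cortex, the striatum computes the state value function V(s) and the action value function Q(s,a). The action a_t is selected through the substantia nigra pars reticulata (SNr) and the globus pallidus (GP) and sent via the thalamus to the motor centers of the brain. Dopamine neurons in the substantia nigra pars compacta (SNc) send the TD error δ_t to the striatum, where dopamine-dependent synaptic plasticity learns the value functions.

Reward prediction, and action selection based on it, have thus come to be regarded as basic functions of the basal ganglia.

2. Specialization and collaboration of the cerebellum, the basal ganglia, and the cerebral cortex

Brain imaging studies in humans using PET and fMRI have revealed that the cerebellum and the basal ganglia, which had been regarded mainly as motor structures, are also involved in a variety of cognitive functions [3]. Anatomical studies have further shown that both the cerebellum and the basal ganglia send outputs to the prefrontal cortex, which is thought to be involved in higher cognitive functions. A new framework is therefore needed for understanding how the cerebellum and the basal ganglia differ beyond their roles in motor control, and the model of the basal ganglia as a circuit for reinforcement learning provides an important clue.

Learning in the cerebellum, driven by error signals, has been explained in the framework of supervised learning, and the response properties of cortical neurons, shaped by the statistical properties of their inputs, in the framework of unsupervised learning. Taken together, the cerebellum, the basal ganglia, and the cerebral cortex can be viewed as neural circuits specialized for three different classes of learning algorithms: supervised learning, reinforcement learning, and unsupervised learning, respectively (Fig. 2) [4].

[Figure 2 diagram; labels: cerebral cortex (unsupervised learning; input, output); basal ganglia (reinforcement learning; input, output, reward from the substantia nigra); thalamus; cerebellum (supervised learning; input, output, target, error from the inferior olive)]
Fig. 2: The cerebellum, the basal ganglia, and the cerebral cortex, with circuit architectures and synaptic plasticity mechanisms suited for supervised learning, reinforcement learning, and unsupervised learning, respectively.

[Figure 3 diagram]
Fig. 3: The major neuromodulator systems: dopamine from the substantia nigra pars compacta (SNc) and the ventral tegmental area (VTA), serotonin from the dorsal raphe nucleus (DR), noradrenaline from the locus coeruleus (LC), and acetylcholine from the septum (S) and the nucleus basalis of Meynert (M).

These specialized modules are expected to work together. For example, reinforcement learning in a partially observable environment requires estimating the current state as a belief state; this could be realized by combining a predictive model of the environment learned by supervised learning in the cerebellum, a value function learned by reinforcement learning in the basal ganglia, and a state representation learned by unsupervised learning in the cerebral cortex. Such working hypotheses about the global functional architecture of the brain also give important guidance for designing and interpreting fMRI and other brain imaging experiments.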
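The shift of the dopamine response from the juice to the LED, and its correspondence to the TD error $\delta_t = r_t + \gamma V(s_t) - V(s_{t-1})$, can be illustrated with a minimal tabular TD(0) sketch. It assumes a single cue, delay, reward trial in which each time step after the LED cue is a separate state (a tapped-delay-line or "complete serial compound" representation), and a pre-cue state whose value is held at 0 because the cue arrives unpredictably. The function name td_dopamine_demo and all parameter values are hypothetical choices for illustration, not the models of [1,2]; alpha and gamma correspond to the learning rate α and the discount factor γ.

```python
def td_dopamine_demo(n_trials=300, reward_step=8, alpha=0.2, gamma=1.0):
    """Tabular TD(0) on a cue -> delay -> reward trial (illustrative sketch).

    Assumed (hypothetical) state representation, indexed by time step t:
      t = 0                 pre-cue period; value fixed at 0 (cue is unpredictable)
      t = 1                 LED cue onset
      t = 2 .. reward_step  delay steps; juice reward r = 1 arrives at t = reward_step
      t = reward_step + 1   end of trial; value stays 0
    Returns the TD errors delta_1 .. delta_{reward_step+1} of the first and last trial.
    """
    V = [0.0] * (reward_step + 2)
    first = last = None
    for trial in range(n_trials):
        deltas = []
        for t in range(1, reward_step + 2):
            r = 1.0 if t == reward_step else 0.0
            # TD error: delta_t = r_t + gamma * V(s_t) - V(s_{t-1})
            delta = r + gamma * V[t] - V[t - 1]
            deltas.append(delta)
            # Update the value of the previous state; the pre-cue value stays 0
            if t - 1 > 0:
                V[t - 1] += alpha * delta
        if trial == 0:
            first = deltas
        last = deltas
    return first, last


first, last = td_dopamine_demo()
print("first trial:", [round(d, 2) for d in first])
print("last trial: ", [round(d, 2) for d in last])
```

In the first trial the TD error is 1 only at the time of the reward (the eighth entry) and 0 at cue onset; after learning it is close to 1 at cue onset and close to 0 at the time of the reward, mirroring the dopamine recordings described in Section 1.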
" ´Ą?rfWZĦÏ+*,ťƪ'5'[5,6]ş <).rfWZĄ<wđ,<"6,úÈ üöĞĄ-Ɩ?Ą:='<))( <´ĄApLoQi(.Čƭ-ƔąƦń-~ Ʊ(γ)ŕæ-nsUi?Â6<¬ ũ(β)Ą-Ņũ ¼ģ(α)+*-jTbnjT-ťĭ+Įš.Ą ŏ8¡³ĕÃ,yŇ<"6ň-ĒÞúÈü-îÛ çÑ,9<Vm]sI?ƆƬ)<=Å°øq ep(.ĨÓ?Ă6'<ĄrfWZő++ĥ -ś,ć'Û+Ŏ+Ưƥ(< }Ɩ>=>=-Ż. -Ą-jTbnjT?œ, Vm]sI'7:>+'7ƚř-¡³-7)( Ʃ+ÛŰ?öưŤ,Ą<)ź(<%4; Ż,. -Ą-jTbnjT?öÍŝį<jT Ą-¨Ùƅ>$'<)Ú:=< Ż,<jTĄ-Ŕÿ)'Ż:ŎŻƂù 8¦ŢčŻ,×ŪûñņŤ+åƪ?3ĝ ÀăėƐù¿Ú:=<ġºŹƑéĪƐ Ť«Ĉ-ſő,9;Ʃ+ĝÀăėƐù) -āƨʼnŻŵ(-ƑƉ8äƗqep(-åƪ4" -ĺƣ8| ŧé_WHADZ,9<ÛŰ1-·,¥<YT.Ƙ Ŏ,ų:='< =:-úÈŤřÆ)ƯƻkYp?e P, 1) [bhs.ƔąƦń:-ŀÊTD Ñáδ 2) RrZ]s.ƔąƦń-ò¤PJp~Ʊγ 3) _pA[q\os.ÛŰ-nsUi¬ ũβ 4) ARVpKos.©-ØĜ-ĄŅũα ? =!=ƇËĦÏ'<)İÚ:=<[7] -İ-ÄĐ,Õ'CRESTŻ?Ļ<-Å° ŏ)'ĄƯƻÅ°nWZ8Mp(-ĪƯúÈĠ ¤-ŻŰÁńrfWZúÈ?Ĺ5Þ>"²űÅ° ğ@(< 4. ´Ą-ƟŏĮš.9;ƴħź?®6<Ö,7 4"ĪƐ-ť8Ġ¤-ÛŰ-ÉƯ,ž<Å°,7²ŠƼĹ5(;z+<ƟƑơ?£Ĝ"+Å°-¦Ɓ? Ƨ<7-)+;ų< $ [1] Schultz, W., Dayan, P., and Montague, P.R.: A neural substrate of prediction and reward. Science, 275, 1593-1599 (1997). [2] Houk, J.C., Adams, J.L., and Barto, A.G.: A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J.C. Houk, et al. Eds: Models of Information Processing in the Basal Ganglia, pp. 249-270. MIT Press (1995). [3] Doya, K.: Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology, 10, 732-739 (2000). [4] Doya, K.: What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex. Neural Networks, 12, 961-974 (1999). [5] Doya, K.: Reinforcement learning in continuous time and space. Neural Computation, 12, 219-245 (2000). [6] Doya, K., Kimura, H., and Kawato, M.: Neural mechanisms of learning and control. IEEE Control Systems Magazine, 21(4), 42-54 (2001). [7] Doya, K.: Metalearning and neuromodulation: Neural Networks, 15(4), (2002) -2-