Deterministic Policy Gradient Algorithms

Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014, June). Deterministic Policy Gradient Algorithms. In ICML.
2015/8/20
D1
• ICML 2014
• DeepMind
Appendix
• (COPDAC)
• Octopus Arm
(ENATDAC, EQNAC)
• Natural Actor-Critic
– RBM
– RBM, 34
• RBM, 14
• [Peters, 2010]
• Deterministic Policy Gradient Theorem
• μ
• Off-policy Deterministic Policy Gradient Theorem
• COPDAC (COPDAC-Q)
• v
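The formulas from this slide did not survive extraction. For reference, here are the two theorems and the COPDAC-Q updates as they appear in the paper's notation. ρ^μ and ρ^β are the discounted state distributions under μ_θ and under the behaviour policy β, ∇_θ μ_θ(s) is the Jacobian of the actor with respect to θ, φ(s, a) are the compatible features, and V^v(s) = vᵀφ(s) is a linear baseline on state features φ(s). Treat this as a transcription sketch, not a copy of the slide.

```latex
% Deterministic policy gradient theorem
\nabla_\theta J(\mu_\theta)
  = \int_{\mathcal{S}} \rho^{\mu}(s)\,\nabla_\theta \mu_\theta(s)\,
    \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)}\,\mathrm{d}s
  = \mathbb{E}_{s\sim\rho^{\mu}}\!\left[\nabla_\theta \mu_\theta(s)\,
    \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)}\right]

% Off-policy deterministic policy gradient (behaviour policy \beta)
\nabla_\theta J_\beta(\mu_\theta)
  \approx \mathbb{E}_{s\sim\rho^{\beta}}\!\left[\nabla_\theta \mu_\theta(s)\,
    \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)}\right]

% COPDAC-Q: compatible critic and updates
Q^{w}(s,a) = \big(a-\mu_\theta(s)\big)^{\top}\nabla_\theta \mu_\theta(s)^{\top} w + V^{v}(s),
\qquad \phi(s,a) = \nabla_\theta \mu_\theta(s)\big(a-\mu_\theta(s)\big)

\delta_t = r_t + \gamma\, Q^{w}\big(s_{t+1},\mu_\theta(s_{t+1})\big) - Q^{w}(s_t,a_t)
\theta_{t+1} = \theta_t + \alpha_\theta\, \nabla_\theta \mu_\theta(s_t)\big(\nabla_\theta \mu_\theta(s_t)^{\top} w_t\big)
w_{t+1} = w_t + \alpha_w\, \delta_t\, \phi(s_t,a_t)
v_{t+1} = v_t + \alpha_v\, \delta_t\, \phi(s_t)
```

The actor step uses ∇_θ μ_θ(s)ᵀw in place of ∇_a Q^μ. With the compatible critic, this is exactly the critic's action gradient at a = μ_θ(s).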
– ×(−1)
– 10, 25, 50
(SAC-B) / (COPDAC-B)
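The bandit comparison (SAC-B vs. COPDAC-B at action dimensions 10, 25, 50, with reward taken as cost × (−1)) can be illustrated with a minimal COPDAC-B sketch. The following are assumptions, not taken from the slide: a quadratic cost (a − a*)ᵀC(a − a*) with diagonal C whose entries are drawn from {0.1, 1}, a* = [4, ..., 4], Gaussian exploration of width `sigma`, and the step sizes shown. The name `copdac_bandit` is hypothetical.

```python
import numpy as np


def copdac_bandit(dim=10, steps=30000, sigma=1.0, alpha_theta=0.001,
                  alpha_w=0.01, alpha_v=0.01, seed=0):
    """Sketch of COPDAC-B on a quadratic-cost continuous bandit.

    Assumed (not from the slides): cost (a - a*)^T C (a - a*) with diagonal C,
    a* = [4, ..., 4], Gaussian behaviour noise of width sigma.
    """
    rng = np.random.default_rng(seed)
    a_star = np.full(dim, 4.0)
    c_diag = rng.choice([0.1, 1.0], size=dim)      # assumed diagonal of C

    theta = np.zeros(dim)   # deterministic policy: mu_theta = theta
    w = np.zeros(dim)       # compatible critic weights (tracks grad_a r at theta)
    v = 0.0                 # scalar baseline V^v

    for _ in range(steps):
        a = theta + sigma * rng.standard_normal(dim)   # exploratory behaviour action
        r = -np.sum(c_diag * (a - a_star) ** 2)        # reward = cost x (-1)
        phi = a - theta                                 # grad_theta mu = I, so phi = a - mu
        delta = r - (phi @ w + v)                       # bandit: no bootstrap term
        theta = theta + alpha_theta * w                 # actor step uses the old w
        w = w + alpha_w * delta * phi                   # critic, compatible part
        v = v + alpha_v * delta                         # critic, baseline

    return theta, a_star


if __name__ == "__main__":
    for m in (10, 25, 50):
        theta, a_star = copdac_bandit(dim=m)
        print(m, float(np.linalg.norm(theta - a_star)))
```

Because μ_θ = θ in the bandit, the compatible features reduce to a − θ. The critic weight w then becomes a direct regression estimate of ∇_a r at θ. The critic step size is kept well below the LMS stability limit for dimension 50, and the actor step is kept slower than the critic.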
– 0.99 (5000), 0.999
– φ
– SAC (Degris 2012a), OffPAC, COPDAC-Q
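COPDAC-Q, one of the compared methods, works on features φ. A single COPDAC-Q update can be sketched for a linear deterministic actor μ_θ(s) = θᵀφ(s) with a scalar action. The feature vectors, the scalar action and the helper name `copdac_q_step` are illustrative assumptions. The update order follows the COPDAC-Q form: the TD error first, then θ using the old w, then w and v.

```python
import numpy as np


def copdac_q_step(theta, w, v, phi_s, a, r, phi_next, gamma,
                  alpha_theta, alpha_w, alpha_v):
    """One COPDAC-Q update for a linear actor mu_theta(s) = theta . phi(s).

    Hypothetical helper: scalar action, state features phi(s) given as vectors.
    Critic: Q^w(s, a) = (a - mu(s)) * (grad_theta mu(s) . w) + v . phi(s).
    """
    mu_s = theta @ phi_s
    grad_mu = phi_s                        # grad_theta mu_theta(s) for a linear actor
    compat = grad_mu * (a - mu_s)          # compatible features phi(s, a)
    q_sa = compat @ w + v @ phi_s
    # At a = mu(s') the advantage term vanishes, so Q^w(s', mu(s')) = V^v(s').
    q_next = v @ phi_next
    delta = r + gamma * q_next - q_sa      # off-policy TD error, target uses mu(s')

    theta_new = theta + alpha_theta * grad_mu * (grad_mu @ w)   # uses w_t
    w_new = w + alpha_w * delta * compat
    v_new = v + alpha_v * delta * phi_s
    return theta_new, w_new, v_new, delta


if __name__ == "__main__":
    out = copdac_q_step(np.array([0.5, 0.0]), np.array([1.0, 0.0]),
                        np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                        1.5, 2.0, np.array([0.0, 1.0]), 0.9, 0.1, 0.1, 0.1)
    print(out[3])  # 1.0
```

The bootstrap target evaluates the critic at μ_θ(s′), not at the behaviour policy's next action. This is what lets the update learn off-policy.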
– 30
• COPDAC-Q
– 6 segments
– 50 state variables
– 20 action variables
– up to 300 steps per episode
– +50 when the target is hit
• COPDAC-Q
– MLP, 8
– MLP, 40
[Figure: Octopus Arm Task]
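The octopus-arm runs use an MLP actor, and the slide lists an MLP with 8 next to it. With such an actor, the deterministic actor update ∇_θ μ_θ(s) ∇_a Q(s, a)|_{a=μ_θ(s)} amounts to back-propagating the critic's action gradient through the network as a vector-Jacobian product. The sketch below makes several assumptions: a one-hidden-layer tanh actor with 8 hidden units, a 50-dimensional state and a 20-dimensional action (our reading of the slide's numbers), and a critic gradient ∇_a Q that is already available. The names `init_actor`, `actor_forward`, `actor_vjp` and `actor_step` are hypothetical.

```python
import numpy as np


def init_actor(state_dim, hidden, action_dim, rng, scale=0.1):
    """Hypothetical one-hidden-layer tanh actor; returns [W1, b1, W2, b2]."""
    return [scale * rng.standard_normal((hidden, state_dim)), np.zeros(hidden),
            scale * rng.standard_normal((action_dim, hidden)), np.zeros(action_dim)]


def actor_forward(params, s):
    """mu_theta(s) = W2 tanh(W1 s + b1) + b2; also returns the hidden activations."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ s + b1)
    return W2 @ h + b2, h


def actor_vjp(params, s, dq_da):
    """grad_theta mu_theta(s) applied to dq_da, i.e. the gradient of dq_da . mu_theta(s)."""
    W1, b1, W2, b2 = params
    _, h = actor_forward(params, s)
    g_h = W2.T @ dq_da
    g_pre = g_h * (1.0 - h ** 2)           # derivative of tanh
    return [np.outer(g_pre, s), g_pre, np.outer(dq_da, h), dq_da.copy()]


def actor_step(params, s, dq_da, alpha):
    """One deterministic policy gradient ascent step for a single state."""
    grads = actor_vjp(params, s, dq_da)
    return [p + alpha * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_actor(50, 8, 20, rng)
    s = rng.standard_normal(50)
    dq_da = rng.standard_normal(20)        # stand-in for grad_a Q(s, a) at a = mu(s)
    params = actor_step(params, s, dq_da, alpha=1e-2)
```

In practice the critic's action gradient would be computed at a = μ_θ(s) and averaged over a batch of states, in line with the expectation in the theorem.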