To give readers an intuitive overview of the SOTA algorithms in RL, I have reorganized this catalog of SOTA algorithms. Some of them I have already implemented myself, and for some I have written related paper-reading notes.
Model-free
Value-based
Sarsa, Sarsa(λ) (one-step update sketched below)
[ Paper | Code | Blog | 1994 ]
Double Dueling Deep Q Network (D3QN)
[ No Paper | Code | Blog | 2015 ]
Hindsight Experience Replay (HER) (also applicable to DDPG)
[ Paper | Code | Blog | 2017 ]
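As a concrete anchor for the value-based family, here is a minimal sketch of the one-step Sarsa update on a tabular Q-function. The grid size, hyperparameters, and ε-greedy policy are illustrative assumptions for this example, not taken from any of the linked codebases.

```python
# A minimal, illustrative sketch of the one-step Sarsa update on a tabular
# Q-function. Grid size, hyperparameters, and the epsilon-greedy policy are
# assumptions for this example, not taken from any linked codebase.
import numpy as np

n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(state: int) -> int:
    """Random action with probability epsilon, otherwise greedy w.r.t. Q."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def sarsa_update(s: int, a: int, r: float, s_next: int, a_next: int) -> None:
    """On-policy TD(0): the bootstrap uses the action a' actually chosen by
    the behavior policy, which is what distinguishes Sarsa from Q-learning."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Sarsa(λ) extends this by keeping an eligibility trace that decays by γλ each step, so a single TD error updates all recently visited state-action pairs.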
Policy-based
Vanilla Policy Gradient / REINFORCE (loss sketched after this list)
[ Paper | Code | Blog | 2000 ]
Trust Region Policy Optimization (TRPO)
[ Paper | Code | Blog | 2015 ]
Proximal Policy Optimization (PPO)
[ Paper | Code | Blog | 2017 ]
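For the policy-based family, the following is a minimal sketch of the REINFORCE (vanilla policy gradient) loss in PyTorch. The 4-dimensional observation, 2 discrete actions, and the absence of a baseline are simplifying assumptions for illustration.

```python
# A minimal sketch of the REINFORCE (vanilla policy gradient) loss in PyTorch.
# The 4-dim observation, 2 discrete actions, and lack of a baseline are
# simplifying assumptions for illustration.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_loss(states, actions, returns):
    """Minimize -E[log pi(a_t|s_t) * G_t].

    states: (T, 4) float tensor; actions: (T,) long tensor;
    returns: (T,) discounted returns-to-go from one rollout.
    """
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen * returns).mean()

# One gradient step on a collected episode (tensors assumed given):
# loss = reinforce_loss(states, actions, returns)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```

TRPO and PPO refine this same objective by constraining or clipping how far the updated policy may move from the policy that collected the data.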
Actor-Critic
Actor-Critic
[ Paper | Code pytorch | Blog | 2000 ]
Deep Deterministic Policy Gradient (DDPG) (update step sketched below)
[ Paper | Code1 OpenAI | Code2 | Blog | 2015 ]
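Below is a compressed sketch of one DDPG update step, assuming a batch sampled from a replay buffer: the critic regresses onto a target built with slow-moving target networks, the actor ascends Q(s, μ(s)), and both targets are Polyak-averaged. Dimensions, network sizes, and learning rates are illustrative assumptions, not taken from the linked implementations.

```python
# A compressed sketch of one DDPG update: the critic regresses onto a target
# built with slow-moving target networks, the actor ascends Q(s, mu(s)), and
# both targets are Polyak-averaged. Dimensions and learning rates are
# illustrative assumptions.
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 8, 2, 0.99, 0.005
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One step on a replay batch; r and done have shape (B, 1)."""
    with torch.no_grad():
        q_next = critic_targ(torch.cat([s2, actor_targ(s2)], dim=-1))
        y = r + gamma * (1.0 - done) * q_next          # Bellman target
    critic_loss = ((critic(torch.cat([s, a], dim=-1)) - y) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak-average target networks toward the online networks.
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_targ in zip(net.parameters(), targ.parameters()):
            p_targ.data.mul_(1.0 - tau).add_(tau * p.data)
```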
Model-based
Value Prediction Network (VPN)
[ Paper | Code | Blog | 2018 ]
Model-Based Value Expansion (MVE) (target computation sketched after this list)
[ Paper | Code | Blog | 2018 ]
Stochastic Ensemble Value Expansion (STEVE)
[ Paper | Code | Blog | 2018 ]
Model-Based Policy Optimization (MBPO)
[ Paper | Code | Blog | 2019 ]
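To make the model-based idea concrete, here is a sketch of the H-step target at the heart of Model-Based Value Expansion (MVE). Note that `model`, `policy`, and `q_fn` are hypothetical callables standing in for learned components; they are not APIs from the linked codebases.

```python
# A sketch of the H-step target at the heart of Model-Based Value Expansion
# (MVE). `model`, `policy`, and `q_fn` are hypothetical callables standing in
# for learned components; they are not APIs from the linked codebases.
def mve_target(s, model, policy, q_fn, horizon=3, gamma=0.99):
    """H-step expansion: sum_{t<H} gamma^t r_t + gamma^H Q(s_H, pi(s_H))."""
    target, discount = 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        s, r = model(s, a)       # learned model: next state and reward
        target += discount * r
        discount *= gamma
    return target + discount * q_fn(s, policy(s))
```

STEVE extends this by computing such targets over an ensemble of models and horizons and weighting them by their estimated uncertainty.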
Hierarchical RL
Hierarchical DQN (h-DQN)
[ Paper | Code Keras | Code pytorch | Blog | 2016 ]
Hierarchical Actor-Critic (HAC)
[ Paper | Code pytorch | Code TF | Blog_CN | Blog_EG | 2019 ]
Distributed Architecture
Asynchronous Advantage Actor-Critic (A3C)
[ Paper | Code pytorch | Blog | 2016 ]
Distributed PPO (DPPO)
[ Paper | Code pytorch | Blog | 2017 ]
IMPALA
[ Paper | Code | Blog | 2018 ]
Divergence-augmented Policy Optimization (DAPO)
[ Paper | Code | Blog | 2019 ]
Multi-Agent
Value-Decomposition Networks (VDN) (decomposition sketched after this list)
[ Paper | Code | Blog | 2017 ]
MADDPG
[ Paper | Code OpenAI | Blog | 2017 ]
QMIX
[ Paper | Code | Blog | 2018 ]
Actor-Attention-Critic for Multi-Agent (MAAC)
[ Paper | Code | Blog | 2018 ]
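The core idea of VDN fits in a few lines: the joint action-value is the sum of per-agent utilities, so the greedy joint action decomposes into independent per-agent argmaxes. The two-agent setup and network shapes below are illustrative assumptions.

```python
# The core of Value-Decomposition Networks (VDN): the joint action-value is
# the sum of per-agent utilities, so the greedy joint action decomposes into
# independent per-agent argmaxes. Two agents and these network shapes are
# illustrative assumptions.
import torch
import torch.nn as nn

n_agents, obs_dim, n_actions = 2, 6, 4
agent_qs = nn.ModuleList(
    nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
    for _ in range(n_agents)
)

def joint_q(obs, actions):
    """Q_tot = sum_i Q_i(o_i, a_i).

    obs: (B, n_agents, obs_dim) float tensor;
    actions: (B, n_agents) long tensor of action indices.
    """
    per_agent = [
        agent_qs[i](obs[:, i]).gather(1, actions[:, i:i + 1])
        for i in range(n_agents)
    ]
    return torch.stack(per_agent, dim=0).sum(dim=0)   # (B, 1)
```

QMIX generalizes this by replacing the plain sum with a monotonic mixing network conditioned on the global state.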
If any of the links are wrong, please let me know; it would be much appreciated.
More algorithm implementations can be found in the GitHub repository associated with this column.
You are welcome to Watch & Star!
Reposted from: https://zhuanlan.zhihu.com/p/137208923