Reinforcement Learning Basics IV: A Roundup of State-of-the-Art and Classic Reinforcement Learning Algorithms (reposted from Zhihu)


Published: 2021-05-04


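To give everyone an intuitive picture of the state-of-the-art (SOTA) algorithms in RL, I have reorganized my catalog of SOTA algorithms. Some of them I am already implementing myself, and for some I have written paper-reading notes.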

Model-free

Value-based

  1. Q-learning
    [ Paper | Code | Blog | 1992 ]

  2. Sarsa, Sarsa(λ)
    [ Paper | Code | Blog | 1994 ]

  3. Deep Q Network (DQN)
    [ Paper | Code | Blog | 2015 ]

  4. Double Deep Q Network
    [ Paper | Code | Blog | 2015 ]

  5. Dueling Deep Q Network
    [ Paper | Code | Blog | 2015 ]

  6. Double Dueling Deep Q Network (D3QN)
    [ No Paper | Code | Blog | 2015 ]

  7. Rainbow
    [ Paper | Code | Blog | 2017 ]

  8. Hindsight Experience Replay (HER) (can also be used with DDPG)
    [ Paper | Code | Blog | 2017 ]
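
As a point of reference for the first entry in the list above, the tabular Q-learning update can be sketched in a few lines. This is only a minimal illustration and not the code behind the "Code" links: the 1-D chain environment, the function name `q_learning`, and all hyperparameter values are assumptions made up for the example.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (an assumption for illustration).

    States are 0..n_states-1, actions are 0 (left) and 1 (right).
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy behavior policy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([act for act in (0, 1) if q[s][act] == best])

            # Deterministic chain dynamics; moving left from state 0 stays in place.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Q-learning target: bootstrap from the greedy value of the next state,
            # with no bootstrapping once the terminal state is reached.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, values)
```

With these assumed settings, the learned table should prefer the "right" action in every non-terminal state. The deep variants in the list (DQN and its successors) replace the table with a neural network but keep the same bootstrapped target.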

Policy-based

  1. Vanilla Policy Gradient / REINFORCE
    [ Paper | Code | Blog | 2000 ]

  2. Trust Region Policy Optimization (TRPO)
    [ Paper | Code | Blog | 2015 ]

  3. Proximal Policy Optimization (PPO)
    [ Paper | Code | Blog | 2017 ]

Actor-Critic

  1. Actor-Critic
    [ Paper | Code pytorch | Blog | 2000 ]

  2. Advantage Actor-Critic (A2C)
    [ No Paper | Code | Blog | Unknown ]

  3. Deep Deterministic Policy Gradient (DDPG)
    [ Paper | Code1 OpenAI | Code2 | Blog | 2015 ]

  4. Twin Delayed DDPG (TD3)
    [ Paper | Code | Blog | 2018 ]

  5. Soft Actor-Critic (SAC)
    [ Paper | Code tf | Blog | 2018 ]

Model-based

  1. Dyna
    [ Paper | Code | Blog | 1991 ]

  2. PILCO
    [ Paper | Code | Blog | 2011 ]

  3. Value Prediction Network (VPN)
    [ Paper | Code | Blog | 2018 ]

  4. Guided Policy Search (GPS)
    [ Paper | Code | Blog | 2017 ]

  5. Model-Based Value Expansion (MVE)
    [ Paper | Code | Blog | 2018 ]

  6. Stochastic Ensemble Value Expansion (STEVE)
    [ Paper | Code | Blog | 2018 ]

  7. Model-Based Policy Optimization (MBPO)
    [ Paper | Code | Blog | 2019 ]


Hierarchical RL

  1. Hierarchical DQN (h-DQN)
    [ Paper | Code Keras | Code pytorch | Blog | 2016 ]

  2. Hierarchical DDPG (h-DDPG)
    [ Paper | Code | Blog | 2017 ]

  3. Hierarchical-Actor-Critic (HAC)
    [ Paper | Code pytorch | Code TF | Blog_CN | Blog_EG | 2019 ]


Distributed Architecture

  1. Asynchronous Advantage Actor-Critic (A3C)
    [ Paper | Code pytorch | Blog | 2016 ]

  2. Distributed PPO (DPPO)
    [ Paper | Code pytorch | Blog | 2017 ]

  3. IMPALA
    [ Paper | Code | Blog | 2018 ]

  4. APE-X
    [ Paper | Code | Blog | 2018 ]

  5. Divergence-augmented Policy Optimization (DAPO)
    [ Paper | Code | Blog | 2019 ]

Multi-Agent

  1. Value-Decomposition Networks (VDN)
    [ Paper | Code | Blog | 2017 ]

  2. MADDPG
    [ Paper | Code OpenAI | Blog | 2017 ]

  3. Mean Field Multi-Agent RL
    [ Paper | Code | Blog | 2018 ]

  4. QMIX
    [ Paper | Code | Blog | 2018 ]

  5. Actor-Attention-Critic for Multi-Agent (MAAC)
    [ Paper | Code | Blog | 2018 ]

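If any of the links are wrong, please let me know; it would be much appreciated.

More algorithm implementations can be found in the GitHub repository associated with this column: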

Machine-Learning-is-ALL-You-Need (github.com)

Feel free to Watch & Star!

This article is reposted from: https://zhuanlan.zhihu.com/p/137208923

