RL algorithms
Below are the implemented algorithms and their brief descriptions.
- [x] Deep Q-Learning (DQN)
    - dqn.py
        - For discrete action space.
    - dqn_atari.py
        - For playing Atari games. It uses convolutional layers and common Atari pre-processing techniques.
- [x] Categorical DQN (C51)
    - c51.py
        - For discrete action space.
    - c51_atari.py
        - For playing Atari games. It uses convolutional layers and common Atari pre-processing techniques.
    - c51_atari_visual.py
        - Adds return and Q-value visualization for dqn_atari.py.
- [x] Proximal Policy Optimization (PPO)
    - All of the PPO implementations below are augmented with some code-level optimizations. See https://costa.sh/blog-the-32-implementation-details-of-ppo.html for more details.
    - ppo.py
        - For discrete action space.
    - ppo_continuous_action.py
        - For continuous action space. Also implements MuJoCo-specific code-level optimizations.
    - ppo_atari.py
        - For playing Atari games. It uses convolutional layers and common Atari pre-processing techniques.
- [x] Soft Actor Critic (SAC)
    - sac_continuous_action.py
        - For continuous action space.
- [x] Deep Deterministic Policy Gradient (DDPG)
    - ddpg_continuous_action.py
        - For continuous action space.
- [x] Twin Delayed Deep Deterministic Policy Gradient (TD3)
    - td3_continuous_action.py
        - For continuous action space.
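
To make it easier to see which script fits which kind of environment, the list above can be summarized as a small lookup table. This is only an illustrative sketch: the `SCRIPTS` dictionary, the setting labels (`discrete`, `continuous`, `atari`), and the `scripts_for` helper are hypothetical names introduced here and are not part of the repository. The visualization variant `c51_atari_visual.py` is left out because it is an add-on rather than a separate setting.

```python
from typing import Dict, List

# Hypothetical summary of the list above (not part of the repository).
# Keys are algorithm names; values map a setting label to the script name.
SCRIPTS: Dict[str, Dict[str, str]] = {
    "DQN": {"discrete": "dqn.py", "atari": "dqn_atari.py"},
    "C51": {"discrete": "c51.py", "atari": "c51_atari.py"},
    "PPO": {
        "discrete": "ppo.py",
        "continuous": "ppo_continuous_action.py",
        "atari": "ppo_atari.py",
    },
    "SAC": {"continuous": "sac_continuous_action.py"},
    "DDPG": {"continuous": "ddpg_continuous_action.py"},
    "TD3": {"continuous": "td3_continuous_action.py"},
}


def scripts_for(setting: str) -> List[str]:
    """Return the listed scripts that target the given setting label."""
    return [files[setting] for files in SCRIPTS.values() if setting in files]


if __name__ == "__main__":
    # Print the scripts available for each setting label.
    for setting in ("discrete", "continuous", "atari"):
        print(setting, scripts_for(setting))
```

For example, `scripts_for("continuous")` collects the PPO, SAC, DDPG, and TD3 scripts, which matches the entries marked "For continuous action space" above.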