Stable-Baselines3 - Contrib (SB3-Contrib)
Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code. "sb3-contrib" for short.
What is SB3-Contrib?
A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. The goal is to keep the simplicity, documentation and style of stable-baselines3, but for less mature implementations.
Why create this repository?
Over the span of stable-baselines and stable-baselines3, the community has been eager to contribute in the form of better logging utilities, environment wrappers, extended support (e.g. different action spaces) and learning algorithms.
However, sometimes these utilities were too niche to be considered for stable-baselines, or proved too difficult to integrate cleanly into the existing code without creating a mess. sb3-contrib addresses this by not requiring the neatest integration with the existing code base and by not setting limits on what is too niche: almost anything remotely useful goes! We hope this allows us to provide reliable implementations following the usual stable-baselines standards (consistent style, documentation, etc.) beyond the relatively small scope of utilities in the main repository.
Features
See documentation for the full list of included features.
RL Algorithms:
- Truncated Quantile Critics (TQC)
- Quantile Regression DQN (QR-DQN)
- PPO with invalid action masking (MaskablePPO)
- Trust Region Policy Optimization (TRPO)
Gym Wrappers:
- Time Feature Wrapper (TimeFeatureWrapper)
Documentation
Documentation is available online: https://sb3-contrib.readthedocs.io/
Installation
To install Stable Baselines3 contrib with pip, execute:
pip install sb3-contrib
We recommend using the master version of Stable Baselines3.
To install the Stable Baselines3 master version:
pip install git+https://github.com/DLR-RM/stable-baselines3
To install the Stable Baselines3 Contrib master version:
pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
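Once installed, algorithms from SB3-Contrib follow the standard Stable-Baselines3 API. The snippet below is a minimal, illustrative sketch (not part of this README): the environment id "Pendulum-v1" and the hyperparameters are assumptions chosen for brevity.

```python
# Minimal usage sketch (illustrative): train TRPO from SB3-Contrib.
# The environment id and hyperparameters are assumptions, not prescriptions.
from sb3_contrib import TRPO

# Create the model with an MLP policy on a classic-control task
model = TRPO("MlpPolicy", "Pendulum-v1", verbose=1)

# Train for a small number of timesteps (increase for real experiments)
model.learn(total_timesteps=10_000)

# Save the trained model; it can be reloaded later with TRPO.load("trpo_pendulum")
model.save("trpo_pendulum")
```

The other algorithms listed above follow the same interface, though each has its own requirements (for example, QR-DQN expects a discrete action space and MaskablePPO expects an environment that provides action masks).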
How To Contribute
If you want to contribute, please read the CONTRIBUTING.md guide first.
Citing the Project
To cite this repository in publications (please cite SB3 directly):
@article{stable-baselines3,
author = {Antonin Raffin and Ashley Hill and Adam Gleave and Anssi Kanervisto and Maximilian Ernestus and Noah Dormann},
title = {Stable-Baselines3: Reliable Reinforcement Learning Implementations},
journal = {Journal of Machine Learning Research},
year = {2021},
volume = {22},
number = {268},
pages = {1-8},
url = {http://jmlr.org/papers/v22/20-1364.html}
}