Stable-Baselines3 - Contrib (SB3-Contrib)
Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code. "sb3-contrib" for short.
What is SB3-Contrib?
A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. The goal is to keep the simplicity, documentation and style of stable-baselines3 for less mature implementations.
Why create this repository?
Over the span of stable-baselines and stable-baselines3, the community has been eager to contribute in the form of better logging utilities, environment wrappers, extended support (e.g. different action spaces) and learning algorithms.
However, these utilities were sometimes too niche to be considered for stable-baselines, or proved too difficult to integrate into the existing code without creating a mess. sb3-contrib aims to fix this by not requiring the neatest integration with existing code and by not setting limits on what is too niche: almost everything remotely useful goes! We hope this allows us to provide reliable implementations that follow the usual stable-baselines standards (consistent style, documentation, etc.) beyond the relatively small scope of utilities in the main repository.
Features
See the documentation for the full list of included features.
RL Algorithms:
- Augmented Random Search (ARS)
- Quantile Regression DQN (QR-DQN)
- PPO with invalid action masking (MaskablePPO)
- PPO with recurrent policy (RecurrentPPO aka PPO LSTM)
- Truncated Quantile Critics (TQC)
- Trust Region Policy Optimization (TRPO)
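As a rough illustration of how one of the algorithms above is typically used, the sketch below trains RecurrentPPO for a few thousand steps and then rolls out the policy while carrying the LSTM state between calls. The environment id, the step counts and the function name `run_recurrent_ppo` are illustrative assumptions, not part of this README; the documentation remains the authoritative source for usage examples.

```python
def run_recurrent_ppo(env_id="CartPole-v1", total_timesteps=5_000, n_eval_steps=200):
    """Train RecurrentPPO briefly, then roll out the policy.

    Returns the sum of rewards collected over ``n_eval_steps`` steps.
    All default values are illustrative assumptions.
    """
    # Imported inside the function so the snippet can be loaded
    # even when numpy / sb3-contrib are not installed yet.
    import numpy as np
    from sb3_contrib import RecurrentPPO

    model = RecurrentPPO("MlpLstmPolicy", env_id, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    vec_env = model.get_env()
    obs = vec_env.reset()
    lstm_states = None  # None lets predict() initialise the hidden state
    episode_starts = np.ones((vec_env.num_envs,), dtype=bool)
    total_reward = 0.0

    for _ in range(n_eval_steps):
        action, lstm_states = model.predict(
            obs,
            state=lstm_states,
            episode_start=episode_starts,
            deterministic=True,
        )
        obs, rewards, dones, _infos = vec_env.step(action)
        # The vectorized env resets itself; passing `dones` as the next
        # episode_start tells the policy to reset the LSTM state as well.
        episode_starts = dones
        total_reward += float(rewards.sum())

    return total_reward


if __name__ == "__main__":
    print(run_recurrent_ppo())
```

The hidden state has to be threaded through `predict` by the caller, which is the main difference from using a non-recurrent policy.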
Gym Wrappers:
Documentation
Documentation is available online: https://sb3-contrib.readthedocs.io/
Installation
To install Stable Baselines3 contrib with pip, execute:
pip install sb3-contrib
We recommend using the master version of Stable Baselines3.
To install Stable Baselines3 master version:
pip install git+https://github.com/DLR-RM/stable-baselines3
To install Stable Baselines3 contrib master version:
pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
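After running the commands above, it can be handy to confirm which versions ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical, and the distribution names passed to it are assumed to match the names used with pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("stable-baselines3", "sb3-contrib")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```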
How To Contribute
If you want to contribute, please read the CONTRIBUTING.md guide first.
Citing the Project
To cite this repository in publications (please cite SB3 directly):
@article{stable-baselines3,
author = {Antonin Raffin and Ashley Hill and Adam Gleave and Anssi Kanervisto and Maximilian Ernestus and Noah Dormann},
title = {Stable-Baselines3: Reliable Reinforcement Learning Implementations},
journal = {Journal of Machine Learning Research},
year = {2021},
volume = {22},
number = {268},
pages = {1-8},
url = {http://jmlr.org/papers/v22/20-1364.html}
}