# Stable-Baselines3 - Contrib (SB3-Contrib)
Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code. "sb3-contrib" for short.
## What is SB3-Contrib?
A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. The goal is to keep the simplicity, documentation and style of Stable-Baselines3 but for less mature implementations.
## Why create this repository?
Over the span of stable-baselines and stable-baselines3, the community has been eager to contribute in the form of better logging utilities, environment wrappers, extended support (e.g. different action spaces) and learning algorithms.
However, sometimes these utilities were too niche to be considered for stable-baselines or proved too difficult to integrate well into the existing code without creating a mess. sb3-contrib aims to fix this by not requiring the neatest integration with existing code and not setting limits on what is too niche: almost everything remotely useful goes! We hope this allows us to provide reliable implementations following the usual stable-baselines standards (consistent style, documentation, etc.) beyond the relatively small scope of utilities in the main repository.
## Features
See documentation for the full list of included features.
RL Algorithms:
- Truncated Quantile Critics (TQC)
- Quantile Regression DQN (QR-DQN)
- PPO with invalid action masking (MaskablePPO)
- Trust Region Policy Optimization (TRPO)
- Augmented Random Search (ARS)
Gym Wrappers:
## Documentation
Documentation is available online: https://sb3-contrib.readthedocs.io/
## Installation
To install Stable Baselines3 contrib with pip, execute:

```
pip install sb3-contrib
```
We recommend using the master version of Stable Baselines3.

To install the Stable Baselines3 master version:

```
pip install git+https://github.com/DLR-RM/stable-baselines3
```

To install the Stable Baselines3 contrib master version:

```
pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
```
## How To Contribute
If you want to contribute, please read the CONTRIBUTING.md guide first.
## Citing the Project
To cite this repository in publications, please cite Stable-Baselines3 (SB3) directly:
```bibtex
@article{stable-baselines3,
  author  = {Antonin Raffin and Ashley Hill and Adam Gleave and Anssi Kanervisto and Maximilian Ernestus and Noah Dormann},
  title   = {Stable-Baselines3: Reliable Reinforcement Learning Implementations},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {268},
  pages   = {1-8},
  url     = {http://jmlr.org/papers/v22/20-1364.html}
}
```