diff --git a/.github/ISSUE_TEMPLATE/issue-template.md b/.github/ISSUE_TEMPLATE/issue-template.md
index b8189aa..4c3ddaa 100644
--- a/.github/ISSUE_TEMPLATE/issue-template.md
+++ b/.github/ISSUE_TEMPLATE/issue-template.md
@@ -44,6 +44,7 @@ Traceback (most recent call last): File ...
 **System Info**
 Describe the characteristic of your environment:
 * Describe how the library was installed (pip, docker, source, ...)
+ * Stable-Baselines3 and sb3-contrib versions
 * GPU models and configuration
 * Python version
 * PyTorch version
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c14db75..40573a5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-## Release 0.9.0a2 (WIP)
+## Release 0.10.0a0 (WIP)
 ### Breaking Changes
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index decf076..8ebb1ec 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,7 +2,7 @@
 This contrib repository is designed for experimental implementations of various parts of reinforcement training so that others may make use of them. This includes full
-training algorithms, different tools (e.g. new environment wrappers,
+RL algorithms, different tools (e.g. new environment wrappers,
 callbacks) and extending algorithms implemented in stable-baselines3.
 **Before opening a pull request**, open an issue discussing the contribution.
@@ -10,9 +10,9 @@ Once we agree that the plan looks good, go ahead and implement it.
 Contributions and review focuses on following three parts:
 1) **Implementation quality**
- - Performance of the training algorithms should match what proposed authors reported (if applicable).
+ - Performance of the RL algorithms should match the one reported by the original authors (if applicable).
 - This is ensured by including a code that replicates an experiment from the original
- paper or from an established codebase (e.g. the code from authors), as well as
+ paper or from an established codebase (e.g. the code from authors), as well as
 a test to check that implementation works on program level (does not crash).
 2) Documentation
 - Documentation quality should match that of stable-baselines3, with each feature covered
@@ -20,7 +20,7 @@ Contributions and review focuses on following three parts:
 of logic and report of the expected results, where applicable.
 3) Consistency with stable-baselines3
 - To ease readability, all contributions need to follow the code style (see below) and
- idioms used in stable-baselines3.
+ idioms used in stable-baselines3.
 The implementation quality is a strict requirements with little room for changes, because otherwise the implementation can do more harm than good (wrong results). Parts two and three
@@ -33,7 +33,7 @@ for suggestions of the community for new possible features to include in contrib
 ## How to implement your suggestion
 Implement your feature/suggestion/algorithm in following ways, using the first one that applies:
-1) Environment wrapper: This can be used with any algorithm and even outside stable-baselines3.
+1) Environment wrapper: This can be used with any algorithm and even outside stable-baselines3.
 Place code for these under `sb3_contrib/common/wrappers` directory.
 2) [Custom callback](https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html).
 Place code under `sb3_contrib/common/callbacks` directory.
@@ -63,17 +63,17 @@ Along with the code, PR **must** include the following:
 this goes under respective pages in documentation. If full training algorithm, this goes under a new page with template below (`docs/modules/[algo_name]`).
 2) If a training algorithm/improvement: results of a replicated experiment from the original paper in the documentation,
- **which must match the results from authors** unless solid arguments can be provided why they did not match.
+ **which must match the results from authors** unless solid arguments can be provided why they did not match.
 3) If above holds: The **exact** code to run the replicated experiment (i.e. it should produce the above results), and inside the code information about the environment used (Python version, library versions, OS, hardware information). If small enough, include this in the documentation. If applicable, use [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo) to
- run the agent performance comparison experiments (fork repository, implement experiment in a new branch and share link to
+ run the agent performance comparison experiments (fork repository, implement experiment in a new branch and share link to
 that branch). If above do not apply, create new code to replicate the experiment and include link to it.
 4) Updated tests in `tests/test_run.py` and `tests/test_save_load.py` to test that features run as expected and serialize correctly. This this is **not** for testing e.g. training performance of a learning algorithm, and should be relatively quick to run.
-Below is a template for documentation for full training algorithms.
+Below is a template for documentation for full RL algorithms.
 ```rst
 [Feature/Algorithm name]
diff --git a/Makefile b/Makefile
index d740b60..a8f34af 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ pytest:
 ./scripts/run_tests.sh
 type:
- pytype
+ pytype -j auto
 lint:
 # stop the build if there are Python syntax errors or undefined names
diff --git a/README.md b/README.md
index bd26110..cdcc9fe 100644
--- a/README.md
+++ b/README.md
@@ -1,31 +1,38 @@
+
+
 [![CI](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/workflows/CI/badge.svg)](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/actions) [![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-# Stable-Baselines3 - Contrib
+# Stable-Baselines3 - Contrib (SB3-Contrib)
-Contrib package for [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) - Experimental code.
+Contrib package for [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) - Experimental reinforcement learning (RL) code. "sb3-contrib" for short.
-A place for training algorithms and tools that are considered experimental, e.g. implementations of the latest
-publications. Goal is to keep the simplicity, documentation and style of stable-baselines3 but for less matured
-implementations.
+### What is SB3-Contrib?
-Why create this repository? Over the span of stable-baselines and stable-baselines3, the community has been eager
-to contribute in form of better logging utilities, environment wrappers, extended support (e.g. different action spaces)
-and learning algorithms. However sometimes these utilities were too niche to be considered for stable-baselines or
-proved to be too difficult to integrate well into existing code without a mess. sb3-contrib aims to fix this by
-not requiring the neatest code integration with existing code and not setting limits on what is too niche: almost everything
-remotely useful goes! We hope this allows to extend the known quality of stable-baselines style and documentation beyond
-the relatively small scope of utilities of the main repository.
+A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. Goal is to keep the simplicity, documentation and style of stable-baselines3 but for less matured implementations.
+
+### Why create this repository?
+
+Over the span of stable-baselines and stable-baselines3, the community has been eager to contribute in form of better logging utilities, environment wrappers, extended support (e.g. different action spaces) and learning algorithms.
+
+However sometimes these utilities were too niche to be considered for stable-baselines or
+proved to be too difficult to integrate well into existing code without a mess. sb3-contrib aims to fix this by not requiring the neatest code integration with existing code and not setting limits on what is too niche: almost everything remotely useful goes! We hope this allows to extend the known quality of stable-baselines style and documentation beyond the relatively small scope of utilities of the main repository.
 ## Features
 See documentation for the full list of included features.
-**Training algorithms**:
+**RL Algorithms**:
 - [Truncated Quantile Critics (TQC)](https://arxiv.org/abs/2005.04269)
+
+
+
+
 ## Installation
 **Note:** You need the `master` version of [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3/).
@@ -40,6 +47,10 @@ Install Stable Baselines3 - Contrib using pip:
 pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
 ```
+## How To Contribute
+
+If you want to contribute, please read [**CONTRIBUTING.md**](./CONTRIBUTING.md) guide first.
+
 ## Citing the Project
diff --git a/docs/index.rst b/docs/index.rst
index 79c5f5d..d590f25 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -6,7 +6,7 @@
 Welcome to Stable Baselines3 Contrib docs!
 ==========================================
-Contrib package for `Stable Baselines3 `_ - Experimental code.
+Contrib package for `Stable Baselines3 (SB3) `_ - Experimental code.
 Github repository: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
@@ -64,7 +64,7 @@ To cite this project in publications:
 Contributing
 ------------
-If you want to contribute, please read `CONTRIBUTING.md `_ first.
+If you want to contribute, please read `CONTRIBUTING.md `_ first.
 Indices and tables
 -------------------