Update wording and links
parent 00f9d26d55
commit 926e488196
@@ -44,6 +44,7 @@ Traceback (most recent call last): File ...
 **System Info**
 Describe the characteristic of your environment:
 * Describe how the library was installed (pip, docker, source, ...)
+* Stable-Baselines3 and sb3-contrib versions
 * GPU models and configuration
 * Python version
 * PyTorch version

@@ -1,4 +1,4 @@
-## Release 0.9.0a2 (WIP)
+## Release 0.10.0a0 (WIP)
 
 ### Breaking Changes
 

@@ -2,7 +2,7 @@
 
 This contrib repository is designed for experimental implementations of various
 parts of reinforcement training so that others may make use of them. This includes full
-training algorithms, different tools (e.g. new environment wrappers,
+RL algorithms, different tools (e.g. new environment wrappers,
 callbacks) and extending algorithms implemented in stable-baselines3.
 
 **Before opening a pull request**, open an issue discussing the contribution.
@@ -10,9 +10,9 @@ Once we agree that the plan looks good, go ahead and implement it.
 
 Contributions and review focuses on following three parts:
 1) **Implementation quality**
-- Performance of the training algorithms should match what proposed authors reported (if applicable).
+- Performance of the RL algorithms should match the one reported by the original authors (if applicable).
 - This is ensured by including a code that replicates an experiment from the original
 paper or from an established codebase (e.g. the code from authors), as well as
 a test to check that implementation works on program level (does not crash).
 2) Documentation
 - Documentation quality should match that of stable-baselines3, with each feature covered
@@ -20,7 +20,7 @@ Contributions and review focuses on following three parts:
 of logic and report of the expected results, where applicable.
 3) Consistency with stable-baselines3
 - To ease readability, all contributions need to follow the code style (see below) and
 idioms used in stable-baselines3.
 
 The implementation quality is a strict requirements with little room for changes, because
 otherwise the implementation can do more harm than good (wrong results). Parts two and three
@@ -33,7 +33,7 @@ for suggestions of the community for new possible features to include in contrib
 ## How to implement your suggestion
 
 Implement your feature/suggestion/algorithm in following ways, using the first one that applies:
 1) Environment wrapper: This can be used with any algorithm and even outside stable-baselines3.
 Place code for these under `sb3_contrib/common/wrappers` directory.
 2) [Custom callback](https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html).
 Place code under `sb3_contrib/common/callbacks` directory.
@@ -63,17 +63,17 @@ Along with the code, PR **must** include the following:
 this goes under respective pages in documentation. If full training algorithm, this goes under a new page with template below
 (`docs/modules/[algo_name]`).
 2) If a training algorithm/improvement: results of a replicated experiment from the original paper in the documentation,
 **which must match the results from authors** unless solid arguments can be provided why they did not match.
 3) If above holds: The **exact** code to run the replicated experiment (i.e. it should produce the above results), and inside the
 code information about the environment used (Python version, library versions, OS, hardware information). If small enough,
 include this in the documentation. If applicable, use [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo) to
 run the agent performance comparison experiments (fork repository, implement experiment in a new branch and share link to
 that branch). If above do not apply, create new code to replicate the experiment and include link to it.
 4) Updated tests in `tests/test_run.py` and `tests/test_save_load.py` to test that features run as expected and serialize
 correctly. This this is **not** for testing e.g. training performance of a learning algorithm, and
 should be relatively quick to run.
 
-Below is a template for documentation for full training algorithms.
+Below is a template for documentation for full RL algorithms.
 
 ```rst
 [Feature/Algorithm name]

Makefile
@@ -5,7 +5,7 @@ pytest:
 	./scripts/run_tests.sh
 
 type:
-	pytype
+	pytype -j auto
 
 lint:
 	# stop the build if there are Python syntax errors or undefined names

README.md
@@ -1,31 +1,38 @@
+<img src="docs/\_static/img/logo.png" align="right" width="40%"/>
+
 [](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/actions) [](https://github.com/psf/black)
 
-# Stable-Baselines3 - Contrib
+# Stable-Baselines3 - Contrib (SB3-Contrib)
 
-Contrib package for [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) - Experimental code.
+Contrib package for [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) - Experimental reinforcement learning (RL) code.
 "sb3-contrib" for short.
 
-A place for training algorithms and tools that are considered experimental, e.g. implementations of the latest
-publications. Goal is to keep the simplicity, documentation and style of stable-baselines3 but for less matured
-implementations.
+### What is SB3-Contrib?
 
-Why create this repository? Over the span of stable-baselines and stable-baselines3, the community has been eager
-to contribute in form of better logging utilities, environment wrappers, extended support (e.g. different action spaces)
-and learning algorithms. However sometimes these utilities were too niche to be considered for stable-baselines or
-proved to be too difficult to integrate well into existing code without a mess. sb3-contrib aims to fix this by
-not requiring the neatest code integration with existing code and not setting limits on what is too niche: almost everything
-remotely useful goes! We hope this allows to extend the known quality of stable-baselines style and documentation beyond
-the relatively small scope of utilities of the main repository.
+A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. Goal is to keep the simplicity, documentation and style of stable-baselines3 but for less matured implementations.
+
+### Why create this repository?
+
+Over the span of stable-baselines and stable-baselines3, the community has been eager to contribute in form of better logging utilities, environment wrappers, extended support (e.g. different action spaces) and learning algorithms.
+
+However sometimes these utilities were too niche to be considered for stable-baselines or
+proved to be too difficult to integrate well into existing code without a mess. sb3-contrib aims to fix this by not requiring the neatest code integration with existing code and not setting limits on what is too niche: almost everything remotely useful goes! We hope this allows to extend the known quality of stable-baselines style and documentation beyond the relatively small scope of utilities of the main repository.
 
 
 ## Features
 
 See documentation for the full list of included features.
 
-**Training algorithms**:
+**RL Algorithms**:
 - [Truncated Quantile Critics (TQC)](https://arxiv.org/abs/2005.04269)
 
 
+<!-- TODO: uncomment when the repo is public -->
+<!-- ## Documentation
+
+Documentation is available online: [https://sb3-contrib.readthedocs.io/](https://sb3-contrib.readthedocs.io/) -->
+
+
 ## Installation
 
 **Note:** You need the `master` version of [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3/).
@@ -40,6 +47,10 @@ Install Stable Baselines3 - Contrib using pip:
 pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
 ```
 
+## How To Contribute
+
+If you want to contribute, please read [**CONTRIBUTING.md**](./CONTRIBUTING.md) guide first.
+
 
 ## Citing the Project
 

@@ -6,7 +6,7 @@
 Welcome to Stable Baselines3 Contrib docs!
 ==========================================
 
-Contrib package for `Stable Baselines3 <https://github.com/DLR-RM/stable-baselines3>`_ - Experimental code.
+Contrib package for `Stable Baselines3 (SB3) <https://github.com/DLR-RM/stable-baselines3>`_ - Experimental code.
 
 
 Github repository: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
@@ -64,7 +64,7 @@ To cite this project in publications:
 Contributing
 ------------
 
-If you want to contribute, please read `CONTRIBUTING.md <https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md>`_ first.
+If you want to contribute, please read `CONTRIBUTING.md <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md>`_ first.
 
 Indices and tables
 -------------------