diff --git a/.github/ISSUE_TEMPLATE/issue-template.md b/.github/ISSUE_TEMPLATE/issue-template.md
index 1268278..4c3ddaa 100644
--- a/.github/ISSUE_TEMPLATE/issue-template.md
+++ b/.github/ISSUE_TEMPLATE/issue-template.md
@@ -32,6 +32,7 @@ for both code and stack traces.

 ```python
 from stable_baselines3 import ...
+from sb3_contrib import ...
 ```

@@ -43,6 +44,7 @@ Traceback (most recent call last):
 File ...

 **System Info**
 Describe the characteristic of your environment:
  * Describe how the library was installed (pip, docker, source, ...)
+ * Stable-Baselines3 and sb3-contrib versions
  * GPU models and configuration
  * Python version
  * PyTorch version
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 2b35d61..197aabf 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,13 +1,11 @@
-
+
 ## Description
-
+
-## Motivation and Context
-
-
-
-- [ ] I have raised an issue to propose this change ([required](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) for new features and bug fixes)
+## Context
+
+- [ ] I have raised an issue to propose this change ([required](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md))

 ## Types of changes
@@ -19,11 +17,13 @@

 ## Checklist:
-- [ ] I've read the [CONTRIBUTION](https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md) guide (**required**)
-- [ ] I have updated the changelog accordingly (**required**).
-- [ ] My change requires a change to the documentation.
+- [ ] I've read the [CONTRIBUTION](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md) guide (**required**)
+- [ ] The functionality/performance matches that of the source (**required** for new training algorithms or training-related features).
 - [ ] I have updated the tests accordingly (*required for a bug fix or a new feature*).
+- [ ] I have included an example of using the feature (*required for new features*).
+- [ ] I have included baseline results (**required** for new training algorithms or training-related features).
 - [ ] I have updated the documentation accordingly.
+- [ ] I have updated the changelog accordingly (**required**).
 - [ ] I have reformatted the code using `make format` (**required**)
 - [ ] I have checked the codestyle using `make check-codestyle` and `make lint` (**required**)
 - [ ] I have ensured `make pytest` and `make type` both pass. (**required**)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index c14db75..40573a5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-## Release 0.9.0a2 (WIP)
+## Release 0.10.0a0 (WIP)

 ### Breaking Changes
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..8ebb1ec
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,177 @@
+## Contributing to Stable-Baselines3 - Contrib
+
+This contrib repository is designed for experimental implementations of various
+parts of reinforcement learning so that others may make use of them. This includes full
+RL algorithms, different tools (e.g. new environment wrappers,
+callbacks) and extensions of algorithms implemented in stable-baselines3.
+
+**Before opening a pull request**, open an issue discussing the contribution.
+Once we agree that the plan looks good, go ahead and implement it.
+
+Contributions and reviews focus on the following three parts:
+1) **Implementation quality**
+   - Performance of the RL algorithms should match that reported by the original authors (if applicable).
+   - This is ensured by including code that replicates an experiment from the original
+     paper or from an established codebase (e.g. the authors' code), as well as
+     a test to check that the implementation works at the program level (does not crash).
+2) Documentation
+   - Documentation quality should match that of stable-baselines3, with each feature covered
+     in the documentation, in-code documentation to clarify the flow of logic,
+     and a report of the expected results, where applicable.
+3) Consistency with stable-baselines3
+   - To ease readability, all contributions need to follow the code style (see below) and
+     the idioms used in stable-baselines3.
+
+The implementation quality is a strict requirement with little room for changes, because
+otherwise the implementation can do more harm than good (wrong results). Parts two and three
+are taken into account during review, but as this is a repository for more experimental code,
+they are not enforced very strictly.
+
+See [issues with "experimental" tag](https://github.com/DLR-RM/stable-baselines3/issues?q=is%3Aissue+is%3Aopen+label%3Aexperimental)
+for the community's suggestions of possible new features to include in contrib.
+
+## How to implement your suggestion
+
+Implement your feature/suggestion/algorithm in one of the following ways, using the first one that applies:
+1) Environment wrapper: This can be used with any algorithm and even outside stable-baselines3.
+   Place code for these under the `sb3_contrib/common/wrappers` directory.
+2) [Custom callback](https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html).
+   Place code under the `sb3_contrib/common/callbacks` directory.
+3) Following the style/naming of `common` files in stable-baselines3. If your suggestion is a specific network architecture
+   for feature extraction from images, place this in `sb3_contrib/common/torch_layers.py`, for example.
+4) A new learning algorithm. This is the last resort, but the most widely applicable solution.
+   Even if your suggestion is a (trivial) modification to an existing algorithm, create a new algorithm for it
+   (which inherits from the base algorithm), unless otherwise discussed. The algorithm should use the same API as
+   stable-baselines3 algorithms (e.g. `learn`, `load`), and the code should be placed under the
+   `sb3_contrib/[algorithm_name]` directory.
+
+Look over the stable-baselines3 code for the general naming of variables and try to keep this style.
+
+If the algorithm you are implementing involves more complex/uncommon equations, comment each part of these
+calculations with references to the relevant parts of the paper.
+
+## Pull Request (PR) and review
+
+Before proposing a PR, please open an issue, where the feature will be discussed.
+This prevents duplicated PRs from being proposed and also eases the code review process.
+
+Each PR needs to be reviewed and accepted by at least one of the maintainers.
+A PR must pass the Continuous Integration tests to be merged with the master branch.
+
+Along with the code, a PR **must** include the following:
+1) An update to the documentation to include a description of the feature. If the feature is a simple tool (e.g. wrapper, callback),
+   this goes under the respective pages in the documentation. If it is a full training algorithm, this goes under a new page with the template below
+   (`docs/modules/[algo_name]`).
+2) If a training algorithm/improvement: results of a replicated experiment from the original paper in the documentation,
+   **which must match the results from the authors** unless solid arguments can be provided why they do not match.
+3) If the above holds: the **exact** code to run the replicated experiment (i.e. it should produce the above results), with
+   information about the environment used (Python version, library versions, OS, hardware information) inside the
+   code. If small enough,
+   include this in the documentation. If applicable, use [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo) to
+   run the agent performance comparison experiments (fork the repository, implement the experiment in a new branch and share a link to
+   that branch). If the above does not apply, create new code to replicate the experiment and include a link to it.
+4) Updated tests in `tests/test_run.py` and `tests/test_save_load.py` to test that features run as expected and serialize
+   correctly. This is **not** for testing e.g. the training performance of a learning algorithm, and
+   should be relatively quick to run.
+
+Below is a template for the documentation of full RL algorithms.
+
+```rst
+[Feature/Algorithm name]
+========================
+
+- Non-abbreviated name and/or one-sentence description of the method.
+- Link and reference to the original publications that present the feature, or other established source(s).
+- Links to any codebases that were used for reference (e.g. authors' implementations)
+
+Example
+-------
+
+A minimal example of how to use the feature (full, runnable code).
+
+Results
+-------
+
+A description and comparison of results (e.g. how the change improved results over the non-changed algorithm), if
+applicable.
+
+Include the expected results from the work that originally proposed the method (e.g. the original paper).
+
+Include the code to replicate these results or a link to a repository/branch where the code can be found.
+Use `rl-baselines3-zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_ if possible: fork it, create a new branch
+and share the code to replicate the results there.
+
+Comments
+--------
+
+Comments regarding the implementation, e.g. missing parts, uncertain parts, differences
+from the original implementation.
+```
+
+If you are not familiar with creating a Pull Request, here are some guides:
+- http://stackoverflow.com/questions/14680711/how-to-do-a-github-pull-request
+- https://help.github.com/articles/creating-a-pull-request/
+
+
+## Codestyle
+
+We are using [black codestyle](https://github.com/psf/black) (max line length of 127 characters) together with [isort](https://github.com/timothycrosley/isort) to sort the imports.
+
+**Please run `make format`** to reformat your code. You can check the codestyle using `make check-codestyle` and `make lint`.
+
+Please document each function/method and [type](https://google.github.io/pytype/user_guide.html) them using the following template:
+
+```python
+
+def my_function(arg1: type1, arg2: type2) -> returntype:
+    """
+    Short description of the function.
+
+    :param arg1: describe what is arg1
+    :param arg2: describe what is arg2
+    :return: describe what is returned
+    """
+    ...
+    return my_variable
+```
+
+## Tests
+
+All new features and algorithms must add tests in the `tests/` folder ensuring that everything works fine (at the program level).
+We use [pytest](https://pytest.org/).
+Also, when a bug fix is proposed, tests should be added to avoid regression.
+
+To run tests with `pytest`:
+
+```
+make pytest
+```
+
+Type checking with `pytype`:
+
+```
+make type
+```
+
+Codestyle check with `black`, `isort` and `flake8`:
+
+```
+make check-codestyle
+make lint
+```
+
+To run `pytype`, `format` and `lint` in one command:
+```
+make commit-checks
+```
+
+Build the documentation:
+
+```
+make doc
+```
+
+## Changelog and Documentation
+
+Please do not forget to update the changelog (`CHANGELOG.md`).
+
+Credits: this contributing guide is based on the [PyTorch](https://github.com/pytorch/pytorch/) one.
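To make the wrapper route from the contributing guide above concrete, here is a minimal, dependency-free sketch of an environment wrapper in the spirit of `sb3_contrib/common/wrappers`. The `TimeLimitWrapper` and `StubEnv` names are hypothetical illustrations, not part of sb3-contrib; a real contribution would subclass `gym.Wrapper` instead of duck-typing:

```python
class StubEnv:
    """Stand-in for a gym.Env-like object (hypothetical, for illustration only)."""

    def reset(self):
        return 0.0  # initial observation

    def step(self, action):
        # Constant observation and reward; the episode never ends on its own.
        return 0.0, 1.0, False, {}


class TimeLimitWrapper:
    """Example wrapper: forces the episode to end after `max_steps` steps.

    A real contribution would subclass gym.Wrapper and live under
    sb3_contrib/common/wrappers so it works with any algorithm.
    """

    def __init__(self, env, max_steps=100):
        self.env = env
        self.max_steps = max_steps
        self.n_steps = 0

    def reset(self):
        self.n_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.n_steps += 1
        if self.n_steps >= self.max_steps:
            done = True  # truncate the episode at the step budget
        return obs, reward, done, info
```

Because the wrapper only intercepts `reset`/`step`, it composes with any algorithm and can even be used outside stable-baselines3.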
diff --git a/Makefile b/Makefile
index d740b60..a8f34af 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ pytest:
 	./scripts/run_tests.sh

 type:
-	pytype
+	pytype -j auto

 lint:
 	# stop the build if there are Python syntax errors or undefined names
diff --git a/README.md b/README.md
index 7280244..cdcc9fe 100644
--- a/README.md
+++ b/README.md
@@ -1,24 +1,56 @@
+
+
 [![CI](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/workflows/CI/badge.svg)](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/actions)
 [![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

-# Stable-Baselines3 - Contrib
+# Stable-Baselines3 - Contrib (SB3-Contrib)

-Contrib package for [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) - Experimental code
+Contrib package for [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) - Experimental reinforcement learning (RL) code.
+"sb3-contrib" for short.

-**You need the master version of Stable Baselines3**:
-```
-pip install git+https://github.com/DLR-RM/stable-baselines3
-```
+### What is SB3-Contrib?

-Implemented:
+A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. The goal is to keep the simplicity, documentation and style of stable-baselines3 but for less mature implementations.
+
+### Why create this repository?
+
+Over the span of stable-baselines and stable-baselines3, the community has been eager to contribute in the form of better logging utilities, environment wrappers, extended support (e.g. different action spaces) and learning algorithms.
+
+However, sometimes these utilities were too niche to be considered for stable-baselines or
+proved to be too difficult to integrate well into existing code without a mess.
+sb3-contrib aims to fix this by not requiring the neatest integration with existing code and not setting limits on what is too niche: almost everything remotely useful goes!
+We hope this allows us to extend the known quality of stable-baselines style and documentation beyond the relatively small scope of utilities of the main repository.
+
+
+## Features
+
+See the documentation for the full list of included features.
+
+**RL Algorithms**:
 - [Truncated Quantile Critics (TQC)](https://arxiv.org/abs/2005.04269)
+
+
+
 ## Installation

+**Note:** You need the `master` version of [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3/).
+
+To install the Stable Baselines3 `master` version:
+```
+pip install git+https://github.com/DLR-RM/stable-baselines3
+```
+
+Install Stable Baselines3 - Contrib using pip:
 ```
 pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
 ```

+## How To Contribute
+
+If you want to contribute, please read the [**CONTRIBUTING.md**](./CONTRIBUTING.md) guide first.
+
 ## Citing the Project
diff --git a/docs/README.md b/docs/README.md
index 1427a79..e75267f 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,6 +1,6 @@
-## Stable Baselines3 Documentation
+## Stable Baselines3 Contrib Documentation

-This folder contains documentation for the RL baselines.
+This folder contains documentation for the RL baselines contribution repository.

 ### Build the Documentation
diff --git a/docs/index.rst b/docs/index.rst
index b77a85f..d590f25 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -6,7 +6,7 @@
 Welcome to Stable Baselines3 Contrib docs!
 ==========================================

-Contrib package for `Stable Baselines3 <https://github.com/DLR-RM/stable-baselines3>`_ - Experimental code.
+Contrib package for `Stable Baselines3 (SB3) <https://github.com/DLR-RM/stable-baselines3>`_ - Experimental code.
 Github repository: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
@@ -64,11 +64,7 @@ To cite this project in publications:

 Contributing
 ------------

-To any interested in making the rl baselines better, there are still some improvements
-that need to be done.
-You can check issues in the `repo `_.
-
-If you want to contribute, please read `CONTRIBUTING.md `_ first.
+If you want to contribute, please read `CONTRIBUTING.md <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/CONTRIBUTING.md>`_ first.

 Indices and tables
 -------------------
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
index 6cb9a38..db44b6d 100644
--- a/docs/misc/changelog.rst
+++ b/docs/misc/changelog.rst
@@ -24,6 +24,8 @@ Others:

 Documentation:
 ^^^^^^^^^^^^^^
+- Added initial documentation
+- Added contribution guide and related PR templates

 Maintainers
diff --git a/tests/test_run.py b/tests/test_run.py
index c2bc941..8d59976 100644
--- a/tests/test_run.py
+++ b/tests/test_run.py
@@ -14,7 +14,7 @@ def test_tqc(ent_coef):
         create_eval_env=True,
         ent_coef=ent_coef,
     )
-    model.learn(total_timesteps=500, eval_freq=250)
+    model.learn(total_timesteps=300, eval_freq=250)


 @pytest.mark.parametrize("n_critics", [1, 3])
@@ -23,7 +23,7 @@ def test_n_critics(n_critics):
     model = TQC(
         "MlpPolicy", "Pendulum-v0", policy_kwargs=dict(net_arch=[64], n_critics=n_critics), learning_starts=100, verbose=1
     )
-    model.learn(total_timesteps=500)
+    model.learn(total_timesteps=300)


 def test_sde():
@@ -35,6 +35,6 @@ def test_sde():
         learning_starts=100,
         verbose=1,
     )
-    model.learn(total_timesteps=500)
+    model.learn(total_timesteps=300)
     model.policy.reset_noise()
     model.policy.actor.get_std()
diff --git a/tests/test_save_load.py b/tests/test_save_load.py
index 9a73c77..396bc62 100644
--- a/tests/test_save_load.py
+++ b/tests/test_save_load.py
@@ -43,7 +43,7 @@ def test_save_load(tmp_path, model_class):
     # create model
     model = model_class("MlpPolicy", env, policy_kwargs=dict(net_arch=[16]), verbose=1)
-    model.learn(total_timesteps=500)
+    model.learn(total_timesteps=300)

     env.reset()
     observations = np.concatenate([env.step([env.action_space.sample()])[0] for _ in range(10)], axis=0)
@@ -152,7 +152,7 @@ def test_save_load(tmp_path, model_class):
     assert np.allclose(selected_actions, new_selected_actions, 1e-4)
     # check if learn still works
-    model.learn(total_timesteps=500)
+    model.learn(total_timesteps=300)

     del model
@@ -224,7 +224,7 @@ def test_save_load_replay_buffer(tmp_path, model_class):
     path = pathlib.Path(tmp_path / "logs/replay_buffer.pkl")
     path.parent.mkdir(exist_ok=True, parents=True)  # to not raise a warning
     model = model_class("MlpPolicy", select_env(model_class), buffer_size=1000)
-    model.learn(500)
+    model.learn(300)
     old_replay_buffer = deepcopy(model.replay_buffer)
     model.save_replay_buffer(path)
     model.replay_buffer = None
@@ -268,7 +268,7 @@ def test_save_load_policy(tmp_path, model_class, policy_str):
     # create model
     model = model_class(policy_str, env, policy_kwargs=dict(net_arch=[16]), verbose=1, **kwargs)
-    model.learn(total_timesteps=500)
+    model.learn(total_timesteps=300)

     env.reset()
     observations = np.concatenate([env.step([env.action_space.sample()])[0] for _ in range(10)], axis=0)