kronion ab24f8039f	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00

* Add wrappers
* Add maskable distributions
* Add mypy configuration
* Add maskable base datastructures
* Add ppo_mask package
* Fix circular dependency and remove test code that slipped in
* Automatically mask vecenv if env is masked
* Fix debugging change that slipped in
* Workaround for subclassing RolloutBufferSamples
* Duplicate lots of policy code in order to swap out the distributions used
* Fix pytype error
* Maintain py 3.6 compatibility
* Fix isort lint errors
* Use pyproject.toml to configure black line length
* Blacken
* Remove mypy.ini
* Fully replace RolloutBufferSamples
* Drop support for continuous distributions, remove SDE-related code
* Eliminate MaskableAlgorithm and MaskableOnPolicyAlgorithm
* Fix formatting
* Override superclass methods as needed, fix circular import, improve naming
* Fix codestyle
* Eliminate VecActionMasker, replace with utils
* Fix codestyle
* Support masking for MultiDiscrete action spaces
* Fix codestyle
* Don't require the env to provide the mask already flattened
* Consistent naming, prefer 'Maskable' to 'Masked'
* Register policy
* Link to abstract instead of pdf
* Allow distribution masking to be unapplied + improved comments and docstrings
* Don't use deprecated implicit optional typing
* Check codestyle
* Add docstring and remove misplaced TODO
* Simplify env masking API, error if API unmet. Make use_masking a learn() kwarg
* Fix codestyle
* Update various internals to be consistent with latest SB3
* Simplify MaskableRolloutBuffer reset
* Add docstring and type annotations
* Ensure old probs aren't cached
* Fix for new logger
* Add test + fixes
* Start doc
* Fix type annotation
* Remove abstract class + add test
* Fix evaluation (add support for multi envs)
* Handle merge conflicts in documentation
* Bugfix: mask updates should apply to original logits, not the last masked output
* Add test of distribution masking behavior
* Reformat
* Add MultiBinary support, remove unneeded distribution type checks
* Remove unused import
* Fix when using multiple envs
* Remove addressed TODO
* Upgrade for SB3 1.2.0
* Update docs with results + how to replicate
* Add action masker tests, move wrapper tests
* Move distributions, add more distribution tests
* Add MaskablePPO tests, simplify and rename discrete test env
* Address TODO
* Add tests for MaskableMultiCategoricalDistribution, fix distributions
* Add maskable identity envs for all supported action spaces, add tests, fix bug
* Formatting fixes
* Update doc env
* Dict support not ready
* Cleanup

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
.github	Fix type annotation + add python 3.9 + citation (#37)	2021-07-29 18:14:03 +02:00
docs	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
sb3_contrib	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
scripts	Update script permissions	2020-09-25 12:53:13 +02:00
tests	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
.coveragerc	Add TQC and base scripts	2020-09-25 12:47:45 +02:00
.gitignore	Add TQC and base scripts	2020-09-25 12:47:45 +02:00
.readthedocs.yml	Fix doc build	2020-10-22 14:46:05 +02:00
CITATION.cff	Fix type annotation + add python 3.9 + citation (#37)	2021-07-29 18:14:03 +02:00
CONTRIBUTING.md	Fix features extractor issue (#5)	2020-10-27 14:30:35 +01:00
LICENSE	Initial commit	2020-09-20 22:09:57 +02:00
Makefile	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
README.md	Update README	2021-02-06 17:13:44 +01:00
pyproject.toml	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
setup.cfg	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
setup.py	Train/Eval Mode Support (#39)	2021-09-08 12:54:50 +02:00

README.md

Stable-Baselines3 - Contrib (SB3-Contrib)

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code. "sb3-contrib" for short.

What is SB3-Contrib?

A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. The goal is to keep the simplicity, documentation and style of stable-baselines3, but for less mature implementations.

Why create this repository?

Over the span of stable-baselines and stable-baselines3, the community has been eager to contribute in the form of better logging utilities, environment wrappers, extended support (e.g. different action spaces) and learning algorithms.

However, these utilities were sometimes too niche to be considered for stable-baselines, or proved too difficult to integrate into the existing code without creating a mess. sb3-contrib aims to fix this by not requiring the neatest integration with existing code and by not setting limits on what is too niche: almost everything remotely useful goes! We hope this allows us to provide reliable implementations that follow the usual stable-baselines standards (consistent style, documentation, etc.) beyond the relatively small scope of utilities in the main repository.

Features

See documentation for the full list of included features.

RL Algorithms:

Gym Wrappers:

Time Feature Wrapper
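
To illustrate the idea behind a time feature wrapper, here is a minimal, dependency-free sketch. It is not the sb3-contrib implementation: the class names `CountdownEnv` and `TimeFeatureSketch`, the `max_steps` parameter, and the old-style four-value `step()` return are all assumptions made for this example. The wrapper appends the normalized remaining time (going from 1.0 down to 0.0) to each observation, so that an agent in a fixed-length episode can tell how close it is to the time limit.

```python
from typing import List, Tuple


class CountdownEnv:
    """Hypothetical toy environment whose observation is a one-element list."""

    def reset(self) -> List[float]:
        self.state = 0.0
        return [self.state]

    def step(self, action: float) -> Tuple[List[float], float, bool, dict]:
        # The action is simply accumulated into the state; reward is always 0.
        self.state += action
        return [self.state], 0.0, False, {}


class TimeFeatureSketch:
    """Illustrative wrapper that appends the remaining time to observations."""

    def __init__(self, env, max_steps: int = 1000):
        self.env = env
        self.max_steps = max_steps  # assumed episode length
        self.current_step = 0

    def _augment(self, obs: List[float]) -> List[float]:
        # Remaining time is 1.0 at reset and reaches 0.0 after max_steps steps.
        remaining = 1.0 - self.current_step / self.max_steps
        return list(obs) + [max(remaining, 0.0)]

    def reset(self) -> List[float]:
        self.current_step = 0
        return self._augment(self.env.reset())

    def step(self, action: float) -> Tuple[List[float], float, bool, dict]:
        obs, reward, done, info = self.env.step(action)
        self.current_step += 1
        return self._augment(obs), reward, done, info


if __name__ == "__main__":
    env = TimeFeatureSketch(CountdownEnv(), max_steps=4)
    print(env.reset())
    print(env.step(1.0)[0])
```

See the documentation linked below for the wrapper actually shipped with the package and its options.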

Documentation

Documentation is available online: https://sb3-contrib.readthedocs.io/

Installation

To install Stable Baselines3 contrib with pip, execute:

pip install sb3-contrib

We recommend using the master version of Stable Baselines3.

To install Stable Baselines3 master version:

pip install git+https://github.com/DLR-RM/stable-baselines3

To install Stable Baselines3 contrib master version:

pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
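
After installing, it can be useful to confirm which versions ended up in the environment. The following is a small sketch that uses only the standard library (`importlib.metadata`, which assumes Python 3.8 or newer); the helper name `installed_version` is made up for this example, and the distribution names are the ones used in the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as they appear in the pip commands above.
    for name in ("stable-baselines3", "sb3-contrib"):
        print(name, installed_version(name) or "not installed")
```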

How To Contribute

If you want to contribute, please read the CONTRIBUTING.md guide first.

Citing the Project

To cite this repository in publications (please cite SB3 directly):

@misc{stable-baselines3,
  author = {Raffin, Antonin and Hill, Ashley and Ernestus, Maximilian and Gleave, Adam and Kanervisto, Anssi and Dormann, Noah},
  title = {Stable Baselines3},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/DLR-RM/stable-baselines3}},
}