33 Commits

Author SHA1 Message Date
Antonin RAFFIN a1b5ea67ae
Multiprocessing support for off policy algorithms (#50)
* TQC support for multienv

* Add optional layer norm for TQC

* Add layer nprm for all policies

* Revert "Add layer nprm for all policies"

This reverts commit 1306c3c64eb12613464982c66cb416a3bbc66285.

* Revert "Add optional layer norm for TQC"

This reverts commit 200222e3a8878007aa6032d540ae74274a4d0788.

* Add experimental support to train off-policy algorithms with multiple envs

* Bump version

* Update version
2021-12-02 10:40:21 +01:00
Antonin RAFFIN cd0a5e516f
Update citation (#54)
* Update citation

* Fixes for new SB3 version

* Fix type hint

* Additional fixes
2021-12-01 19:09:32 +01:00
Antonin RAFFIN b1397bbb72
Release 1.3.0 (#48) 2021-10-23 17:21:22 +02:00
Geoff McDonald d6c5cea644
MaskablePPO dictionary observation support (#47)
* Add dictionary observation support for ppo_mask.

* Improving naming consistency.

* Update changelog.

* Reformat and add test

* Update doc

* Update README and setup

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-10-23 17:05:37 +02:00
Antonin RAFFIN 91f9b1ed34
Remove sde net arch (#44) 2021-09-28 21:59:59 +02:00
Antonin Raffin c525c5107b Upgrade min sphinx version 2021-09-23 15:26:37 +02:00
kronion ab24f8039f
PPO variant with invalid action masking (#25)
* Add wrappers

* Add maskable distributions

* Add mypy configuration

* Add maskable base datastructures

* Add ppo_mask package

* Fix circular dependency and remove test code that slipped in

* Automatically mask vecenv if env is masked

* Fix debugging change that slipped in

* Workaround for subclassing RolloutBufferSamples

* Duplicate lots of policy code in order to swap out the distributions used

* Fix pytype error

* Maintain py 3.6 compatibility

* Fix isort lint errors

* Use pyproject.toml to configure black line length

* Blacken

* Remove mypy.ini

* Fully replace RolloutBufferSamples

* Drop support for continuous distributions, remove SDE-related code

* Eliminate MaskableAlgorithm and MaskableOnPolicyAlgorithm

* Fix formatting

* Override superclass methods as needed, fix circular import, improve naming

* Fix codestyle

* Eliminate VecActionMasker, replace with utils

* Fix codestyle

* Support masking for MultiDiscrete action spaces

* Fix codestyle

* Don't require the env to provide the mask already flattened

* Consistent naming, prefer 'Maskable' to 'Masked'

* Register policy

* Link to abstract instead of pdf

* Allow distribution masking to be unapplied + improved comments and docstrings

* Don't use deprecated implicit optional typing

* Check codestyle

* Add docstring and remove misplaced TODO

* Simplify env masking API, error if API unmet. Make use_masking a learn() kwarg

* Fix codestyle

* Update various internals to be consistent with latest SB3

* Simplify MaskableRolloutBuffer reset

* Add docstring and type annotations

* Ensure old probs aren't cached

* Fix for new logger

* Add test + fixes

* Start doc

* Fix type annotation

* Remove abstract class + add test

* Fix evaluation (add support for multi envs)

* Handle merge conflicts in documentation

* Bugfix: mask updates should apply to original logits, not the last masked output

* Add test of distribution masking behavior

* Reformat

* Add MultiBinary support, remove unneeded distribution type checks

* Remove unused import

* Fix when using multiple envs

* Remove addressed TODO

* Upgrade for SB3 1.2.0

* Update docs with results + how to replicate

* Add action masker tests, move wrapper tests

* Move distributions, add more distribution tests

* Add MaskablePPO tests, simplify and rename discrete test env

* Address TODO

* Add tests for MaskableMultiCategoricalDistribution, fix distributions

* Add maskable identity envs for all supported action spaces, add tests, fix bug

* Formatting fixes

* Update doc env

* Dict support not ready

* Cleanup

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-23 14:50:10 +02:00
Scott Brownlie b2e7126840
Train/Eval Mode Support (#39)
* switch models between train and eval mode

* update changelog

* update release in change log

* Update dependency

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-08 12:54:50 +02:00
Antonin RAFFIN 36eca8ee79
Fix type annotation + add python 3.9 + citation (#37) 2021-07-29 18:14:03 +02:00
Antonin RAFFIN ae39e00c44
Release v1.1.0 (#34) 2021-07-02 11:38:46 +02:00
Long M. Lưu (刘明龙) fab19bdb18
Update small QR-DQN docs typo (#33)
* Update qrdqn.rst

* Update changelog.rst

* Update changelog.rst

Add my name

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-06-23 14:34:22 +02:00
Antonin RAFFIN 2258c72215
Update to new logger (#32) 2021-06-14 17:25:08 +02:00
Antonin RAFFIN 08418a3cc8
Bump SB3 version (#30) 2021-05-12 11:46:16 +02:00
Antonin RAFFIN 3665695d1e
Dictionary Observations (#29)
* Add TQC support for new HER version

* Add dict obs support

* Add support for dict obs
2021-05-11 13:24:31 +02:00
Antonin RAFFIN 61bfdbc00a
Fix unused code (#28)
* Fix unused code

* Update changelog

* Update SB3 dependency
2021-05-05 11:42:10 +02:00
Antonin RAFFIN 81ef23d270
SB3 v1.0 (#23) 2021-03-17 14:32:58 +01:00
Antonin RAFFIN 9824daca44
Bug fix for QR-DQN (#21)
* Bug fix for QR-DQN

* Upgrade SB3
2021-03-06 14:54:43 +01:00
Antonin RAFFIN 7c2eb833c0
Upgrade SB3 (#20) 2021-02-27 19:59:21 +01:00
Antonin RAFFIN 74e60381a6
Upgrade Stable-Baselines3 (#19)
* Upgrade Stable-Baselines3

* Fix policy saving/loading
2021-02-27 18:17:22 +01:00
Toshiki Watanabe 4b4d487fdb
Fix the target calculation of QR-DQN (#18)
* Fix the target calculation of QR-DQN

* Update doc

* Update version

* Update changelog

* Update README

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-01-11 14:11:16 +01:00
Antonin RAFFIN ab2880c670 Version bump 2020-12-21 11:19:31 +01:00
Toshiki Watanabe b30397fff5
Add QR-DQN (#13)
* Add QR-DQN(WIP)

* Update docstring

* Add quantile_huber_loss

* Fix typo

* Remove unnecessary lines

* Update variable names and comments in quantile_huber_loss

* Fix mutable arguments

* Update variable names

* Ignore import not used warnings

* Fix default parameter of optimizer in QR-DQN

* Update quantile_huber_loss to have more reasonable interface

* update tests

* Add assertion to quantile_huber_loss

* Update variable names of quantile regression

* Update comments

* Reduce the number of quantiles during test

* Update comment

* Update quantile_huber_loss

* Fix isort

* Add document of QR-DQN without results

* Update docs

* Fix bugs

* Update doc

* Add comments about shape

* Minor edits

* Update comments

* Add benchmark

* Doc fixes

* Update doc

* Bug fix in saving/loading + update tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-12-21 11:17:48 +01:00
Antonin RAFFIN 3598ca284a
Update requirements (#15) 2020-12-13 17:29:15 +01:00
Antonin RAFFIN 857a087a2a
Update TQC to match SB3 (#14) 2020-12-08 15:35:50 +01:00
Antonin RAFFIN 6bafcf6e88
Add TimeFeatureWrapper (#7)
* Add TimeFeatureWrapper

* Update README

* Address comments
2020-11-13 13:00:56 +02:00
Antonin RAFFIN aac20bd1e6 Release v0.10.0 2020-10-28 15:08:07 +01:00
Antonin RAFFIN 2ce8d278cc
Fix features extractor issue (#5)
* Fix feature extractor issue

* Sync with SB3 PR
2020-10-27 14:30:35 +01:00
Antonin RAFFIN b896b7492e Update dependencies 2020-10-22 16:35:28 +02:00
Antonin RAFFIN e8093965c7 Fix doc build 2020-10-22 14:46:05 +02:00
Antonin RAFFIN 0700c3eeb0
Add TQC (#4)
* Add TQC doc

* Polish code

* Update doc

* Update results

* Update doc

* Update doc

* Add note about PyBullet envs
2020-10-22 13:43:46 +02:00
Antonin RAFFIN 926e488196 Update wording and links 2020-10-17 17:04:00 +02:00
Anssi "Miffyli" Kanervisto 79fcf54e1e Review docs and update changelog 2020-10-15 02:17:36 +03:00
Antonin RAFFIN 5033b192cb Add base doc 2020-10-12 20:21:52 +02:00