Commit Graph

96 Commits

Author SHA1 Message Date
Quentin Gallouédec 36aeae18b5
Fix `Self` return type (#116)
* Self hint for distributions

* ClassSelf to SelfClass
2022-11-22 13:12:35 +01:00
Antonin RAFFIN a9735b9f31
Fix reshape LSTM states (#112)
* Fix LSTM states reshape

* Fix warnings and update changelog

* Remove unused variable

* Fix runtime error when using n_lstm_layers > 1
2022-10-26 18:03:45 +02:00
Antonin RAFFIN c75ad7dd58
Remove deprecated features (#108)
* Remove deprecated features

* Upgrade SB3

* Fix tests
2022-10-11 13:04:18 +02:00
Antonin RAFFIN 52795a307e
Add progress bar argument (#107)
* Add progress bar argument

* Sort imports
2022-10-10 18:44:13 +02:00
Quentin Gallouédec e9c97948c8
Fixed the return type of ``.load()`` methods (#106)
* Fix return type for learn using TypeVar

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-10 17:21:38 +02:00
Quentin Gallouédec dec7b5303a
Deprecate ``create_eval_env``, ``eval_env`` and ``eval_freq`` parameter (#105)
* Deprecate ``eval_env``, ``eval_freq`` and ``create_eval_env``

* Update changelog

* Typo

* Raise deprecation warning in _setup_learn

* Upgrade to latest SB3 version and update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-10-10 17:12:40 +02:00
Antonin RAFFIN 2490468b11
Release v1.6.1 (#104) 2022-09-29 12:30:12 +02:00
Honglu Fan cad9034fdb
Handle batch norm in target update (#99)
* Copy running stats regardless of tau in QRDQN and TQC. See https://github.com/DLR-RM/stable-baselines3/issues/996

* roll back test_cnn.py
2022-08-27 12:31:00 +02:00
Quentin Gallouédec 7993b75781
Support `device="auto"` for buffers and set it as default value (#98)
* Default device for buffer is auto

* `device=auto` in ARS

* Undo ARS change

* Update changelog

* Update min SB3 version

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-08-24 09:48:18 +02:00
Burak Demirbilek 049f5a16e9
Fixed missing verbose parameter passing (#97) 2022-08-16 15:54:46 +02:00
CppMaster eb48fec638
Maskable eval callback call callback fix (#93)
* call correctly both self.callback_on_new_best and self.callback - similar as in EvalCallback

* MaskableEvalCallback - updated sync_envs_normalization handling

* MaskableEvalCallback - updated sync_envs_normalization handling - test
MaskablePPO - register policies (tests fail otherwise)

* MaskableEvalCallback - updated docstring

* updated changelog.rst

* changes for stable-baselines3==1.6.0

* version range

* suggested changes

* Reformat and update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-27 19:52:07 +02:00
Max Lodel fc68af8841
Fixed shared_lstm argument in CNN and MultiInput Policies for RecurrentPPO (#90)
* fixed shared_lstm parameter in CNN and MultiInput Policies

* updated tests

* changelog

* Fix FPS for recurrent PPO

* Fix import

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-07-26 00:27:17 +02:00
Adam Gleave 7e687ac47c
Use higher resolution time_ns() and avoid division by zero (#91)
* Use higher resolution time_ns and add max to avoid division by zero

* Add missing imports

* Update changelog
2022-07-25 23:12:20 +02:00
Quentin Gallouédec 3cbd2429be
Fix returned type in predict (#88)
* actions[0] -> actions.squeeze(0)

* Update changelog

* Update changelog

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-07-18 11:49:03 +02:00
Antonin Raffin c9d621b816
Use ICLR url for PPO blog post 2022-07-12 23:49:26 +02:00
Antonin Raffin 5ec9e01b44
Update changelog 2022-07-12 23:15:14 +02:00
Antonin RAFFIN 087951d34b
Release v1.6.0 and bug fix for TRPO (#84) 2022-07-12 23:12:24 +02:00
Antonin RAFFIN db4c0114d0
Update default TQC net arch when using NatureCnn (#79)
* Update default TQC net arch when using NatureCnn

* Bump version
2022-06-18 10:53:29 +02:00
rnederstigt bfa86ce4fe
Fix masked quantities in RecurrentPPO (#78)
* Ignore masked indexes when calculating the loss functions
2022-06-13 16:00:40 +02:00
Antonin RAFFIN 75b2de1399
Recurrent PPO (#53)
* Running (not working yet) version of recurrent PPO

* Fixes for multi envs

* Save WIP, rework the sampling

* Add Box support

* Fix sample order

* Begin cleanup, code is broken (again)

* First working version (no shared lstm)

* Start cleanup

* Try rnn with value function

* Re-enable batch size

* Deactivate vf rnn

* Allow any batch size

* Add support for evaluation

* Add CNN support

* Fix start of sequence

* Allow shared LSTM

* Rename mask to episode_start

* Fix type hint

* Enable LSTM for critic

* Clean code

* Fix for CNN LSTM

* Fix sampling with n_layers > 1

* Add std logger

* Update wording

* Rename and add dict obs support

* Fixes for dict obs support

* Do not run slow tests

* Fix doc

* Update recurrent PPO example

* Update README

* Use Pendulum-v1 for tests

* Fix image env

* Speedup LSTM forward pass (#63)

* added more efficient lstm implementation

* Rename and add comment

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>

* Fixes

* Remove OpenAI sampling and improve coverage

* Sync with SB3 PPO

* Pass state shape and allow lstm kwargs

* Update tests

* Add masking for padded sequences

* Update default in perf test

* Remove TODO, mask is now working

* Add helper to remove duplicated code, remove hack for padding

* Enable LSTM critic and raise threshold for cartpole with no vel

* Fix tests

* Update doc and tests

* Doc fix

* Fix for new Sphinx version

* Fix doc note

* Switch to batch first, no more additional swap

* Add comments and mask entropy loss

Co-authored-by: Neville Walo <43504521+Walon1998@users.noreply.github.com>
2022-05-30 04:31:12 +02:00
Antonin RAFFIN cd592a111f
Upgrade min SB3 version (#70)
* Upgrade min SB3 version

* Fix for newer sphinx version
2022-05-29 21:54:23 +02:00
Antonin RAFFIN bec00386d1
Upgrade to python 3.7+ syntax (#69)
* Upgrade to python 3.7+ syntax

* Switch to PyTorch 1.11
2022-04-25 13:02:07 +02:00
Antonin RAFFIN 812648e6cd
Rename QRDQN logger key (#67) 2022-04-12 12:50:35 +02:00
Grégoire Passault 99853265a9
Using policy_aliases instead of register_policy (#66)
* Using policy_aliases instead of register_policy

* Moving policy_aliases definitions

* Update SB3 version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-08 21:36:23 +02:00
Antonin RAFFIN 9d7e33d213
Release v1.5.0 (#64) 2022-03-25 15:04:53 +01:00
Costa Huang f5c1aaa194
Allow PPO to turn off advantage normalization (#61)
* Allow PPO to turn off advantage normalization

* Quick fix

* Add test cases

* Update docs

* Quick fix

* Quick fix

* Fix sort
2022-02-23 10:11:16 +01:00
Adam Gleave 901a648507
Upgrade Gym to 0.21 (#59)
* Pendulum-v0 -> Pendulum-v1

* Reformat with black

* Update changelog

* Fix dtype bug in TimeFeatureWrapper

* Update version and removed forward calls

* Update CI

* Fix min version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-02-22 16:25:43 +01:00
Antonin Raffin a78891bd00
Update release date 2022-01-19 13:52:30 +01:00
Antonin RAFFIN 89f2bae9f6
Release 1.4.0 (#57)
* Release 1.4.0

* Update requirements
2022-01-19 13:50:56 +01:00
Sean Gillen 675304d8fa
Augmented Random Search (ARS) (#42)
* first pass at ars, replicates initial results, still needs more testing, cleanup

* add a few docs and tests, bugfixes for ARS

* debug and comment

* break out dump logs

* rollback so there are now predict workers, some refactoring

* remove callback from self, remove torch multiprocessing

* add module docs

* run formatter

* fix load and rerun formatter

* rename to less mathy variable names, rename _validate_hypers

* refactor to use evaluate_policy, linear policy no longer uses bias or squashing

* move everything to torch, add support for discrete action spaces, bugfix for alive reward offset

* added tests, passing all of them, add support for discrete action spaces

* update documentation

* allow for reward offset when there are multiple envs

* update results again

* Reformat

* Ignore unused imports

* Renaming + Cleanup

* Experimental multiprocessing

* Cleaner multiprocessing

* Reformat

* Fixes for callback

* Fix combining stats

* 2nd way

* Make the implementation cpu only

* Fixes + POC with mp module

* POC Processes

* Cleaner async implementation

* Remove unused arg

* Add typing

* Revert vec normalize offset hack

* Add `squash_output` parameter

* Add more tests

* Add comments

* Update doc

* Add comments

* Add more logging

* Fix TRPO issue on GPU

* Tmp fix for ARS tests on GPU

* Additional tmp fixes for ARS

* update docstrings + formatting, fix bad exception string in ARSPolicy

* Add comments and docstrings

* Fix missing import

* Fix type check

* Add docstrings

* GPU support, first attempt

* Fix test

* Add missing docstring

* Typos

* Update default hyperparameters

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-01-18 13:57:27 +01:00
Antonin Raffin 3b007ae93b
Fix TRPO doc 2021-12-29 15:03:51 +01:00
Cyprien 59be198da0
Add Trust Region Policy Optimization (TRPO) (#40)
* Feat: adding TRPO algorithm (WIP)

WIP - Trust Region Policy Algorithm
Currently the Hessian vector product is not working (see inline comments for more detail)

* Feat: adding TRPO algorithm (WIP)

Adding no_grad block for the line search
Additional assert in the conjugate solver to help debugging

* Feat: adding TRPO algorithm (WIP)

- Adding ActorCriticPolicy.get_distribution
- Using the Distribution object to compute the KL divergence
- Checking for objective improvement in the line search
- Moving magic numbers to instance variables

* Feat: adding TRPO algorithm (WIP)

Improving numerical stability of the conjugate gradient algorithm
Critic updates

* Feat: adding TRPO algorithm (WIP)

Changes around the alpha of the line search
Adding TRPO to __init__ files

* feat: TRPO - addressing PR comments

- renaming cg_solver to conjugate_gradient_solver and renaming parameter Avp_fun to matrix_vector_dot_func + docstring
- extra comments + better variable names in trpo.py
- defining a method for the hessian vector product instead of an inline function
- fix registering correct policies for TRPO and using correct policy base in constructor

* refactor: TRPO - policies

- refactoring sb3_contrib.common.policies to reuse as much code as possible from sb3

* feat: using updated ActorCriticPolicy from SB3

- get_distribution will be added directly to the SB3 version of ActorCriticPolicy, this commit reflects this

* Bump version for `get_distribution` support

* Add basic test

* Reformat

* [ci skip] Fix changelog

* fix: setting train mode for trpo

* fix: batch_size type hint in trpo.py

* style: renaming variables + docstring in trpo.py

* Rename + cleanup

* Move grad computation to separate method

* Remove grad norm clipping

* Remove n epochs and add sub-sampling

* Update defaults

* Add Doc

* Add more test and fixes for CNN

* Update doc + add benchmark

* Add tests + update doc

* Fix doc

* Improve names for conjugate gradient

* Update comments

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-12-29 11:58:03 +01:00
Antonin RAFFIN b44689b0ea
Update Maskable PPO to match SB3 PPO + improve coverage (#56) 2021-12-10 12:48:19 +01:00
Antonin Raffin 20b5351086
Add color in the tests 2021-12-10 12:38:40 +01:00
Antonin RAFFIN 833669a88b
Drop python 3.6 support (#55)
* Drop python 3.6

* Update setup file
2021-12-06 12:59:53 +01:00
Antonin RAFFIN a1b5ea67ae
Multiprocessing support for off policy algorithms (#50)
* TQC support for multienv

* Add optional layer norm for TQC

* Add layer norm for all policies

* Revert "Add layer norm for all policies"

This reverts commit 1306c3c64eb12613464982c66cb416a3bbc66285.

* Revert "Add optional layer norm for TQC"

This reverts commit 200222e3a8878007aa6032d540ae74274a4d0788.

* Add experimental support to train off-policy algorithms with multiple envs

* Bump version

* Update version
2021-12-02 10:40:21 +01:00
Antonin RAFFIN cd0a5e516f
Update citation (#54)
* Update citation

* Fixes for new SB3 version

* Fix type hint

* Additional fixes
2021-12-01 19:09:32 +01:00
Antonin RAFFIN b1397bbb72
Release 1.3.0 (#48) 2021-10-23 17:21:22 +02:00
Geoff McDonald d6c5cea644
MaskablePPO dictionary observation support (#47)
* Add dictionary observation support for ppo_mask.

* Improving naming consistency.

* Update changelog.

* Reformat and add test

* Update doc

* Update README and setup

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-10-23 17:05:37 +02:00
Antonin RAFFIN 91f9b1ed34
Remove sde net arch (#44) 2021-09-28 21:59:59 +02:00
Antonin Raffin c525c5107b
Upgrade min sphinx version 2021-09-23 15:26:37 +02:00
kronion ab24f8039f
PPO variant with invalid action masking (#25)
* Add wrappers

* Add maskable distributions

* Add mypy configuration

* Add maskable base datastructures

* Add ppo_mask package

* Fix circular dependency and remove test code that slipped in

* Automatically mask vecenv if env is masked

* Fix debugging change that slipped in

* Workaround for subclassing RolloutBufferSamples

* Duplicate lots of policy code in order to swap out the distributions used

* Fix pytype error

* Maintain py 3.6 compatibility

* Fix isort lint errors

* Use pyproject.toml to configure black line length

* Blacken

* Remove mypy.ini

* Fully replace RolloutBufferSamples

* Drop support for continuous distributions, remove SDE-related code

* Eliminate MaskableAlgorithm and MaskableOnPolicyAlgorithm

* Fix formatting

* Override superclass methods as needed, fix circular import, improve naming

* Fix codestyle

* Eliminate VecActionMasker, replace with utils

* Fix codestyle

* Support masking for MultiDiscrete action spaces

* Fix codestyle

* Don't require the env to provide the mask already flattened

* Consistent naming, prefer 'Maskable' to 'Masked'

* Register policy

* Link to abstract instead of pdf

* Allow distribution masking to be unapplied + improved comments and docstrings

* Don't use deprecated implicit optional typing

* Check codestyle

* Add docstring and remove misplaced TODO

* Simplify env masking API, error if API unmet. Make use_masking a learn() kwarg

* Fix codestyle

* Update various internals to be consistent with latest SB3

* Simplify MaskableRolloutBuffer reset

* Add docstring and type annotations

* Ensure old probs aren't cached

* Fix for new logger

* Add test + fixes

* Start doc

* Fix type annotation

* Remove abstract class + add test

* Fix evaluation (add support for multi envs)

* Handle merge conflicts in documentation

* Bugfix: mask updates should apply to original logits, not the last masked output

* Add test of distribution masking behavior

* Reformat

* Add MultiBinary support, remove unneeded distribution type checks

* Remove unused import

* Fix when using multiple envs

* Remove addressed TODO

* Upgrade for SB3 1.2.0

* Update docs with results + how to replicate

* Add action masker tests, move wrapper tests

* Move distributions, add more distribution tests

* Add MaskablePPO tests, simplify and rename discrete test env

* Address TODO

* Add tests for MaskableMultiCategoricalDistribution, fix distributions

* Add maskable identity envs for all supported action spaces, add tests, fix bug

* Formatting fixes

* Update doc env

* Dict support not ready

* Cleanup

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-23 14:50:10 +02:00
Scott Brownlie b2e7126840
Train/Eval Mode Support (#39)
* switch models between train and eval mode

* update changelog

* update release in change log

* Update dependency

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-08 12:54:50 +02:00
Antonin RAFFIN 36eca8ee79
Fix type annotation + add python 3.9 + citation (#37) 2021-07-29 18:14:03 +02:00
Antonin RAFFIN ae39e00c44
Release v1.1.0 (#34) 2021-07-02 11:38:46 +02:00
Long M. Lưu (刘明龙) fab19bdb18
Update small QR-DQN docs typo (#33)
* Update qrdqn.rst

* Update changelog.rst

* Update changelog.rst

Add my name

* Update changelog.rst

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-06-23 14:34:22 +02:00
Antonin RAFFIN 2258c72215
Update to new logger (#32) 2021-06-14 17:25:08 +02:00
Antonin RAFFIN 08418a3cc8
Bump SB3 version (#30) 2021-05-12 11:46:16 +02:00
Antonin Raffin 30cc206578
Add test for pytorch variables 2021-05-12 11:39:56 +02:00
Antonin RAFFIN 3665695d1e
Dictionary Observations (#29)
* Add TQC support for new HER version

* Add dict obs support

* Add support for dict obs
2021-05-11 13:24:31 +02:00