Commit Graph

14 Commits

Author SHA1 Message Date
Antonin RAFFIN 52795a307e
Add progress bar argument (#107)
* Add progress bar argument

* Sort imports
2022-10-10 18:44:13 +02:00
Quentin Gallouédec dec7b5303a
Deprecate ``create_eval_env``, ``eval_env`` and ``eval_freq`` parameters (#105)
* Deprecate ``eval_env``, ``eval_freq`` and ``create_eval_env``

* Update changelog

* Typo

* Raise deprecation warning in _setup_learn

* Upgrade to latest SB3 version and update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-10-10 17:12:40 +02:00
Costa Huang f5c1aaa194
Allow PPO to turn off advantage normalization (#61)
* Allow PPO to turn off advantage normalization

* Quick fix

* Add test cases

* Update docs

* Quick fix

* Quick fix

* Fix sort
2022-02-23 10:11:16 +01:00
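Advantage normalization, which this change makes optional, usually means standardizing each minibatch of advantages to zero mean and unit standard deviation before computing the PPO loss. The plain-Python sketch below illustrates that toggle. The function name `maybe_normalize`, the `eps` value and the use of the population standard deviation are illustrative assumptions, not the sb3-contrib code.

```python
import statistics
from typing import List


def maybe_normalize(advantages: List[float], normalize: bool = True, eps: float = 1e-8) -> List[float]:
    """Standardize advantages to zero mean and unit std, or return them unchanged (sketch)."""
    # The std of a single sample is zero, so normalizing it would be meaningless
    if not normalize or len(advantages) < 2:
        return list(advantages)
    mean = statistics.fmean(advantages)
    # Population standard deviation; eps guards against division by zero
    std = statistics.pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


batch = [1.0, 2.0, 3.0]
normalized = maybe_normalize(batch)
raw = maybe_normalize(batch, normalize=False)
```

With normalization disabled, the raw advantages scale the surrogate objective directly, so their magnitude then depends on the reward scale of the environment.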
Adam Gleave 901a648507
Upgrade Gym to 0.21 (#59)
* Pendulum-v0 -> Pendulum-v1

* Reformat with black

* Update changelog

* Fix dtype bug in TimeFeatureWrapper

* Update version and removed forward calls

* Update CI

* Fix min version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-02-22 16:25:43 +01:00
Sean Gillen 675304d8fa
Augmented Random Search (ARS) (#42)
* first pass at ars, replicates initial results, still needs more testing, cleanup

* add a few docs and tests, bugfixes for ARS

* debug and comment

* break out dump logs

* rollback so there are now predict workers, some refactoring

* remove callback from self, remove torch multiprocessing

* add module docs

* run formatter

* fix load and rerun formatter

* rename to less mathy variable names, rename _validate_hypers

* refactor to use evaluate_policy, linear policy no longer uses bias or squashing

* move everything to torch, add support for discrete action spaces, bugfix for alive reward offset

* added tests, passing all of them, add support for discrete action spaces

* update documentation

* allow for reward offset when there are multiple envs

* update results again

* Reformat

* Ignore unused imports

* Renaming + Cleanup

* Experimental multiprocessing

* Cleaner multiprocessing

* Reformat

* Fixes for callback

* Fix combining stats

* 2nd way

* Make the implementation cpu only

* Fixes + POC with mp module

* POC Processes

* Cleaner async implementation

* Remove unused arg

* Add typing

* Revert vec normalize offset hack

* Add `squash_output` parameter

* Add more tests

* Add comments

* Update doc

* Add comments

* Add more logging

* Fix TRPO issue on GPU

* Tmp fix for ARS tests on GPU

* Additional tmp fixes for ARS

* update docstrings + formatting, fix bad exception string in ARSPolicy

* Add comments and docstrings

* Fix missing import

* Fix type check

* Add docstrings

* GPU support, first attempt

* Fix test

* Add missing docstring

* Typos

* Update default hyperparameters

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-01-18 13:57:27 +01:00
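The core ARS update evaluates the policy at antithetic pairs of perturbed parameters, keeps the best-performing directions, and moves along a reward-weighted sum of them. A minimal sketch of that rule (the basic variant without observation normalization) on a hypothetical quadratic reward follows. `ars_step`, its default hyperparameters and `toy_reward` are assumptions for illustration; sb3-contrib's ARS instead evaluates torch policies in environments.

```python
import random
from typing import Callable, List

Vector = List[float]


def ars_step(
    theta: Vector,
    reward_fn: Callable[[Vector], float],
    rng: random.Random,
    n_delta: int = 8,
    n_top: int = 4,
    step_size: float = 0.02,
    delta_std: float = 0.05,
) -> Vector:
    """One basic ARS update (no observation normalization) on a flat parameter vector."""
    # Sample n_delta random search directions from a standard normal
    deltas = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(n_delta)]
    results = []
    for delta in deltas:
        # Evaluate the antithetic pair theta + std * delta and theta - std * delta
        r_pos = reward_fn([t + delta_std * d for t, d in zip(theta, delta)])
        r_neg = reward_fn([t - delta_std * d for t, d in zip(theta, delta)])
        results.append((r_pos, r_neg, delta))
    # Keep the n_top directions with the best return of either sign
    results.sort(key=lambda item: max(item[0], item[1]), reverse=True)
    top = results[:n_top]
    # Normalize the step by the standard deviation of the returns actually used
    returns = [r for r_pos, r_neg, _ in top for r in (r_pos, r_neg)]
    mean_return = sum(returns) / len(returns)
    return_std = (sum((r - mean_return) ** 2 for r in returns) / len(returns)) ** 0.5 or 1.0
    # Move along the reward-difference-weighted sum of the kept directions
    update = [0.0] * len(theta)
    for r_pos, r_neg, delta in top:
        for i, d in enumerate(delta):
            update[i] += (r_pos - r_neg) * d
    scale = step_size / (n_top * return_std)
    return [t + scale * u for t, u in zip(theta, update)]


target = [1.0, -2.0, 0.5]


def toy_reward(weights: Vector) -> float:
    # Hypothetical reward: maximal (zero) when weights == target
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
for _ in range(300):
    theta = ars_step(theta, toy_reward, rng)
```

Only reward evaluations are needed, never gradients, which is what makes the method easy to parallelize across workers.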
Cyprien 59be198da0
Add Trust Region Policy Optimization (TRPO) (#40)
* Feat: adding TRPO algorithm (WIP)

WIP - Trust Region Policy Optimization
Currently the Hessian vector product is not working (see inline comments for more detail)

* Feat: adding TRPO algorithm (WIP)

Adding no_grad block for the line search
Additional assert in the conjugate solver to help debugging

* Feat: adding TRPO algorithm (WIP)

- Adding ActorCriticPolicy.get_distribution
- Using the Distribution object to compute the KL divergence
- Checking for objective improvement in the line search
- Moving magic numbers to instance variables

* Feat: adding TRPO algorithm (WIP)

Improving numerical stability of the conjugate gradient algorithm
Critic updates

* Feat: adding TRPO algorithm (WIP)

Changes around the alpha of the line search
Adding TRPO to __init__ files

* feat: TRPO - addressing PR comments

- renaming cg_solver to conjugate_gradient_solver and renaming parameter Avp_fun to matrix_vector_dot_func + docstring
- extra comments + better variable names in trpo.py
- defining a method for the hessian vector product instead of an inline function
- fix registering correct policies for TRPO and using correct policy base in constructor

* refactor: TRPO - policies

- refactoring sb3_contrib.common.policies to reuse as much code as possible from sb3

* feat: using updated ActorCriticPolicy from SB3

- get_distribution will be added directly to the SB3 version of ActorCriticPolicy, this commit reflects this

* Bump version for `get_distribution` support

* Add basic test

* Reformat

* [ci skip] Fix changelog

* fix: setting train mode for trpo

* fix: batch_size type hint in trpo.py

* style: renaming variables + docstring in trpo.py

* Rename + cleanup

* Move grad computation to separate method

* Remove grad norm clipping

* Remove n epochs and add sub-sampling

* Update defaults

* Add Doc

* Add more test and fixes for CNN

* Update doc + add benchmark

* Add tests + update doc

* Fix doc

* Improve names for conjugate gradient

* Update comments

* Update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-12-29 11:58:03 +01:00
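TRPO obtains its search direction by solving H x = g, where H is the Hessian of the KL divergence and g the policy gradient, without ever forming H: conjugate gradient only needs a function that returns Hessian-vector products. The minimal sketch below runs the solver on an explicit 2x2 matrix. The parameter name `matrix_vector_dot_func` mirrors the commit message above, but the full signature, the defaults and the list-based vectors are assumptions rather than sb3-contrib's torch implementation.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient_solver(
    matrix_vector_dot_func: Callable[[Vector], Vector],
    b: Vector,
    max_iter: int = 10,
    residual_tol: float = 1e-10,
) -> Vector:
    """Approximately solve A x = b for symmetric positive-definite A, using only v -> A v."""
    x = [0.0] * len(b)
    residual = list(b)  # r = b - A x with the initial guess x = 0
    direction = list(residual)
    residual_sq = dot(residual, residual)
    for _ in range(max_iter):
        if residual_sq < residual_tol:
            break
        a_direction = matrix_vector_dot_func(direction)
        # Exact line search along the current conjugate direction
        alpha = residual_sq / dot(direction, a_direction)
        x = [xi + alpha * di for xi, di in zip(x, direction)]
        residual = [ri - alpha * ai for ri, ai in zip(residual, a_direction)]
        new_residual_sq = dot(residual, residual)
        # Make the next direction A-conjugate to the previous ones
        direction = [ri + (new_residual_sq / residual_sq) * di for ri, di in zip(residual, direction)]
        residual_sq = new_residual_sq
    return x


A = [[4.0, 1.0], [1.0, 3.0]]


def a_times(v: Vector) -> Vector:
    # Explicit matrix-vector product standing in for a Hessian-vector product
    return [dot(row, v) for row in A]


solution = conjugate_gradient_solver(a_times, [1.0, 2.0])
```

In exact arithmetic conjugate gradient solves an n x n positive-definite system in at most n iterations; with millions of policy parameters, TRPO typically runs only a handful of iterations and accepts the approximate direction.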
Antonin RAFFIN a1b5ea67ae
Multiprocessing support for off policy algorithms (#50)
* TQC support for multienv

* Add optional layer norm for TQC

* Add layer norm for all policies

* Revert "Add layer norm for all policies"

This reverts commit 1306c3c64eb12613464982c66cb416a3bbc66285.

* Revert "Add optional layer norm for TQC"

This reverts commit 200222e3a8878007aa6032d540ae74274a4d0788.

* Add experimental support to train off-policy algorithms with multiple envs

* Bump version

* Update version
2021-12-02 10:40:21 +01:00
Antonin RAFFIN 91f9b1ed34
Remove sde net arch (#44)
2021-09-28 21:59:59 +02:00
Toshiki Watanabe b30397fff5
Add QR-DQN (#13)
* Add QR-DQN(WIP)

* Update docstring

* Add quantile_huber_loss

* Fix typo

* Remove unnecessary lines

* Update variable names and comments in quantile_huber_loss

* Fix mutable arguments

* Update variable names

* Ignore import not used warnings

* Fix default parameter of optimizer in QR-DQN

* Update quantile_huber_loss to have more reasonable interface

* update tests

* Add assertion to quantile_huber_loss

* Update variable names of quantile regression

* Update comments

* Reduce the number of quantiles during test

* Update comment

* Update quantile_huber_loss

* Fix isort

* Add document of QR-DQN without results

* Update docs

* Fix bugs

* Update doc

* Add comments about shape

* Minor edits

* Update comments

* Add benchmark

* Doc fixes

* Update doc

* Bug fix in saving/loading + update tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-12-21 11:17:48 +01:00
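The quantile Huber loss from the QR-DQN paper weights a Huber penalty on each pairwise difference u = target - current by |tau - 1{u < 0}|, which pushes each of the N predicted values toward the tau-quantile of the target distribution, with tau_i = (2i + 1) / (2N) for i = 0, ..., N - 1. The plain-Python sketch below evaluates it for a single transition. The hypothetical name `quantile_huber_loss_single`, the default threshold kappa = 1, omitting the paper's division by kappa (which changes nothing at kappa = 1) and the reduction (mean over targets, sum over predicted quantiles) are assumptions and need not match sb3-contrib's `quantile_huber_loss`.

```python
from typing import List


def quantile_huber_loss_single(
    current_quantiles: List[float], target_quantiles: List[float], kappa: float = 1.0
) -> float:
    """Quantile Huber loss for one transition (sketch)."""
    n = len(current_quantiles)
    # Quantile midpoints tau_i = (2i + 1) / (2N), i = 0..N-1
    taus = [(2 * i + 1) / (2 * n) for i in range(n)]
    total = 0.0
    for tau, current in zip(taus, current_quantiles):
        per_target = []
        for target in target_quantiles:
            u = target - current
            # Huber penalty: quadratic near zero, linear beyond kappa
            huber = 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)
            # Asymmetric quantile weight |tau - 1{u < 0}|
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            per_target.append(weight * huber)
        # Mean over target samples, sum over predicted quantiles
        total += sum(per_target) / len(per_target)
    return total


ordered = quantile_huber_loss_single([1.0, 2.0], [1.0, 2.0])
misordered = quantile_huber_loss_single([2.0, 1.0], [1.0, 2.0])
```

Because the weights are asymmetric, the out-of-order prediction [2.0, 1.0] receives a higher loss than [1.0, 2.0] against the same targets, which is what keeps the learned quantiles sorted.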
Antonin RAFFIN 72fe9a2072
Faster tests
2020-10-17 17:06:11 +02:00
Antonin RAFFIN afe7b132e4
Lint
2020-10-12 20:25:11 +02:00
Antonin RAFFIN 5d7b79d41a
Improve coverage
2020-10-12 20:17:33 +02:00
Antonin RAFFIN 7609c87e84
Cleanup TQC
2020-10-12 19:50:08 +02:00
Antonin RAFFIN 0d9f2e229e
Add TQC and base scripts
2020-09-25 12:47:45 +02:00