* Add support for Gym 0.24
* Fixes for gym 0.24
* Fix for new reset signature
* Add tmp SB3 branch
* Fixes for gym 0.26
* Remove unused import
* Fix dependency
* Type annotations fixes
* Reformat
* Reformat with black 23
* Move to gymnasium
* Patch env if needed
* Fix types
* Fix CI
* Fixes for gymnasium
* Fix wrapper annotations
* Update version
* Fix type check
* Update QRDQN type hints and bug fix with multi envs
* Fix TQC type hints
* Fix TRPO type hints
* Additional fixes
* Update SB3 version
* Update issue templates and CI
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* MaskablePPO docs
Added a warning about possible crashes caused by check_env in case of invalid actions.
* Reformat with black 23
* Rephrase note on action sampling
* Fix action noise
* Update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Update contribution.md
* New loop struct to make mypy happy
* Update setup.cfg
* Update changelog
* fix squash_output = False in ARS policy
* Add with_bias parameter to ARSPolicy
* Make ARSLinearPolicy a special case of ARSPolicy
* Remove ars_policy from mypy exclude
* Update changelog
* Update SB3 version
* Fix to load ARS linear policy saved with sb3-contrib < 1.7.0
* Fix test
* Turn docstring into comment
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* first pass at ars, replicates initial results, still needs more testing, cleanup
* add a few docs and tests, bugfixes for ARS
* debug and comment
* break out dump logs
* rollback so there are now predict workers, some refactoring
* remove callback from self, remove torch multiprocessing
* add module docs
* run formatter
* fix load and rerun formatter
* rename to less mathy variable names, rename _validate_hypers
* refactor to use evaluate_policy, linear policy no longer uses bias or squashing
* move everything to torch, add support for discrete action spaces, bugfix for alive reward offset
* added tests, passing all of them, add support for discrete action spaces
* update documentation
* allow for reward offset when there are multiple envs
* update results again
* Reformat
* Ignore unused imports
* Renaming + Cleanup
* Experimental multiprocessing
* Cleaner multiprocessing
* Reformat
* Fixes for callback
* Fix combining stats
* 2nd way
* Make the implementation cpu only
* Fixes + POC with mp module
* POC Processes
* Cleaner async implementation
* Remove unused arg
* Add typing
* Revert vec normalize offset hack
* Add `squash_output` parameter
* Add more tests
* Add comments
* Update doc
* Add comments
* Add more logging
* Fix TRPO issue on GPU
* Tmp fix for ARS tests on GPU
* Additional tmp fixes for ARS
* update docstrings + formatting, fix bad exception string in ARSPolicy
* Add comments and docstrings
* Fix missing import
* Fix type check
* Add docstrings
* GPU support, first attempt
* Fix test
* Add missing docstring
* Typos
* Update default hyperparameters
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>