* Replace `to(device)` with `device=device` and `float()` with `dtype=th.float32`
* Update changelog
* Fix type checking
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Modified sb3_contrib/common/maskable/policies.py
- Added support for non-shared features extractor in file sb3_contrib/common/maskable/policies.py
- updated changelog
* Modified sb3_contrib/common/recurrent/policies.py
* Modified sb3_contrib/qrdqn/policies.py and sb3_contrib/tqc/policies.py
* Updated test_cnn.py
* Upgrade SB3 version
* Revert changes in formatting
* Remove duplicate normalize_images
* Add test for image-like inputs
* Fixes and add more tests
* Update SB3 version
* Fix ARS warnings
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Update contribution.md
* New loop struct to make mypy happy
* Update setup.cfg
* Update changelog
* fix squash_output = False in ARS policy
* Add with_bias parameter to ARSPolicy
* Make ARSLinearPolicy a special case of ARSPolicy
* Remove ars_policy from mypy exclude
* Update changelog
* Update SB3 version
* Fix loading of ARS linear policies saved with sb3-contrib < 1.7.0
* Fix test
* Turn docstring into comment
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Pendulum-v0 -> Pendulum-v1
* Reformat with black
* Update changelog
* Fix dtype bug in TimeFeatureWrapper
* Update version and removed forward calls
* Update CI
* Fix min version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* first pass at ars, replicates initial results, still needs more testing, cleanup
* add a few docs and tests, bugfixes for ARS
* debug and comment
* break out dump logs
* rollback so there are now predict workers, some refactoring
* remove callback from self, remove torch multiprocessing
* add module docs
* run formatter
* fix load and rerun formatter
* rename to less mathy variable names, rename _validate_hypers
* refactor to use evaluate_policy, linear policy no longer uses bias or squashing
* move everything to torch, add support for discrete action spaces, bugfix for alive reward offset
* added tests, passing all of them, add support for discrete action spaces
* update documentation
* allow for reward offset when there are multiple envs
* update results again
* Reformat
* Ignore unused imports
* Renaming + Cleanup
* Experimental multiprocessing
* Cleaner multiprocessing
* Reformat
* Fixes for callback
* Fix combining stats
* 2nd way
* Make the implementation cpu only
* Fixes + POC with mp module
* POC Processes
* Cleaner async implementation
* Remove unused arg
* Add typing
* Revert vec normalize offset hack
* Add `squash_output` parameter
* Add more tests
* Add comments
* Update doc
* Add comments
* Add more logging
* Fix TRPO issue on GPU
* Tmp fix for ARS tests on GPU
* Additional tmp fixes for ARS
* update docstrings + formatting, fix bad exception string in ARSPolicy
* Add comments and docstrings
* Fix missing import
* Fix type check
* Add docstrings
* GPU support, first attempt
* Fix test
* Add missing docstring
* Typos
* Update default hyperparameters
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>