* Pendulum-v0 -> Pendulum-v1
* Reformat with black
* Update changelog
* Fix dtype bug in TimeFeatureWrapper
* Update version and remove forward calls
* Update CI
* Fix min version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* first pass at ARS, replicates initial results, still needs more testing, cleanup
* add a few docs and tests, bugfixes for ARS
* debug and comment
* break out dump logs
* rollback so there are now predict workers, some refactoring
* remove callback from self, remove torch multiprocessing
* add module docs
* run formatter
* fix load and rerun formatter
* rename to less mathy variable names, rename _validate_hypers
* refactor to use evaluate_policy, linear policy no longer uses bias or squashing
* move everything to torch, add support for discrete action spaces, bugfix for alive reward offset
* added tests, passing all of them, add support for discrete action spaces
* update documentation
* allow for reward offset when there are multiple envs
* update results again
* Reformat
* Ignore unused imports
* Renaming + Cleanup
* Experimental multiprocessing
* Cleaner multiprocessing
* Reformat
* Fixes for callback
* Fix combining stats
* 2nd way
* Make the implementation cpu only
* Fixes + POC with mp module
* POC Processes
* Cleaner async implementation
* Remove unused arg
* Add typing
* Revert vec normalize offset hack
* Add `squash_output` parameter
* Add more tests
* Add comments
* Update doc
* Add comments
* Add more logging
* Fix TRPO issue on GPU
* Tmp fix for ARS tests on GPU
* Additional tmp fixes for ARS
* update docstrings + formatting, fix bad exception string in ARSPolicy
* Add comments and docstrings
* Fix missing import
* Fix type check
* Add docstrings
* GPU support, first attempt
* Fix test
* Add missing docstring
* Typos
* Update default hyperparameters
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>