* Add support for Gym 0.24
* Fixes for gym 0.24
* Fix for new reset signature
* Add tmp SB3 branch
* Fixes for gym 0.26
* Remove unused import
* Fix dependency
* Type annotations fixes
* Reformat
* Reformat with black 23
* Move to gymnasium
* Patch env if needed
* Fix types
* Fix CI
* Fixes for gymnasium
* Fix wrapper annotations
* Update version
* Fix type check
* Update QRDQN type hints and bug fix with multi envs
* Fix TQC type hints
* Fix TRPO type hints
* Additional fixes
* Update SB3 version
* Update issue templates and CI
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Issue forms and pyproject.toml
* [ci skip] Fix typos
* Fix isort config
* Use secret link to download atari roms
* Fix for mypy and update config
* Upgrade SB3 and fix warnings
* Fix doc build
* Update Makefile
* Lint first
* MaskablePPO docs
Added a warning about possible crashes caused by check_env in case of invalid actions.
* Reformat with black 23
* Rephrase note on action sampling
* Fix action noise
* Update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Modified sb3_contrib/common/maskable/policies.py
- Added support for non-shared features extractor in file sb3_contrib/common/maskable/policies.py
- Updated changelog
* Modified sb3_contrib/common/recurrent/policies.py
* Modified sb3_contrib/qrdqn/policies.py and sb3_contrib/tqc/policies.py
* Updated test_cnn.py
* Upgrade SB3 version
* Revert changes in formatting
* Remove duplicate normalize_images
* Add test for image-like inputs
* Fixes and add more tests
* Update SB3 version
* Fix ARS warnings
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Running (not working yet) version of recurrent PPO
* Fixes for multi envs
* Save WIP, rework the sampling
* Add Box support
* Fix sample order
* Begin cleanup, code is broken (again)
* First working version (no shared lstm)
* Start cleanup
* Try rnn with value function
* Re-enable batch size
* Deactivate vf rnn
* Allow any batch size
* Add support for evaluation
* Add CNN support
* Fix start of sequence
* Allow shared LSTM
* Rename mask to episode_start
* Fix type hint
* Enable LSTM for critic
* Clean code
* Fix for CNN LSTM
* Fix sampling with n_layers > 1
* Add std logger
* Update wording
* Rename and add dict obs support
* Fixes for dict obs support
* Do not run slow tests
* Fix doc
* Update recurrent PPO example
* Update README
* Use Pendulum-v1 for tests
* Fix image env
* Speedup LSTM forward pass (#63)
* Add more efficient LSTM implementation
* Rename and add comment
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fixes
* Remove OpenAI sampling and improve coverage
* Sync with SB3 PPO
* Pass state shape and allow lstm kwargs
* Update tests
* Add masking for padded sequences
* Update default in perf test
* Remove TODO, mask is now working
* Add helper to remove duplicated code, remove hack for padding
* Enable LSTM critic and raise threshold for cartpole with no vel
* Fix tests
* Update doc and tests
* Doc fix
* Fix for new Sphinx version
* Fix doc note
* Switch to batch first, no more additional swap
* Add comments and mask entropy loss
Co-authored-by: Neville Walo <43504521+Walon1998@users.noreply.github.com>
* Pendulum-v0 -> Pendulum-v1
* Reformat with black
* Update changelog
* Fix dtype bug in TimeFeatureWrapper
* Update version and removed forward calls
* Update CI
* Fix min version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* TQC support for multienv
* Add optional layer norm for TQC
* Add layer norm for all policies
* Revert "Add layer norm for all policies"
This reverts commit 1306c3c64eb12613464982c66cb416a3bbc66285.
* Revert "Add optional layer norm for TQC"
This reverts commit 200222e3a8878007aa6032d540ae74274a4d0788.
* Add experimental support to train off-policy algorithms with multiple envs
* Bump version
* Update version