* Default device for buffer is auto
* `device=auto` in ARS
* Undo ARS change
* Update changelog
* Update min SB3 version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Running (not working yet) version of recurrent PPO
* Fixes for multi envs
* Save WIP, rework the sampling
* Add Box support
* Fix sample order
* Begin cleanup, code is broken (again)
* First working version (no shared LSTM)
* Start cleanup
* Try RNN with value function
* Re-enable batch size
* Deactivate value function RNN
* Allow any batch size
* Add support for evaluation
* Add CNN support
* Fix start of sequence
* Allow shared LSTM
* Rename mask to episode_start
* Fix type hint
* Enable LSTM for critic
* Clean code
* Fix for CNN LSTM
* Fix sampling with n_layers > 1
* Add std logger
* Update wording
* Rename and add dict obs support
* Fixes for dict obs support
* Do not run slow tests
* Fix doc
* Update recurrent PPO example
* Update README
* Use Pendulum-v1 for tests
* Fix image env
* Speedup LSTM forward pass (#63)
* Add more efficient LSTM implementation
* Rename and add comment
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fixes
* Remove OpenAI sampling and improve coverage
* Sync with SB3 PPO
* Pass state shape and allow lstm kwargs
* Update tests
* Add masking for padded sequences
* Update default in perf test
* Remove TODO, mask is now working
* Add helper to remove duplicated code, remove hack for padding
* Enable LSTM critic and raise threshold for cartpole with no vel
* Fix tests
* Update doc and tests
* Doc fix
* Fix for new Sphinx version
* Fix doc note
* Switch to batch first, no more additional swap
* Add comments and mask entropy loss
Co-authored-by: Neville Walo <43504521+Walon1998@users.noreply.github.com>
* Add wrappers
* Add maskable distributions
* Add mypy configuration
* Add maskable base datastructures
* Add ppo_mask package
* Fix circular dependency and remove test code that slipped in
* Automatically mask vecenv if env is masked
* Fix debugging change that slipped in
* Workaround for subclassing RolloutBufferSamples
* Duplicate lots of policy code in order to swap out the distributions used
* Fix pytype error
* Maintain Python 3.6 compatibility
* Fix isort lint errors
* Use pyproject.toml to configure black line length
* Blacken
* Remove mypy.ini
* Fully replace RolloutBufferSamples
* Drop support for continuous distributions, remove SDE-related code
* Eliminate MaskableAlgorithm and MaskableOnPolicyAlgorithm
* Fix formatting
* Override superclass methods as needed, fix circular import, improve naming
* Fix codestyle
* Eliminate VecActionMasker, replace with utils
* Fix codestyle
* Support masking for MultiDiscrete action spaces
* Fix codestyle
* Don't require the env to provide the mask already flattened
* Consistent naming, prefer 'Maskable' to 'Masked'
* Register policy
* Link to abstract instead of pdf
* Allow distribution masking to be unapplied + improved comments and docstrings
* Don't use deprecated implicit optional typing
* Check codestyle
* Add docstring and remove misplaced TODO
* Simplify env masking API, error if API unmet. Make use_masking a learn() kwarg
* Fix codestyle
* Update various internals to be consistent with latest SB3
* Simplify MaskableRolloutBuffer reset
* Add docstring and type annotations
* Ensure old probs aren't cached
* Fix for new logger
* Add test + fixes
* Start doc
* Fix type annotation
* Remove abstract class + add test
* Fix evaluation (add support for multi envs)
* Handle merge conflicts in documentation
* Bugfix: mask updates should apply to original logits, not the last masked output
* Add test of distribution masking behavior
* Reformat
* Add MultiBinary support, remove unneeded distribution type checks
* Remove unused import
* Fix when using multiple envs
* Remove addressed TODO
* Upgrade for SB3 1.2.0
* Update docs with results + how to replicate
* Add action masker tests, move wrapper tests
* Move distributions, add more distribution tests
* Add MaskablePPO tests, simplify and rename discrete test env
* Address TODO
* Add tests for MaskableMultiCategoricalDistribution, fix distributions
* Add maskable identity envs for all supported action spaces, add tests, fix bug
* Formatting fixes
* Update doc env
* Dict support not ready
* Cleanup
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>