3 Commits

Author SHA1 Message Date
kronion ab24f8039f
PPO variant with invalid action masking (#25)
* Add wrappers

* Add maskable distributions

* Add mypy configuration

* Add maskable base datastructures

* Add ppo_mask package

* Fix circular dependency and remove test code that slipped in

* Automatically mask vecenv if env is masked

* Fix debugging change that slipped in

* Workaround for subclassing RolloutBufferSamples

* Duplicate lots of policy code in order to swap out the distributions used

* Fix pytype error

* Maintain py 3.6 compatibility

* Fix isort lint errors

* Use pyproject.toml to configure black line length

* Blacken

* Remove mypy.ini

* Fully replace RolloutBufferSamples

* Drop support for continuous distributions, remove SDE-related code

* Eliminate MaskableAlgorithm and MaskableOnPolicyAlgorithm

* Fix formatting

* Override superclass methods as needed, fix circular import, improve naming

* Fix codestyle

* Eliminate VecActionMasker, replace with utils

* Fix codestyle

* Support masking for MultiDiscrete action spaces

* Fix codestyle

* Don't require the env to provide the mask already flattened

* Consistent naming, prefer 'Maskable' to 'Masked'

* Register policy

* Link to abstract instead of pdf

* Allow distribution masking to be unapplied + improved comments and docstrings

* Don't use deprecated implicit optional typing

* Check codestyle

* Add docstring and remove misplaced TODO

* Simplify env masking API, error if API unmet. Make use_masking a learn() kwarg

* Fix codestyle

* Update various internals to be consistent with latest SB3

* Simplify MaskableRolloutBuffer reset

* Add docstring and type annotations

* Ensure old probs aren't cached

* Fix for new logger

* Add test + fixes

* Start doc

* Fix type annotation

* Remove abstract class + add test

* Fix evaluation (add support for multi envs)

* Handle merge conflicts in documentation

* Bugfix: mask updates should apply to original logits, not the last masked output

* Add test of distribution masking behavior

* Reformat

* Add MultiBinary support, remove unneeded distribution type checks

* Remove unused import

* Fix when using multiple envs

* Remove addressed TODO

* Upgrade for SB3 1.2.0

* Update docs with results + how to replicate

* Add action masker tests, move wrapper tests

* Move distributions, add more distribution tests

* Add MaskablePPO tests, simplify and rename discrete test env

* Address TODO

* Add tests for MaskableMultiCategoricalDistribution, fix distributions

* Add maskable identity envs for all supported action spaces, add tests, fix bug

* Formatting fixes

* Update doc env

* Dict support not ready

* Cleanup

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-09-23 14:50:10 +02:00
Toshiki Watanabe b30397fff5
Add QR-DQN (#13)
* Add QR-DQN (WIP)

* Update docstring

* Add quantile_huber_loss

* Fix typo

* Remove unnecessary lines

* Update variable names and comments in quantile_huber_loss

* Fix mutable arguments

* Update variable names

* Ignore import not used warnings

* Fix default parameter of optimizer in QR-DQN

* Update quantile_huber_loss to have more reasonable interface

* Update tests

* Add assertion to quantile_huber_loss

* Update variable names of quantile regression

* Update comments

* Reduce the number of quantiles during test

* Update comment

* Update quantile_huber_loss

* Fix isort

* Add document of QR-DQN without results

* Update docs

* Fix bugs

* Update doc

* Add comments about shape

* Minor edits

* Update comments

* Add benchmark

* Doc fixes

* Update doc

* Bug fix in saving/loading + update tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-12-21 11:17:48 +01:00
Antonin RAFFIN 0d9f2e229e
Add TQC and base scripts
2020-09-25 12:47:45 +02:00