stable-baselines3-contrib-sacd

Commit Graph

Author	SHA1	Message	Date
Antonin RAFFIN	a1b5ea67ae	Multiprocessing support for off policy algorithms (#50 ) * TQC support for multienv * Add optional layer norm for TQC * Add layer nprm for all policies * Revert "Add layer nprm for all policies" This reverts commit 1306c3c64eb12613464982c66cb416a3bbc66285. * Revert "Add optional layer norm for TQC" This reverts commit 200222e3a8878007aa6032d540ae74274a4d0788. * Add experimental support to train off-policy algorithms with multiple envs * Bump version * Update version	2021-12-02 10:40:21 +01:00
Antonin RAFFIN	cd0a5e516f	Update citation (#54 ) * Update citation * Fixes for new SB3 version * Fix type hint * Additional fixes	2021-12-01 19:09:32 +01:00
Antonin RAFFIN	b1397bbb72	Release 1.3.0 (#48 )	2021-10-23 17:21:22 +02:00
Geoff McDonald	d6c5cea644	MaskablePPO dictionary observation support (#47 ) * Add dictionary observation support for ppo_mask. * Improving naming consistency. * Update changelog. * Reformat and add test * Update doc * Update README and setup Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-10-23 17:05:37 +02:00
Antonin RAFFIN	91f9b1ed34	Remove sde net arch (#44 )	2021-09-28 21:59:59 +02:00
Antonin Raffin	c525c5107b	Upgrade min sphinx version	2021-09-23 15:26:37 +02:00
kronion	ab24f8039f	PPO variant with invalid action masking (#25 ) * Add wrappers * Add maskable distributions * Add mypy configuration * Add maskable base datastructures * Add ppo_mask package * Fix circular dependency and remove test code that slipped in * Automatically mask vecenv if env is masked * Fix debugging change that slipped in * Workaround for subclassing RolloutBufferSamples * Duplicate lots of policy code in order to swap out the distributions used * Fix pytype error * Maintain py 3.6 compatibility * Fix isort lint errors * Use pyproject.toml to configure black line length * Blacken * Remove mypy.ini * Fully replace RolloutBufferSamples * Drop support for continuous distributions, remove SDE-related code * Eliminate MaskableAlgorithm and MaskableOnPolicyAlgorithm * Fix formatting * Override superclass methods as needed, fix circular import, improve naming * Fix codestyle * Eliminate VecActionMasker, replace with utils * Fix codestyle * Support masking for MultiDiscrete action spaces * Fix codestyle * Don't require the env to provide the mask already flattened * Consistent naming, prefer 'Maskable' to 'Masked' * Register policy * Link to abstract instead of pdf * Allow distribution masking to be unapplied + improved comments and docstrings * Don't use deprecated implicit optional typing * Check codestyle * Add docstring and remove misplaced TODO * Simplify env masking API, error if API unmet. Make use_masking a learn() kwarg * Fix codestyle * Update various internals to be consistent with latest SB3 * Simplify MaskableRolloutBuffer reset * Add docstring and type annotations * Ensure old probs aren't cached * Fix for new logger * Add test + fixes * Start doc * Fix type annotation * Remove abstract class + add test * Fix evaluation (add support for multi envs) * Handle merge conflicts in documentation * Bugfix: mask updates should apply to original logits, not the last masked output * Add test of distribution masking behavior * Reformat * Add MultiBinary support, remove unneeded distribution type checks * Remove unused import * Fix when using multiple envs * Remove addressed TODO * Upgrade for SB3 1.2.0 * Update docs with results + how to replicate * Add action masker tests, move wrapper tests * Move distributions, add more distribution tests * Add MaskablePPO tests, simplify and rename discrete test env * Address TODO * Add tests for MaskableMultiCategoricalDistribution, fix distributions * Add maskable identity envs for all supported action spaces, add tests, fix bug * Formatting fixes * Update doc env * Dict support not ready * Cleanup Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-09-23 14:50:10 +02:00
Scott Brownlie	b2e7126840	Train/Eval Mode Support (#39 ) * switch models between train and eval mode * update changelog * update release in change log * Update dependency Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2021-09-08 12:54:50 +02:00
Antonin RAFFIN	36eca8ee79	Fix type annotation + add python 3.9 + citation (#37 )	2021-07-29 18:14:03 +02:00
Antonin RAFFIN	ae39e00c44	Release v1.1.0 (#34 )	2021-07-02 11:38:46 +02:00
Long M. Lưu (刘明龙)	fab19bdb18	Update small QR-DQN docs typo (#33 ) * Update qrdqn.rst * Update changelog.rst * Update changelog.rst Add my name * Update changelog.rst Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-06-23 14:34:22 +02:00
Antonin RAFFIN	2258c72215	Update to new logger (#32 )	2021-06-14 17:25:08 +02:00
Antonin RAFFIN	08418a3cc8	Bump SB3 version (#30 )	2021-05-12 11:46:16 +02:00
Antonin RAFFIN	3665695d1e	Dictionary Observations (#29 ) * Add TQC support for new HER version * Add dict obs support * Add support for dict obs	2021-05-11 13:24:31 +02:00
Antonin RAFFIN	61bfdbc00a	Fix unused code (#28 ) * Fix unused code * Update changelog * Update SB3 dependency	2021-05-05 11:42:10 +02:00
Antonin RAFFIN	81ef23d270	SB3 v1.0 (#23 )	2021-03-17 14:32:58 +01:00
Antonin RAFFIN	9824daca44	Bug fix for QR-DQN (#21 ) * Bug fix for QR-DQN * Upgrade SB3	2021-03-06 14:54:43 +01:00
Antonin RAFFIN	7c2eb833c0	Upgrade SB3 (#20 )	2021-02-27 19:59:21 +01:00
Antonin RAFFIN	74e60381a6	Upgrade Stable-Baselines3 (#19 ) * Upgrade Stable-Baselines3 * Fix policy saving/loading	2021-02-27 18:17:22 +01:00
Toshiki Watanabe	4b4d487fdb	Fix the target calculation of QR-DQN (#18 ) * Fix the target calculation of QR-DQN * Update doc * Update version * Update changelog * Update README Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2021-01-11 14:11:16 +01:00
Antonin RAFFIN	ab2880c670	Version bump	2020-12-21 11:19:31 +01:00
Toshiki Watanabe	b30397fff5	Add QR-DQN (#13 ) * Add QR-DQN(WIP) * Update docstring * Add quantile_huber_loss * Fix typo * Remove unnecessary lines * Update variable names and comments in quantile_huber_loss * Fix mutable arguments * Update variable names * Ignore import not used warnings * Fix default parameter of optimizer in QR-DQN * Update quantile_huber_loss to have more reasonable interface * update tests * Add assertion to quantile_huber_loss * Update variable names of quantile regression * Update comments * Reduce the number of quantiles during test * Update comment * Update quantile_huber_loss * Fix isort * Add document of QR-DQN without results * Update docs * Fix bugs * Update doc * Add comments about shape * Minor edits * Update comments * Add benchmark * Doc fixes * Update doc * Bug fix in saving/loading + update tests Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2020-12-21 11:17:48 +01:00
Antonin RAFFIN	3598ca284a	Update requirements (#15 )	2020-12-13 17:29:15 +01:00
Antonin RAFFIN	857a087a2a	Update TQC to match SB3 (#14 )	2020-12-08 15:35:50 +01:00
Antonin RAFFIN	6bafcf6e88	Add TimeFeatureWrapper (#7 ) * Add TimeFeatureWrapper * Update README * Address comments	2020-11-13 13:00:56 +02:00
Antonin RAFFIN	aac20bd1e6	Release v0.10.0	2020-10-28 15:08:07 +01:00
Antonin RAFFIN	2ce8d278cc	Fix features extractor issue (#5 ) * Fix feature extractor issue * Sync with SB3 PR	2020-10-27 14:30:35 +01:00
Antonin RAFFIN	b896b7492e	Update dependencies	2020-10-22 16:35:28 +02:00
Antonin RAFFIN	e8093965c7	Fix doc build	2020-10-22 14:46:05 +02:00
Antonin RAFFIN	0700c3eeb0	Add TQC (#4 ) * Add TQC doc * Polish code * Update doc * Update results * Update doc * Update doc * Add note about PyBullet envs	2020-10-22 13:43:46 +02:00
Antonin RAFFIN	926e488196	Update wording and links	2020-10-17 17:04:00 +02:00
Anssi "Miffyli" Kanervisto	79fcf54e1e	Review docs and update changelog	2020-10-15 02:17:36 +03:00
Antonin RAFFIN	5033b192cb	Add base doc	2020-10-12 20:21:52 +02:00

33 Commits