Commit Graph

30 Commits

Author SHA1 Message Date
Antonin RAFFIN cf4ed5fe65
Release v2.1.0 (#204)
* Release v2.1.0

* Fix mypy

* Fix warnings in tests
2024-05-06 14:20:28 +01:00
Antonin RAFFIN de92025bb2
Prepare Release v2.0 (#192) 2023-06-23 13:10:17 +02:00
Antonin RAFFIN 21cc96cafd
Add Gymnasium support (#152)
* Add support for Gym 0.24

* Fixes for gym 0.24

* Fix for new reset signature

* Add tmp SB3 branch

* Fixes for gym 0.26

* Remove unused import

* Fix dependency

* Type annotations fixes

* Reformat

* Reformat with black 23

* Move to gymnasium

* Patch env if needed

* Fix types

* Fix CI

* Fixes for gymnasium

* Fix wrapper annotations

* Update version

* Fix type check

* Update QRDQN type hints and bug fix with multi envs

* Fix TQC type hints

* Fix TRPO type hints

* Additional fixes

* Update SB3 version

* Update issue templates and CI

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2023-04-14 13:52:07 +02:00
Jonas Reiher aacded79c5
Add stats window argument (#171)
* added missing tensorboard_log docstring

* added stats_window_size argument to all models

* changelog updated

* Update SB3 version

* fixed passing stats_window_size to parent

* added test of stats_window_size

---------

Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
2023-04-05 18:47:27 +02:00
Antonin RAFFIN ce115982aa
Fix QR-DQN type hints (#170) 2023-03-30 11:50:26 +02:00
Antonin RAFFIN 728c1c5b7f
Issue forms and pyproject.toml (#162)
* Issue forms and pyproject.toml

* [ci skip] Fix typos

* Fix isort config

* Use secret link to download atari roms

* Fix for mypy and update config

* Upgrade SB3 and fix warnings

* Fix doc build

* Update Makefile

* Lint first
2023-03-11 22:57:45 +01:00
Alex Pasquali 376d9551de
Update MaskablePPO docs (#150)
* MaskablePPO docs

Added a warning about possible crashes caused by check_env in case of invalid actions.

* Reformat with black 23

* Rephrase note on action sampling

* Fix action noise

* Update changelog

---------

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2023-02-13 14:31:49 +01:00
Quentin Gallouédec 7c4a249fa4
Standardize the use of ``from gym import spaces`` (#131)
* Standardize from gym import spaces

* update changelog

* update issue template

* update version

* Update version
2023-01-02 15:35:00 +01:00
Alex Pasquali ab8684f469
[Feature] Non-shared features extractor in on-policy algorithms (#130)
* Modified sb3_contrib/common/maskable/policies.py

- Added support for non-shared features extractor in file sb3_contrib/common/maskable/policies.py
- updated changelog

* Modified sb3_contrib/common/recurrent/policies.py

* Modified sb3_contrib/qrdqn/policies.py and sb3_contrib/tqc/policies.py

* Updated test_cnn.py

* Upgrade SB3 version

* Revert changes in formatting

* Remove duplicate normalize_images

* Add test for image-like inputs

* Fixes and add more tests

* Update SB3 version

* Fix ARS warnings

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-12-23 00:23:45 +01:00
Zikang Xiong ddb3a1355e
Expose modules in `__init__.py` with `__all__` attribute (#124)
* expose modules in __init__.py with __all__ attribute

* Update version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-12-05 15:53:57 +01:00
Quentin Gallouédec 36aeae18b5
Fix `Self` return type (#116)
* Self hint for distributions

* ClassSelf to SelfClass
2022-11-22 13:12:35 +01:00
Antonin RAFFIN c75ad7dd58
Remove deprecated features (#108)
* Remove deprecated features

* Upgrade SB3

* Fix tests
2022-10-11 13:04:18 +02:00
Antonin RAFFIN 52795a307e
Add progress bar argument (#107)
* Add progress bar argument

* Sort imports
2022-10-10 18:44:13 +02:00
Quentin Gallouédec e9c97948c8
Fixed the return type of ``.load()`` methods (#106)
* Fix return type for learn using TypeVar

* Update changelog

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2022-10-10 17:21:38 +02:00
Quentin Gallouédec dec7b5303a
Deprecate ``create_eval_env``, ``eval_env`` and ``eval_freq`` parameter (#105)
* Deprecate ``eval_env``, ``eval_freq`` and ``create_eval_env``

* Update changelog

* Typo

* Raise deprecation warning in _setup_learn

* Upgrade to latest SB3 version and update changelog

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-10-10 17:12:40 +02:00
Honglu Fan cad9034fdb
Handle batch norm in target update (#99)
* Copy running stats regardless of tau in QRDQN and TQC. See https://github.com/DLR-RM/stable-baselines3/issues/996

* Copy running stats regardless of tau in QRDQN and TQC. See https://github.com/DLR-RM/stable-baselines3/issues/996

* Copy running stats regardless of tau in QRDQN and TQC. See https://github.com/DLR-RM/stable-baselines3/issues/996

* roll back test_cnn.py
2022-08-27 12:31:00 +02:00
Antonin RAFFIN 75b2de1399
Recurrent PPO (#53)
* Running (not working yet) version of recurrent PPO

* Fixes for multi envs

* Save WIP, rework the sampling

* Add Box support

* Fix sample order

* Begin cleanup, code is broken (again)

* First working version (no shared lstm)

* Start cleanup

* Try rnn with value function

* Re-enable batch size

* Deactivate vf rnn

* Allow any batch size

* Add support for evaluation

* Add CNN support

* Fix start of sequence

* Allow shared LSTM

* Rename mask to episode_start

* Fix type hint

* Enable LSTM for critic

* Clean code

* Fix for CNN LSTM

* Fix sampling with n_layers > 1

* Add std logger

* Update wording

* Rename and add dict obs support

* Fixes for dict obs support

* Do not run slow tests

* Fix doc

* Update recurrent PPO example

* Update README

* Use Pendulum-v1 for tests

* Fix image env

* Speedup LSTM forward pass (#63)

* added more efficient lstm implementation

* Rename and add comment

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>

* Fixes

* Remove OpenAI sampling and improve coverage

* Sync with SB3 PPO

* Pass state shape and allow lstm kwargs

* Update tests

* Add masking for padded sequences

* Update default in perf test

* Remove TODO, mask is now working

* Add helper to remove duplicated code, remove hack for padding

* Enable LSTM critic and raise threshold for cartpole with no vel

* Fix tests

* Update doc and tests

* Doc fix

* Fix for new Sphinx version

* Fix doc note

* Switch to batch first, no more additional swap

* Add comments and mask entropy loss

Co-authored-by: Neville Walo <43504521+Walon1998@users.noreply.github.com>
2022-05-30 04:31:12 +02:00
Antonin RAFFIN bec00386d1
Upgrade to python 3.7+ syntax (#69)
* Upgrade to python 3.7+ syntax

* Switch to PyTorch 1.11
2022-04-25 13:02:07 +02:00
Antonin RAFFIN 812648e6cd
Rename QRDQN logger key (#67) 2022-04-12 12:50:35 +02:00
Grégoire Passault 99853265a9
Using policy_aliases instead of register_policy (#66)
* Using policy_aliases instead of register_policy

* Moving policy_aliases definitions

* Update SB3 version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-04-08 21:36:23 +02:00
Adam Gleave 901a648507
Upgrade Gym to 0.21 (#59)
* Pendulum-v0 -> Pendulum-v1

* Reformat with black

* Update changelog

* Fix dtype bug in TimeFeatureWrapper

* Update version and removed forward calls

* Update CI

* Fix min version

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2022-02-22 16:25:43 +01:00
Antonin RAFFIN a1b5ea67ae
Multiprocessing support for off policy algorithms (#50)
* TQC support for multienv

* Add optional layer norm for TQC

* Add layer norm for all policies

* Revert "Add layer norm for all policies"

This reverts commit 1306c3c64eb12613464982c66cb416a3bbc66285.

* Revert "Add optional layer norm for TQC"

This reverts commit 200222e3a8878007aa6032d540ae74274a4d0788.

* Add experimental support to train off-policy algorithms with multiple envs

* Bump version

* Update version
2021-12-02 10:40:21 +01:00
Antonin RAFFIN cd0a5e516f
Update citation (#54)
* Update citation

* Fixes for new SB3 version

* Fix type hint

* Additional fixes
2021-12-01 19:09:32 +01:00
Scott Brownlie b2e7126840
Train/Eval Mode Support (#39)
* switch models between train and eval mode

* update changelog

* update release in change log

* Update dependency

Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
2021-09-08 12:54:50 +02:00
Antonin RAFFIN 2258c72215
Update to new logger (#32) 2021-06-14 17:25:08 +02:00
Antonin RAFFIN 3665695d1e
Dictionary Observations (#29)
* Add TQC support for new HER version

* Add dict obs support

* Add support for dict obs
2021-05-11 13:24:31 +02:00
Antonin RAFFIN 9824daca44
Bug fix for QR-DQN (#21)
* Bug fix for QR-DQN

* Upgrade SB3
2021-03-06 14:54:43 +01:00
Antonin RAFFIN 74e60381a6
Upgrade Stable-Baselines3 (#19)
* Upgrade Stable-Baselines3

* Fix policy saving/loading
2021-02-27 18:17:22 +01:00
Toshiki Watanabe 4b4d487fdb
Fix the target calculation of QR-DQN (#18)
* Fix the target calculation of QR-DQN

* Update doc

* Update version

* Update changelog

* Update README

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2021-01-11 14:11:16 +01:00
Toshiki Watanabe b30397fff5
Add QR-DQN (#13)
* Add QR-DQN (WIP)

* Update docstring

* Add quantile_huber_loss

* Fix typo

* Remove unnecessary lines

* Update variable names and comments in quantile_huber_loss

* Fix mutable arguments

* Update variable names

* Ignore import not used warnings

* Fix default parameter of optimizer in QR-DQN

* Update quantile_huber_loss to have more reasonable interface

* update tests

* Add assertion to quantile_huber_loss

* Update variable names of quantile regression

* Update comments

* Reduce the number of quantiles during test

* Update comment

* Update quantile_huber_loss

* Fix isort

* Add document of QR-DQN without results

* Update docs

* Fix bugs

* Update doc

* Add comments about shape

* Minor edits

* Update comments

* Add benchmark

* Doc fixes

* Update doc

* Bug fix in saving/loading + update tests

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
2020-12-21 11:17:48 +01:00