* MaskablePPO docs
Added a warning about possible crashes caused by check_env in case of invalid actions.
* Reformat with black 23
* Rephrase note on action sampling
* Fix action noise
* Update changelog
---------
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* `to(device)` to `device=device` and `float()` to `dtype=th.float32`
* Update changelog
* Fix type checking
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Modified sb3_contrib/common/maskable/policies.py
- Added support for non-shared features extractor in file sb3_contrib/common/maskable/policies.py
- updated changelog
* Modified sb3_contrib/common/recurrent/policies.py
* Modified sb3_contrib/qrdqn/policies.py and sb3_contrib/tqc/policies.py
* Updated test_cnn.py
* Upgrade SB3 version
* Revert changes in formatting
* Remove duplicate normalize_images
* Add test for image-like inputs
* Fixes and add more tests
* Update SB3 version
* Fix ARS warnings
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Update contribution.md
* New loop struct to make mypy happy
* Update setup.cfg
* Update changelog
* fix squash_output = False in ARS policy
* Add with_bias parameter to ARSPolicy
* Make ARSLinearPolicy a special case of ARSPolicy
* Remove ars_policy from mypy exclude
* Update changelog
* Update SB3 version
* Fix loading of ARS linear policy saved with sb3-contrib < 1.7.0
* Fix test
* Turn docstring into comment
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>
* Default device for buffer is auto
* `device=auto` in ARS
* Undo ARS change
* Update changelog
* Update min SB3 version
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Running (not working yet) version of recurrent PPO
* Fixes for multi envs
* Save WIP, rework the sampling
* Add Box support
* Fix sample order
* Begin cleanup, code is broken (again)
* First working version (no shared lstm)
* Start cleanup
* Try rnn with value function
* Re-enable batch size
* Deactivate vf rnn
* Allow any batch size
* Add support for evaluation
* Add CNN support
* Fix start of sequence
* Allow shared LSTM
* Rename mask to episode_start
* Fix type hint
* Enable LSTM for critic
* Clean code
* Fix for CNN LSTM
* Fix sampling with n_layers > 1
* Add std logger
* Update wording
* Rename and add dict obs support
* Fixes for dict obs support
* Do not run slow tests
* Fix doc
* Update recurrent PPO example
* Update README
* Use Pendulum-v1 for tests
* Fix image env
* Speedup LSTM forward pass (#63)
* added more efficient lstm implementation
* Rename and add comment
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Fixes
* Remove OpenAI sampling and improve coverage
* Sync with SB3 PPO
* Pass state shape and allow lstm kwargs
* Update tests
* Add masking for padded sequences
* Update default in perf test
* Remove TODO, mask is now working
* Add helper to remove duplicated code, remove hack for padding
* Enable LSTM critic and raise threshold for cartpole with no vel
* Fix tests
* Update doc and tests
* Doc fix
* Fix for new Sphinx version
* Fix doc note
* Switch to batch first, no more additional swap
* Add comments and mask entropy loss
Co-authored-by: Neville Walo <43504521+Walon1998@users.noreply.github.com>
* Pendulum-v0 -> Pendulum-v1
* Reformat with black
* Update changelog
* Fix dtype bug in TimeFeatureWrapper
* Update version and remove forward calls
* Update CI
* Fix min version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* first pass at ars, replicates initial results, still needs more testing, cleanup
* add a few docs and tests, bugfixes for ARS
* debug and comment
* break out dump logs
* rollback so there are no predict workers, some refactoring
* remove callback from self, remove torch multiprocessing
* add module docs
* run formatter
* fix load and rerun formatter
* rename to less mathy variable names, rename _validate_hypers
* refactor to use evaluate_policy, linear policy no longer uses bias or squashing
* move everything to torch, add support for discrete action spaces, bugfix for alive reward offset
* added tests, passing all of them, add support for discrete action spaces
* update documentation
* allow for reward offset when there are multiple envs
* update results again
* Reformat
* Ignore unused imports
* Renaming + Cleanup
* Experimental multiprocessing
* Cleaner multiprocessing
* Reformat
* Fixes for callback
* Fix combining stats
* 2nd way
* Make the implementation cpu only
* Fixes + POC with mp module
* POC Processes
* Cleaner async implementation
* Remove unused arg
* Add typing
* Revert vec normalize offset hack
* Add `squash_output` parameter
* Add more tests
* Add comments
* Update doc
* Add comments
* Add more logging
* Fix TRPO issue on GPU
* Tmp fix for ARS tests on GPU
* Additional tmp fixes for ARS
* update docstrings + formatting, fix bad exception string in ARSPolicy
* Add comments and docstrings
* Fix missing import
* Fix type check
* Add docstrings
* GPU support, first attempt
* Fix test
* Add missing docstring
* Typos
* Update default hyperparameters
Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
* Feat: adding TRPO algorithm (WIP)
WIP - Trust Region Policy Optimization
Currently the Hessian vector product is not working (see inline comments for more detail)
* Feat: adding TRPO algorithm (WIP)
Adding no_grad block for the line search
Additional assert in the conjugate solver to help debugging
* Feat: adding TRPO algorithm (WIP)
- Adding ActorCriticPolicy.get_distribution
- Using the Distribution object to compute the KL divergence
- Checking for objective improvement in the line search
- Moving magic numbers to instance variables
* Feat: adding TRPO algorithm (WIP)
Improving numerical stability of the conjugate gradient algorithm
Critic updates
* Feat: adding TRPO algorithm (WIP)
Changes around the alpha of the line search
Adding TRPO to __init__ files
* feat: TRPO - addressing PR comments
- renaming cg_solver to conjugate_gradient_solver and renaming parameter Avp_fun to matrix_vector_dot_func + docstring
- extra comments + better variable names in trpo.py
- defining a method for the hessian vector product instead of an inline function
- fix registering correct policies for TRPO and using correct policy base in constructor
* refactor: TRPO - policies
- refactoring sb3_contrib.common.policies to reuse as much code as possible from sb3
* feat: using updated ActorCriticPolicy from SB3
- get_distribution will be added directly to the SB3 version of ActorCriticPolicy, this commit reflects this
* Bump version for `get_distribution` support
* Add basic test
* Reformat
* [ci skip] Fix changelog
* fix: setting train mode for trpo
* fix: batch_size type hint in trpo.py
* style: renaming variables + docstring in trpo.py
* Rename + cleanup
* Move grad computation to separate method
* Remove grad norm clipping
* Remove n epochs and add sub-sampling
* Update defaults
* Add Doc
* Add more test and fixes for CNN
* Update doc + add benchmark
* Add tests + update doc
* Fix doc
* Improve names for conjugate gradient
* Update comments
* Update changelog
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>