stable-baselines3-contrib-sacd

Commit Graph

Author	SHA1	Message	Date
Antonin RAFFIN	4d7ed004af	Sync SB3 Contrib with SB3 (#213) * Update RTD config * Switch to ruff for sorting imports * Evaluate falsy to truthy with not rather than `is False` * Add `features_extractor` argument to maskable policy * Add set_options for AsyncEval * Doc fixes	2024-05-06 14:20:28 +01:00
Antonin RAFFIN	de92025bb2	Prepare Release v2.0 (#192)	2023-06-23 13:10:17 +02:00
Antonin RAFFIN	21cc96cafd	Add Gymnasium support (#152) * Add support for Gym 0.24 * Fixes for gym 0.24 * Fix for new reset signature * Add tmp SB3 branch * Fixes for gym 0.26 * Remove unused import * Fix dependency * Type annotations fixes * Reformat * Reformat with black 23 * Move to gymnasium * Patch env if needed * Fix types * Fix CI * Fixes for gymnasium * Fix wrapper annotations * Update version * Fix type check * Update QRDQN type hints and bug fix with multi envs * Fix TQC type hints * Fix TRPO type hints * Additional fixes * Update SB3 version * Update issue templates and CI --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2023-04-14 13:52:07 +02:00
Jonas Reiher	aacded79c5	Add stats window argument (#171) * added missing tensorboard_log docstring * added stats_window_size argument to all models * changelog updated * Update SB3 version * fixed passing stats_window_size to parent * added test of stats_window_size --------- Co-authored-by: Antonin Raffin <antonin.raffin@dlr.de>	2023-04-05 18:47:27 +02:00
Alex Pasquali	376d9551de	Update MaskablePPO docs (#150) * MaskablePPO docs Added a warning about possible crashes caused by check_env in case of invalid actions. * Reformat with black 23 * Rephrase note on action sampling * Fix action noise * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-02-13 14:31:49 +01:00
Quentin Gallouédec	7c4a249fa4	Standardize the use of ``from gym import spaces`` (#131) * Standardize from gym import spaces * update changelog * update issue template * update version * Update version	2023-01-02 15:35:00 +01:00
Quentin Gallouédec	9cf8b5076f	Construct tensors directly on GPUs (#128) * `to(device)` to `device=device` and `float()` to `dtype=th.float32` * Update changelog * Fix type checking Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-12-23 00:44:25 +01:00
Quentin Gallouédec	36aeae18b5	Fix `Self` return type (#116) * Self hint for distributions * ClassSelf to SelfClass	2022-11-22 13:12:35 +01:00
Antonin RAFFIN	a9735b9f31	Fix reshape LSTM states (#112) * Fix LSTM states reshape * Fix warnings and update changelog * Remove unused variable * Fix runtime error when using n_lstm_layers > 1	2022-10-26 18:03:45 +02:00
Antonin RAFFIN	c75ad7dd58	Remove deprecated features (#108) * Remove deprecated features * Upgrade SB3 * Fix tests	2022-10-11 13:04:18 +02:00
Antonin RAFFIN	52795a307e	Add progress bar argument (#107) * Add progress bar argument * Sort imports	2022-10-10 18:44:13 +02:00
Quentin Gallouédec	e9c97948c8	Fixed the return type of ``.load()`` methods (#106) * Fix return type for learn using TypeVar * Update changelog Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-10-10 17:21:38 +02:00
Quentin Gallouédec	dec7b5303a	Deprecate ``create_eval_env``, ``eval_env`` and ``eval_freq`` parameter (#105) * Deprecate ``eval_env``, ``eval_freq`` and ``create_eval_env`` * Update changelog * Typo * Raise deprecation warning in _setup_learn * Upgrade to latest SB3 version and update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-10-10 17:12:40 +02:00
Max Lodel	fc68af8841	Fixed shared_lstm argument in CNN and MultiInput Policies for RecurrentPPO (#90) * fixed shared_lstm parameter in CNN and MultiInput Policies * updated tests * changelog * Fix FPS for recurrent PPO * Fix import * Update changelog Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-07-26 00:27:17 +02:00
rnederstigt	bfa86ce4fe	Fix masked quantities in RecurrentPPO (#78) * Ignore masked indexes when calculating the loss functions	2022-06-13 16:00:40 +02:00
Antonin RAFFIN	75b2de1399	Recurrent PPO (#53) * Running (not working yet) version of recurrent PPO * Fixes for multi envs * Save WIP, rework the sampling * Add Box support * Fix sample order * Begin cleanup, code is broken (again) * First working version (no shared lstm) * Start cleanup * Try rnn with value function * Re-enable batch size * Deactivate vf rnn * Allow any batch size * Add support for evaluation * Add CNN support * Fix start of sequence * Allow shared LSTM * Rename mask to episode_start * Fix type hint * Enable LSTM for critic * Clean code * Fix for CNN LSTM * Fix sampling with n_layers > 1 * Add std logger * Update wording * Rename and add dict obs support * Fixes for dict obs support * Do not run slow tests * Fix doc * Update recurrent PPO example * Update README * Use Pendulum-v1 for tests * Fix image env * Speedup LSTM forward pass (#63) * added more efficient lstm implementation * Rename and add comment Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org> * Fixes * Remove OpenAI sampling and improve coverage * Sync with SB3 PPO * Pass state shape and allow lstm kwargs * Update tests * Add masking for padded sequences * Update default in perf test * Remove TODO, mask is now working * Add helper to remove duplicated code, remove hack for padding * Enable LSTM critic and raise threshold for cartpole with no vel * Fix tests * Update doc and tests * Doc fix * Fix for new Sphinx version * Fix doc note * Switch to batch first, no more additional swap * Add comments and mask entropy loss Co-authored-by: Neville Walo <43504521+Walon1998@users.noreply.github.com>	2022-05-30 04:31:12 +02:00

16 Commits