stable-baselines3-contrib-sacd

Commit Graph

Author	SHA1	Message	Date
Alex Pasquali	376d9551de	Update MaskablePPO docs (#150 ) * MaskablePPO docs Added a warning about possible crashes caused by chack_env in case of invalid actions. * Reformat with black 23 * Rephrase note on action sampling * Fix action noise * Update changelog --------- Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2023-02-13 14:31:49 +01:00
Antonin RAFFIN	75b2de1399	Recurrent PPO (#53 ) * Running (not working yet) version of recurrent PPO * Fixes for multi envs * Save WIP, rework the sampling * Add Box support * Fix sample order * Being cleanup, code is broken (again) * First working version (no shared lstm) * Start cleanup * Try rnn with value function * Re-enable batch size * Deactivate vf rnn * Allow any batch size * Add support for evaluation * Add CNN support * Fix start of sequence * Allow shared LSTM * Rename mask to episode_start * Fix type hint * Enable LSTM for critic * Clean code * Fix for CNN LSTM * Fix sampling with n_layers > 1 * Add std logger * Update wording * Rename and add dict obs support * Fixes for dict obs support * Do not run slow tests * Fix doc * Update recurrent PPO example * Update README * Use Pendulum-v1 for tests * Fix image env * Speedup LSTM forward pass (#63) * added more efficient lstm implementation * Rename and add comment Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org> * Fixes * Remove OpenAI sampling and improve coverage * Sync with SB3 PPO * Pass state shape and allow lstm kwargs * Update tests * Add masking for padded sequences * Update default in perf test * Remove TODO, mask is now working * Add helper to remove duplicated code, remove hack for padding * Enable LSTM critic and raise threshold for cartpole with no vel * Fix tests * Update doc and tests * Doc fix * Fix for new Sphinx version * Fix doc note * Switch to batch first, no more additional swap * Add comments and mask entropy loss Co-authored-by: Neville Walo <43504521+Walon1998@users.noreply.github.com>	2022-05-30 04:31:12 +02:00
Adam Gleave	901a648507	Upgrade Gym to 0.21 (#59 ) * Pendulum-v0 -> Pendulum-v1 * Reformat with black * Update changelog * Fix dtype bug in TimeFeatureWrapper * Update version and removed forward calls * Update CI * Fix min version Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>	2022-02-22 16:25:43 +01:00
Sean Gillen	675304d8fa	Augmented Random Search (ARS) (#42 ) * first pass at ars, replicates initial results, still needs more testing, cleanup * add a few docs and tests, bugfixes for ARS * debug and comment * break out dump logs * rollback so there are now predict workers, some refactoring * remove callback from self, remove torch multiprocessing * add module docs * run formatter * fix load and rerun formatter * rename to less mathy variable names, rename _validate_hypers * refactor to use evaluatate_policy, linear policy no longer uses bias or squashing * move everything to torch, add support for discrete action spaces, bugfix for alive reward offset * added tests, passing all of them, add support for discrete action spaces * update documentation * allow for reward offset when there are multiple envs * update results again * Reformat * Ignore unused imports * Renaming + Cleanup * Experimental multiprocessing * Cleaner multiprocessing * Reformat * Fixes for callback * Fix combining stats * 2nd way * Make the implementation cpu only * Fixes + POC with mp module * POC Processes * Cleaner aync implementation * Remove unused arg * Add typing * Revert vec normalize offset hack * Add `squash_output` parameter * Add more tests * Add comments * Update doc * Add comments * Add more logging * Fix TRPO issue on GPU * Tmp fix for ARS tests on GPU * Additional tmp fixes for ARS * update docstrings + formatting, fix bad exceptioe string in ARSPolicy * Add comments and docstrings * Fix missing import * Fix type check * Add dosctrings * GPU support, first attempt * Fix test * Add missing docstring * Typos * Update defaults hyperparameters Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>	2022-01-18 13:57:27 +01:00

4 Commits