stable-baselines3-contrib-sacd

Sean Gillen 675304d8fa Augmented Random Search (ARS) (#42)	2022-01-18 13:57:27 +01:00

* first pass at ars, replicates initial results, still needs more testing, cleanup
* add a few docs and tests, bugfixes for ARS
* debug and comment
* break out dump logs
* rollback so there are now predict workers, some refactoring
* remove callback from self, remove torch multiprocessing
* add module docs
* run formatter
* fix load and rerun formatter
* rename to less mathy variable names, rename _validate_hypers
* refactor to use evaluatate_policy, linear policy no longer uses bias or squashing
* move everything to torch, add support for discrete action spaces, bugfix for alive reward offset
* added tests, passing all of them, add support for discrete action spaces
* update documentation
* allow for reward offset when there are multiple envs
* update results again
* Reformat
* Ignore unused imports
* Renaming + Cleanup
* Experimental multiprocessing
* Cleaner multiprocessing
* Reformat
* Fixes for callback
* Fix combining stats
* 2nd way
* Make the implementation cpu only
* Fixes + POC with mp module
* POC Processes
* Cleaner aync implementation
* Remove unused arg
* Add typing
* Revert vec normalize offset hack
* Add `squash_output` parameter
* Add more tests
* Add comments
* Update doc
* Add comments
* Add more logging
* Fix TRPO issue on GPU
* Tmp fix for ARS tests on GPU
* Additional tmp fixes for ARS
* update docstrings + formatting, fix bad exceptioe string in ARSPolicy
* Add comments and docstrings
* Fix missing import
* Fix type check
* Add dosctrings
* GPU support, first attempt
* Fix test
* Add missing docstring
* Typos
* Update defaults hyperparameters

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>

wrappers	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
test_cnn.py	Add Trust Region Policy Optimization (TRPO) (#40)	2021-12-29 11:58:03 +01:00
test_deterministic.py	Augmented Random Search (ARS) (#42)	2022-01-18 13:57:27 +01:00
test_dict_env.py	Add Trust Region Policy Optimization (TRPO) (#40)	2021-12-29 11:58:03 +01:00
test_distributions.py	Update Maskable PPO to match SB3 PPO + improve coverage (#56)	2021-12-10 12:48:19 +01:00
test_invalid_actions.py	Update Maskable PPO to match SB3 PPO + improve coverage (#56)	2021-12-10 12:48:19 +01:00
test_run.py	Augmented Random Search (ARS) (#42)	2022-01-18 13:57:27 +01:00
test_save_load.py	Augmented Random Search (ARS) (#42)	2022-01-18 13:57:27 +01:00
test_train_eval_mode.py	PPO variant with invalid action masking (#25)	2021-09-23 14:50:10 +02:00
test_utils.py	Add Trust Region Policy Optimization (TRPO) (#40)	2021-12-29 11:58:03 +01:00