* Add support for Gym 0.24
* Fixes for gym 0.24
* Fix for new reset signature
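  The new reset signature refers to the Gym 0.26 / Gymnasium API change where `env.reset()` returns an `(obs, info)` tuple instead of a bare observation. A minimal compatibility shim (`reset_env` is a hypothetical helper name, not the actual SB3 patch) might look like:

  ```python
  def reset_env(env, seed=None):
      """Call env.reset() and normalize the result to an (obs, info) tuple.

      Gym < 0.26 returns just the observation; Gym 0.26+ and Gymnasium
      return (obs, info).
      """
      result = env.reset(seed=seed) if seed is not None else env.reset()
      if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
          return result  # new-style (obs, info)
      return result, {}  # old-style: wrap with an empty info dict
  ```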
* Add tmp SB3 branch
* Fixes for gym 0.26
* Remove unused import
* Fix dependency
* Type annotations fixes
* Reformat
* Reformat with black 23
* Move to gymnasium
* Patch env if needed
* Fix types
* Fix CI
* Fixes for gymnasium
* Fix wrapper annotations
* Update version
* Fix type check
* Update QRDQN type hints and fix a bug with multiple envs
* Fix TQC type hints
* Fix TRPO type hints
* Additional fixes
* Update SB3 version
* Update issue templates and CI
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Pendulum-v0 -> Pendulum-v1
* Reformat with black
* Update changelog
* Fix dtype bug in TimeFeatureWrapper
* Update version and remove forward calls
* Update CI
* Fix min version
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>
* Feat: adding TRPO algorithm (WIP)
WIP - Trust Region Policy Optimization (TRPO)
Currently the Hessian vector product is not working (see inline comments for more detail)
* Feat: adding TRPO algorithm (WIP)
Adding no_grad block for the line search
Additional assert in the conjugate solver to aid debugging
* Feat: adding TRPO algorithm (WIP)
- Adding ActorCriticPolicy.get_distribution
- Using the Distribution object to compute the KL divergence
- Checking for objective improvement in the line search
- Moving magic numbers to instance variables
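  The improvement check added above can be sketched as a backtracking line search that only accepts a step when the surrogate objective improves and the KL divergence between old and new policy stays within the trust region. Function names and defaults below are illustrative, not sb3-contrib's actual API:

  ```python
  import numpy as np

  def backtracking_line_search(objective_fn, kl_fn, theta, step,
                               max_kl=0.01, shrink_factor=0.5, max_backtracks=10):
      """Accept theta + step_size * step only if the surrogate objective
      improves AND the KL divergence stays within the trust region.
      """
      old_objective = objective_fn(theta)
      step_size = 1.0
      for _ in range(max_backtracks):
          candidate = theta + step_size * step
          improved = objective_fn(candidate) > old_objective
          within_trust_region = kl_fn(theta, candidate) <= max_kl
          if improved and within_trust_region:
              return candidate, True
          step_size *= shrink_factor  # shrink the step and retry
      return theta, False  # no acceptable step found: keep old parameters
  ```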
* Feat: adding TRPO algorithm (WIP)
Improving numerical stability of the conjugate gradient algorithm
Critic updates
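  A conjugate-gradient solver of the kind referenced here solves A x = b using only the matrix-vector product v → A v; adding a small epsilon to the denominator is one common stability guard of the sort this commit describes (generic sketch, not sb3-contrib's code):

  ```python
  import numpy as np

  def conjugate_gradient_solver(matrix_vector_dot_fn, b, max_iter=10, residual_tol=1e-10):
      """Solve A x = b given only the map v -> A v (A symmetric positive
      definite, e.g. a Fisher/Hessian matrix).
      """
      x = np.zeros_like(b)
      residual = b.copy()            # r = b - A x, with x = 0 initially
      direction = residual.copy()    # first search direction
      rs_old = residual @ residual
      for _ in range(max_iter):
          a_dot_p = matrix_vector_dot_fn(direction)
          # Small epsilon guards against division by (near) zero.
          alpha = rs_old / (direction @ a_dot_p + 1e-8)
          x += alpha * direction
          residual -= alpha * a_dot_p
          rs_new = residual @ residual
          if rs_new < residual_tol:
              break
          direction = residual + (rs_new / rs_old) * direction
          rs_old = rs_new
      return x
  ```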
* Feat: adding TRPO algorithm (WIP)
Changes around the alpha of the line search
Adding TRPO to __init__ files
* feat: TRPO - addressing PR comments
- renaming cg_solver to conjugate_gradient_solver and renaming parameter Avp_fun to matrix_vector_dot_func + docstring
- extra comments + better variable names in trpo.py
- defining a method for the Hessian vector product instead of an inline function
- fix registering correct policies for TRPO and using correct policy base in constructor
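  The Hessian vector product H v never requires forming H explicitly. sb3-contrib computes the exact product via PyTorch double backprop; a dependency-light finite-difference approximation of the same operator, shown only to illustrate it, is:

  ```python
  import numpy as np

  def hessian_vector_product(grad_fn, x, v, eps=1e-5):
      """Approximate H v with finite differences of the gradient:
      H v ~ (grad_f(x + eps * v) - grad_f(x)) / eps.
      """
      return (grad_fn(x + eps * v) - grad_fn(x)) / eps
  ```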
* refactor: TRPO - policies
- refactoring sb3_contrib.common.policies to reuse as much code as possible from sb3
* feat: using updated ActorCriticPolicy from SB3
- get_distribution will be added directly to the SB3 version of ActorCriticPolicy, this commit reflects this
* Bump version for `get_distribution` support
* Add basic test
* Reformat
* [ci skip] Fix changelog
* fix: setting train mode for trpo
* fix: batch_size type hint in trpo.py
* style: renaming variables + docstring in trpo.py
* Rename + cleanup
* Move grad computation to separate method
* Remove grad norm clipping
* Remove n_epochs and add sub-sampling
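  Sub-sampling here means evaluating the expensive second-order quantities (the Hessian-vector products) on only a fraction of the batch. A toy index selector, with an illustrative `sub_sampling_factor` parameter name, could be:

  ```python
  import numpy as np

  def subsample_indices(batch_size, sub_sampling_factor, rng):
      """Pick a random 1/sub_sampling_factor fraction of the batch so the
      Hessian-vector products run on fewer samples.
      """
      n_samples = batch_size // sub_sampling_factor
      return rng.permutation(batch_size)[:n_samples]
  ```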
* Update defaults
* Add Doc
* Add more test and fixes for CNN
* Update doc + add benchmark
* Add tests + update doc
* Fix doc
* Improve names for conjugate gradient
* Update comments
* Update changelog
Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>