stable-baselines3-contrib-sacd

History

Cyprien 59be198da0 Add Trust Region Policy Optimization (TRPO) (#40 ) * Feat: adding TRPO algorithm (WIP) WIP - Trust Region Policy Algorithm Currently the Hessian vector product is not working (see inline comments for more detail) * Feat: adding TRPO algorithm (WIP) Adding no_grad block for the line search Additional assert in the conjugate solver to help debugging * Feat: adding TRPO algorithm (WIP) - Adding ActorCriticPolicy.get_distribution - Using the Distribution object to compute the KL divergence - Checking for objective improvement in the line search - Moving magic numbers to instance variables * Feat: adding TRPO algorithm (WIP) Improving numerical stability of the conjugate gradient algorithm Critic updates * Feat: adding TRPO algorithm (WIP) Changes around the alpha of the line search Adding TRPO to __init__ files * feat: TRPO - addressing PR comments - renaming cg_solver to conjugate_gradient_solver and renaming parameter Avp_fun to matrix_vector_dot_func + docstring - extra comments + better variable names in trpo.py - defining a method for the hessian vector product instead of an inline function - fix registering correct policies for TRPO and using correct policy base in constructor * refactor: TRPO - policier - refactoring sb3_contrib.common.policies to reuse as much code as possible from sb3 * feat: using updated ActorCriticPolicy from SB3 - get_distribution will be added directly to the SB3 version of ActorCriticPolicy, this commit reflects this * Bump version for `get_distribution` support * Add basic test * Reformat * [ci skip] Fix changelog * fix: setting train mode for trpo * fix: batch_size type hint in trpo.py * style: renaming variables + docstring in trpo.py * Rename + cleanup * Move grad computation to separate method * Remove grad norm clipping * Remove n epochs and add sub-sampling * Update defaults * Add Doc * Add more test and fixes for CNN * Update doc + add benchmark * Add tests + update doc * Fix doc * Improve names for conjugate gradient * Update comments * Update changelog Co-authored-by: Antonin Raffin <antonin.raffin@ensta.org>		2021-12-29 11:58:03 +01:00
..
_static	Add TQC (#4 )	2020-10-22 13:43:46 +02:00
common	Add Trust Region Policy Optimization (TRPO) (#40 )	2021-12-29 11:58:03 +01:00
guide	Add Trust Region Policy Optimization (TRPO) (#40 )	2021-12-29 11:58:03 +01:00
images	PPO variant with invalid action masking (#25 )	2021-09-23 14:50:10 +02:00
misc	Add Trust Region Policy Optimization (TRPO) (#40 )	2021-12-29 11:58:03 +01:00
modules	Add Trust Region Policy Optimization (TRPO) (#40 )	2021-12-29 11:58:03 +01:00
Makefile	Add base doc	2020-10-12 20:21:52 +02:00
README.md	Review docs and update changelog	2020-10-15 02:17:36 +03:00
conda_env.yml	Drop python 3.6 support (#55 )	2021-12-06 12:59:53 +01:00
conf.py	Add base doc	2020-10-12 20:21:52 +02:00
index.rst	Add Trust Region Policy Optimization (TRPO) (#40 )	2021-12-29 11:58:03 +01:00
make.bat	Add base doc	2020-10-12 20:21:52 +02:00
spelling_wordlist.txt	Add base doc	2020-10-12 20:21:52 +02:00

README.md

Stable Baselines3 Contrib Documentation

This folder contains documentation for the RL baselines contribution repository.

Build the Documentation

Install Sphinx and Theme

pip install sphinx sphinx-autobuild sphinx-rtd-theme

Building the Docs

In the docs/ folder:

make html

if you want to building each time a file is changed:

sphinx-autobuild . _build/html