Fix TRPO doc
parent 59be198da0
commit 3b007ae93b
@@ -45,12 +45,12 @@ Train a PPO with invalid action masking agent on a toy environment.
 model.learn(5000)
 model.save("qrdqn_cartpole")

-TRPO
-----
+TRPO
+----

-Train a Trust Region Policy Optimization (TRPO) agent on the Pendulum environment.
+Train a Trust Region Policy Optimization (TRPO) agent on the Pendulum environment.

-.. code-block:: python
+.. code-block:: python

 from sb3_contrib import TRPO
