I have seen several implementations of the REINFORCE algorithm for reinforcement learning tasks with discrete action spaces. Is there an implementation of REINFORCE (or another policy-gradient algorithm) for continuous action spaces?
More specifically, is it possible to implement bipedal locomotion from OpenAI Gym's 'Humanoid-v2' environment with REINFORCE?
Thanks.
Posted on 2018-12-12 10:52:41
You can use the Stable Baselines package: https://github.com/hill-a/stable-baselines
Training an agent is as simple as:
import gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2
my_env_id = 'Humanoid-v2'
env = gym.make(my_env_id)
# Vectorized environments make it easy to multiprocess training;
# we demonstrate their usefulness in the next examples
env = DummyVecEnv([lambda: env]) # The algorithms require a vectorized environment to run
model = PPO2(MlpPolicy, env, verbose=1)
# Train the agent
model.learn(total_timesteps=10000)
# Enjoy the trained agent
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

Source: https://stackoverflow.com/questions/49804489
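For intuition on how policy gradients handle continuous actions at all (the core of the original question), here is a minimal REINFORCE sketch with a Gaussian policy on a toy 1-D problem. The toy task, variable names, and hyperparameters are all illustrative assumptions, not part of stable-baselines or Gym:

```python
import numpy as np

# Toy continuous-action REINFORCE sketch (illustrative, not stable-baselines).
# A 1-D "bandit": the policy samples a real-valued action from N(mu, sigma)
# and must learn to output actions near an unknown target.
rng = np.random.default_rng(0)
target = 2.0            # optimum the policy should discover from rewards alone
mu, sigma = 0.0, 1.0    # Gaussian policy parameters; we learn mu only
lr, batch = 0.01, 100   # learning rate and samples per gradient estimate

for update in range(200):
    a = rng.normal(mu, sigma, size=batch)   # sample continuous actions
    r = -(a - target) ** 2                  # reward peaks at a == target
    # REINFORCE: d/dmu of log N(a; mu, sigma) is (a - mu) / sigma**2,
    # so the gradient estimate is the reward-weighted score function
    grad = np.mean(r * (a - mu) / sigma**2)
    mu += lr * grad

# mu has moved from 0.0 toward the target of 2.0
```

PPO (as used in the answer above) builds on the same score-function idea, with a learned value baseline and a clipped surrogate objective to stabilize updates.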