
Python RLlib: Tune runs PPOTrainer, but not A2CTrainer


I'm comparing the two algorithms on the CartPole environment, with the following setup:

import ray
from ray import tune
from ray.rllib import agents
ray.init() # Skip or set to ignore if already called
Running this works fine:

experiment = tune.run(
    agents.ppo.PPOTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "entropy_coeff": 0.01,
        "lr_schedule": [
              [0, 0.0005],
              [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_param": 0.1,
        "vf_share_layers": False,
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)
But when I do the same with the A2C trainer:

experiment = tune.run(
    agents.a3c.A2CTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "entropy_coeff": 0.01,
        "lr_schedule": [
              [0, 0.0005],
              [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_param": 0.1,
        "vf_share_layers": False,
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)
it raises this exception:

---------------------------------------------------------------------------
TuneError                                 Traceback (most recent call last)
<ipython-input-9-6680e67f9343> in <module>()
     23     mode="max",
     24     stop={"training_iteration": 100},
---> 25     checkpoint_at_end=True,
     26 )

/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, loggers, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, queue_trials, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, ray_auto_init, run_errored_only, global_checkpoint_period, with_server, upload_dir, sync_to_cloud, sync_to_driver, sync_on_checkpoint)
    432     if incomplete_trials:
    433         if raise_on_failed_trial:
--> 434             raise TuneError("Trials did not complete", incomplete_trials)
    435         else:
    436             logger.error("Trials did not complete: %s", incomplete_trials)

TuneError: ('Trials did not complete', [A2C_CartPole-v1_6acda_00000])
---------------------------------------------------------------------------

Can anyone tell me what is going on? I don't know whether this is related to the versions of the libraries I'm using or whether there is a problem with my code. Is this a common issue?

The A2C run fails because you copied the config from the PPO trial: `sgd_minibatch_size`, `kl_coeff`, and several other keys are PPO-specific settings, and passing them to A2C makes the trial crash.
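As a minimal sketch of the fix, you can strip the PPO-only keys from the copied config before handing it to `A2CTrainer`. The exact set of offending keys below is an assumption and may vary across Ray versions; `lambda` is kept here on the assumption that A2C also accepts a GAE lambda:

```python
# Config copied verbatim from the PPO trial above.
ppo_config = {
    "env": "CartPole-v1",
    "num_gpus": 1,
    "num_workers": 0,
    "num_envs_per_worker": 50,
    "rollout_fragment_length": 100,
    "train_batch_size": 5000,
    "entropy_coeff": 0.01,
    "lr_schedule": [[0, 0.0005], [10000000, 0.000000000001]],
    "lambda": 0.95,
    # PPO-specific keys that A2C's config does not recognize:
    "sgd_minibatch_size": 500,
    "num_sgd_iter": 10,
    "kl_coeff": 0.5,
    "clip_param": 0.1,
    "vf_share_layers": False,
}

# Keys assumed to exist only in PPO's default config (check your
# Ray version's a3c.DEFAULT_CONFIG to confirm the exact set).
PPO_ONLY_KEYS = {
    "sgd_minibatch_size",
    "num_sgd_iter",
    "kl_coeff",
    "clip_param",
    "vf_share_layers",
}

# Build an A2C-safe config by dropping the PPO-only keys.
a2c_config = {k: v for k, v in ppo_config.items() if k not in PPO_ONLY_KEYS}
```

The filtered `a2c_config` can then be passed to `tune.run(agents.a3c.A2CTrainer, config=a2c_config, ...)` exactly as in the question.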

The underlying error is explained in `error.txt` inside the trial's logdir.