Hadoop: How do I specify multiple files with "-files" in the Amazon EMR CLI?

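I am trying to launch an Amazon EMR cluster through the AWS CLI, but I am a bit confused about how to specify multiple files. My current call looks like this: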

aws emr create-cluster --steps Type=STREAMING,Name='Intra country development',ActionOnFailure=CONTINUE,Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra] \
--ami-version 3.1.0 \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate \
--log-uri s3://betaestimationtest/logs

However, Hadoop now complains that it cannot find the reducer file:

Caused by: java.io.IOException: Cannot run program "reducer.py": error=2, No such file or directory

What am I doing wrong? The file does exist in the folder I specified.

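You are specifying -files twice; you only need to specify it once. I forget whether the CLI expects a space or a comma as the separator between multiple values, but you can try both.

You should replace: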

Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

with:

or, if that fails, with:

Args=[-files,s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

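To pass multiple files in a streaming step, you need to pass the steps as a JSON file using file://.

The AWS CLI shorthand syntax uses the comma as the delimiter between list items. So when we try to pass arguments such as "-files", "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py", the shorthand parser splits the second value at its comma and treats mapper.py and reducer.py as two separate arguments.

The workaround is to use the JSON format. See the example below.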
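To make the splitting concrete, here is a small Python sketch. It is only a toy model: `split_shorthand_list` is a hypothetical helper that splits on every comma, which is not the real AWS CLI parser, but it reproduces the effect described above, where the value after -files ends up as two list items instead of one.

```python
# Toy model only: the real AWS CLI shorthand parser is more involved.
# This hypothetical helper just splits the inside of [...] on every comma,
# which is enough to show why the -files value gets broken apart.
def split_shorthand_list(text):
    """Split a shorthand list such as [a,b,c] into its comma-separated items."""
    return text.strip("[]").split(",")


shorthand_args = (
    "[-files,s3://betaestimationtest/mapper.py,"
    "s3://betaestimationtest/reducer.py,-mapper,mapper.py]"
)
parsed = split_shorthand_list(shorthand_args)

# What Hadoop streaming needs: both S3 paths as ONE value after -files.
wanted = [
    "-files",
    "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
    "-mapper",
    "mapper.py",
]

print(len(parsed), parsed)  # the reducer path is its own, stray item
print(len(wanted), wanted)
```

In the toy output the reducer path is a separate item, which matches the "unexpected arguments" complaint reported further down in this thread.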

aws emr create-cluster --steps file://./mysteps.json --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate --log-uri s3://betaestimationtest/logs

mysteps.json looks like this:

[
    {
    "Name": "Intra country development",
    "Type": "STREAMING",
    "ActionOnFailure": "CONTINUE",
    "Args": [
        "-files",
        "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
        "-mapper",
        "mapper.py",
        "-reducer",
        "reducer.py",
        "-input",
        "s3://betaestimationtest/output_0_inter",
        "-output",
        "s3://betaestimationtest/output_1_intra"
    ]}
]
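If you would rather not hand-edit the JSON, a file of this shape could be generated with a few lines of Python. This is a minimal sketch, not part of the AWS tooling: `build_streaming_step` is a made-up helper name, and the bucket paths are the ones from the question. The point is that the file list is joined into a single string before it goes into "Args".

```python
import json


def build_streaming_step(name, files, mapper, reducer, input_uri, output_uri):
    """Return one streaming step as a dict in the layout of mysteps.json."""
    return {
        "Name": name,
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "-files", ",".join(files),  # ONE string, comma-joined
            "-mapper", mapper,
            "-reducer", reducer,
            "-input", input_uri,
            "-output", output_uri,
        ],
    }


steps = [
    build_streaming_step(
        name="Intra country development",
        files=[
            "s3://betaestimationtest/mapper.py",
            "s3://betaestimationtest/reducer.py",
        ],
        mapper="mapper.py",
        reducer="reducer.py",
        input_uri="s3://betaestimationtest/output_0_inter",
        output_uri="s3://betaestimationtest/output_1_intra",
    )
]

# Serialize the list of steps; redirect the output into mysteps.json.
steps_json = json.dumps(steps, indent=4)
print(steps_json)
```

Saved under a name of your choice (say make_steps.py, a hypothetical file name), it can be run as `python make_steps.py > mysteps.json` and the result passed to `--steps file://./mysteps.json` as in the command above.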

Further examples are available as well; see example 13.

Hope that helps.

Escape the comma that separates the files:

    Args=[-files,s3://betaestimationtest/mapper.py\\,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

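Actually, I have already tried both options. The first one you suggested produces the following error in my console:

Key value pairs, where values are separated by commas, and multiple pairs are separated by spaces.

The second option does not work on Amazon either and gives the following error:

Found 1 unexpected arguments on the command line [s3://betaestimationtest/intraccountryreducer.py]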

This does not work. I got this error: Found 1 unexpected arguments on the command line [s3://str emr/reduce.rb] Try -help for more information Streaming Command Failed! Command exiting with ret '1'