Hadoop: How to specify multiple files with "-files" in the Amazon CLI for EMR?

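I am trying to launch an Amazon EMR cluster through the AWS CLI, but I am a little confused about how to specify multiple files. My current call is as follows: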

aws emr create-cluster \
--steps Type=STREAMING,Name='Intra country development',ActionOnFailure=CONTINUE,Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra] \
--ami-version 3.1.0 \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
--auto-terminate \
--log-uri s3://betaestimationtest/logs

However, Hadoop now complains that it cannot find the reducer file:

Caused by: java.io.IOException: Cannot run program "reducer.py": error=2, No such file or directory

What am I doing wrong? The files do exist in the folder I specified.

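You are specifying -files twice; you only need to specify it once. I forget whether the CLI expects a space or a comma as the separator between multiple values, but you can try both.

You should replace: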

Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]
with:

or, if that fails:

Args=[-files,s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

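To pass multiple files in a streaming step, you need to pass the steps as a JSON file using file://.

The AWS CLI shorthand syntax uses commas as the delimiter for a list of arguments. So when we try to pass in arguments such as "-files", "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py", the shorthand parser treats the mapper.py and reducer.py paths as two separate arguments.

The workaround is to use the JSON format. See the example below.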
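The splitting problem can be illustrated with a rough Python sketch. This is only a simplified model that splits on every comma, not the actual AWS CLI shorthand parser, and `naive_shorthand_split` is a hypothetical helper name chosen for the illustration.

```python
import json


def naive_shorthand_split(args_text):
    """Simplified model: drop the surrounding brackets and split on every comma."""
    return args_text.strip("[]").split(",")


shorthand = (
    "[-files,s3://betaestimationtest/mapper.py,"
    "s3://betaestimationtest/reducer.py,-mapper,mapper.py]"
)
# The two S3 paths come out as separate list items, so -files is
# followed by only the first path and the second one is left over.
print(naive_shorthand_split(shorthand))

as_json = (
    '["-files", '
    '"s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py", '
    '"-mapper", "mapper.py"]'
)
# Inside a quoted JSON string the comma is ordinary text, so both
# paths stay together as the single value that follows -files.
print(json.loads(as_json))
```

In this model the shorthand form yields five items while the JSON form yields four, which matches the "unexpected argument" symptom reported for the second S3 path.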

aws emr create-cluster --steps file://./mysteps.json --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate --log-uri s3://betaestimationtest/logs
where mysteps.json looks like:

[
    {
        "Name": "Intra country development",
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "-files",
            "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
            "-mapper",
            "mapper.py",
            "-reducer",
            "reducer.py",
            "-input",
            "s3://betaestimationtest/output_0_inter",
            "-output",
            "s3://betaestimationtest/output_1_intra"
        ]
    }
]
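
If the file list changes often, the steps file could also be generated rather than written by hand. The following is a minimal sketch, assuming the same bucket and script names as in the question; `build_streaming_step` is a hypothetical helper, not part of any AWS tool, and the output simply mirrors the mysteps.json layout shown above.

```python
import json


def build_streaming_step(name, files, mapper, reducer, input_uri, output_uri):
    """Return one streaming step dict in the same shape as mysteps.json."""
    return {
        "Name": name,
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            # All files go into ONE comma-joined value after a single -files.
            "-files", ",".join(files),
            "-mapper", mapper,
            "-reducer", reducer,
            "-input", input_uri,
            "-output", output_uri,
        ],
    }


bucket = "s3://betaestimationtest"  # bucket name taken from the question
steps = [
    build_streaming_step(
        "Intra country development",
        [f"{bucket}/mapper.py", f"{bucket}/reducer.py"],
        "mapper.py",
        "reducer.py",
        f"{bucket}/output_0_inter",
        f"{bucket}/output_1_intra",
    )
]

# Write the file that is later passed as --steps file://./mysteps.json
with open("mysteps.json", "w") as fh:
    json.dump(steps, fh, indent=4)
```

Joining the paths in code keeps the single `-files` value free of stray spaces, which are easy to introduce when editing the JSON by hand.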
You can also find examples here: [link missing]. See example 13.

Hope that helps.

Add an escape for the comma that separates the files:

    Args=[-files,s3://betaestimationtest/mapper.py\\,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]

Actually, I have already tried both options. The first thing you suggested leads to the following error in my console:
Key value pairs, where values are separated by commas, and multiple pairs are separated by spaces.
The second option does not work on Amazon either and gives the following error:
Found 1 unexpected arguments on the command line [s3://betaestimationtest/intraccountryreducer.py]

Doesn't work. I get this error: Found 1 unexpected arguments on the command line [s3://str emr/reduce.rb] Try -help for more information Streaming Command Failed! Command exiting with ret '1'