Hadoop: How do I specify multiple files with "-files" in the Amazon CLI for EMR?
I am trying to launch an Amazon cluster through the Amazon CLI, but I am a little confused about how to specify multiple files. My current call is as follows:
aws emr create-cluster --steps Type=STREAMING,Name='Intra country development',ActionOnFailure=CONTINUE,Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra] \
    --ami-version 3.1.0 \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
    --auto-terminate \
    --log-uri s3://betaestimationtest/logs
However, Hadoop now complains that it cannot find the reducer file:
Caused by: java.io.IOException: Cannot run program "reducer.py": error=2, No such file or directory
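What am I doing wrong? The file does exist in the folder I specified.
You are specifying -files twice; you only need to specify it once. I forget whether the CLI expects a space or a comma as the separator between multiple values, but you can try both. You should replace: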
Args=[-files,s3://betaestimationtest/mapper.py,-files,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]
with a single -files argument that lists both files, separated by a space or, if that fails, by a comma:
Args=[-files,s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]
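To pass multiple files in a streaming step, you need to pass the steps as a JSON file using file://.
The AWS CLI shorthand syntax uses the comma as the delimiter between the items of an argument list. So when we try to pass arguments such as "-files","s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py", the shorthand parser treats the mapper.py and reducer.py paths as two separate arguments.
The workaround is to use the JSON format. See the example below: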
aws emr create-cluster --steps file://./mysteps.json --ami-version 3.1.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --auto-terminate --log-uri s3://betaestimationtest/logs
mysteps.json looks like:
[
  {
    "Name": "Intra country development",
    "Type": "STREAMING",
    "ActionOnFailure": "CONTINUE",
    "Args": [
      "-files",
      "s3://betaestimationtest/mapper.py,s3://betaestimationtest/reducer.py",
      "-mapper",
      "mapper.py",
      "-reducer",
      "reducer.py",
      "-input",
      "s3://betaestimationtest/output_0_inter",
      "-output",
      "s3://betaestimationtest/output_1_intra"
    ]
  }
]
The original answer also links to further examples; see example 13 there.
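As a rough sketch, a steps file like the one above could also be generated with a few lines of Python instead of being written by hand. The helper name build_streaming_step and the output file name are illustrative assumptions, not part of the AWS CLI; the bucket paths are the ones from the question.

```python
import json


def build_streaming_step(name, files, mapper, reducer, input_path, output_path):
    """Build one streaming step; all files go into a single -files value."""
    return {
        "Name": name,
        "Type": "STREAMING",
        "ActionOnFailure": "CONTINUE",
        "Args": [
            "-files", ",".join(files),  # one comma-joined string, one list item
            "-mapper", mapper,
            "-reducer", reducer,
            "-input", input_path,
            "-output", output_path,
        ],
    }


steps = [
    build_streaming_step(
        name="Intra country development",
        files=[
            "s3://betaestimationtest/mapper.py",
            "s3://betaestimationtest/reducer.py",
        ],
        mapper="mapper.py",
        reducer="reducer.py",
        input_path="s3://betaestimationtest/output_0_inter",
        output_path="s3://betaestimationtest/output_1_intra",
    )
]

# Write the file that "--steps file://./mysteps.json" would point at.
with open("mysteps.json", "w") as handle:
    json.dump(steps, handle, indent=2)
```

Because the comma-joined file list is an ordinary JSON string, it reaches Hadoop as one value and is never seen by the shorthand parser.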
Hope this helps.
Escape the comma that separates the files:
Args=[-files,s3://betaestimationtest/mapper.py\\,s3://betaestimationtest/reducer.py,-mapper,mapper.py,-reducer,reducer.py,-input,s3://betaestimationtest/output_0_inter,-output,s3://betaestimationtest/output_1_intra]
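Actually, I have already tried both options. The first thing you suggested leads to the following error in my console:
Key value pairs, where values are separated by commas, and multiple pairs are separated by spaces.
The second option does not work on Amazon either and gives the following error: Found 1 unexpected arguments on the command line [s3://betaestimationtest/intraccountryreducer.py]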
This does not work. I get this error:
Found 1 unexpected arguments on the command line [s3://str emr/reduce.rb]
Try -help for more information
Streaming Command Failed!
Command exiting with ret '1'
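The "unexpected arguments" errors reported in these comments are consistent with the argument list being split at every comma. The snippet below is only a toy model of that behaviour, assuming each flag takes exactly one value; it is not the AWS CLI's actual parser, and both function names are hypothetical.

```python
def split_shorthand_list(text):
    """Toy model: split a bracketed shorthand list at every comma."""
    return text.strip("[]").split(",")


def find_stray_values(args):
    """Return values that do not directly follow a flag (one value per flag)."""
    stray = []
    expecting_value = False
    for token in args:
        if token.startswith("-"):
            expecting_value = True   # the next token belongs to this flag
        elif expecting_value:
            expecting_value = False  # value consumed by the preceding flag
        else:
            stray.append(token)      # nothing left to attach this value to
    return stray


shorthand = (
    "[-files,s3://betaestimationtest/mapper.py,"
    "s3://betaestimationtest/reducer.py,"
    "-mapper,mapper.py,-reducer,reducer.py,"
    "-input,s3://betaestimationtest/output_0_inter,"
    "-output,s3://betaestimationtest/output_1_intra]"
)

args = split_shorthand_list(shorthand)
stray = find_stray_values(args)
print(stray)
```

In this model the second S3 path is left over after -files has consumed the first one, which matches the shape of the error messages quoted above.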