
Creating an AWS EMR cluster with a Spark step from a Lambda function fails with "Local file does not exist"


I am trying to spin up an EMR cluster with a Spark step from a Lambda function.

Here is my Lambda function (Python 2.7):
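(The original code block did not survive the page scrape; below is a minimal hypothetical sketch of the kind of boto3 call being described. The cluster name, region, release label, instance types, and S3 paths are placeholders, not the asker's values. The relevant detail is the step that references the job .jar through script-runner, which is what failed with "Local file does not exist".)

import boto3

emr = boto3.client('emr', region_name='us-east-1')  # placeholder region

def lambda_handler(event, context):
    # Hypothetical reconstruction of the failing run_job_flow call
    response = emr.run_job_flow(
        Name='spark-cluster',                  # placeholder name
        ReleaseLabel='emr-5.20.0',             # placeholder release label
        Applications=[{'Name': 'Spark'}],
        Instances={
            'MasterInstanceType': 'm4.large',  # placeholder instance config
            'SlaveInstanceType': 'm4.large',
            'InstanceCount': 3,
            'KeepJobFlowAliveWhenNoSteps': False,
        },
        Steps=[{
            'Name': 'spark-step',
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                # script-runner pointed at a .jar in S3 -- the setup the
                # question describes as failing with "Local file does not exist"
                'Jar': 's3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': ['s3://<yourbucket>/jars/<your-job>.jar'],
            },
        }],
        JobFlowRole='EMR_EC2_DefaultRole',
        ServiceRole='EMR_DefaultRole',
    )
    return response['JobFlowId']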

So script-runner does not seem to understand that the .jar file should be fetched from S3.


Any help is appreciated…

Not every EMR release is pre-built to copy your jars and scripts from S3, so you have to do it yourself in a bootstrap action:

BootstrapActions=[
    {
        'Name': 'Install additional components',
        'ScriptBootstrapAction': {
            'Path': code_dir + '/scripts/emr_bootstrap.sh'
        }
    }
],
This is what my bootstrap script does:

#!/bin/bash
HADOOP="/home/hadoop"
BUCKET="s3://<yourbucket>/<path>"

# Sync jars libraries
aws s3 sync ${BUCKET}/jars/ ${HADOOP}/
aws s3 sync ${BUCKET}/scripts/ ${HADOOP}/

# Install python packages
sudo pip install --upgrade pip
sudo ln -s /usr/local/bin/pip /usr/bin/pip
sudo pip install psycopg2 numpy boto3 pythonds
Then you can call your scripts and jars like this:

{
    'Name': 'START YOUR STEP',
    'ActionOnFailure': 'TERMINATE_CLUSTER',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',
        'Args': [
            "spark-submit", "--jars", ADDITIONAL_JARS,
            "--py-files", "/home/hadoop/modules.zip",
            "/home/hadoop/<your code>.py"
        ]
    }
},
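(ADDITIONAL_JARS is not defined in the answer; presumably it is a comma-separated string of the local paths the bootstrap script synced into /home/hadoop. A hypothetical example, with placeholder jar names:)

# Hypothetical: comma-separated local paths matching the jars that
# "aws s3 sync" pulled down into /home/hadoop/ during bootstrap
ADDITIONAL_JARS = ",".join([
    "/home/hadoop/postgresql-42.2.5.jar",  # placeholder jar names
    "/home/hadoop/extra-udfs.jar",
])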

I was finally able to solve this. The main problem was that the "Applications" configuration was broken; it has to look like this:

Applications=[{
       'Name': 'Spark'
    },
    {
       'Name': 'Hive'
    }],
And the step that finally worked (command-runner.jar runs spark-submit on the master node, which can fetch the example jar directly from S3):

Steps=[{
    'Name': 'lsr-step1',
    'ActionOnFailure': 'TERMINATE_CLUSTER',
    'HadoopJarStep': {
        'Jar': 'command-runner.jar',
        'Args': [
            "spark-submit", "--class", "org.apache.spark.examples.SparkPi",
            "s3://support.elasticmapreduce/spark/1.2.0/spark-examples-1.2.0-hadoop2.4.0.jar", "10"
        ]
    }
}]
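(For completeness, a minimal sketch of how these fragments slot into the full run_job_flow call; the cluster name, release label, log URI, instance types, and roles are placeholder assumptions, not from the original post:)

import boto3

emr = boto3.client('emr')

response = emr.run_job_flow(
    Name='lsr-cluster',                    # placeholder name
    ReleaseLabel='emr-5.20.0',             # placeholder release label
    LogUri='s3://<yourbucket>/logs/',      # placeholder log location
    Applications=[
        {'Name': 'Spark'},
        {'Name': 'Hive'},
    ],
    Instances={
        'MasterInstanceType': 'm4.large',  # placeholder instance config
        'SlaveInstanceType': 'm4.large',
        'InstanceCount': 3,
        'KeepJobFlowAliveWhenNoSteps': False,
    },
    Steps=[{
        'Name': 'lsr-step1',
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': [
                "spark-submit", "--class", "org.apache.spark.examples.SparkPi",
                "s3://support.elasticmapreduce/spark/1.2.0/spark-examples-1.2.0-hadoop2.4.0.jar", "10"
            ]
        }
    }],
    JobFlowRole='EMR_EC2_DefaultRole',     # default EMR roles
    ServiceRole='EMR_DefaultRole',
)
print(response['JobFlowId'])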