
Python AWS EMR error: exit code 143

Tags: python, apache-spark, pyspark, yarn, amazon-emr

I'm running an analysis on AWS EMR and am hitting an unexpected SIGTERM error.

Some background:

I'm running a script that reads in many csv files stored on S3 and then performs an analysis. A schematic of my script:

analysis_script.py

import pandas as pd
from pyspark.sql import SQLContext, DataFrame
from pyspark.sql.types import *
from pyspark import SparkContext
import boto3

# Spark context
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

df = sqlContext.read.csv("s3n://csv_files/*", header = True)

def analysis(df):
    #do bunch of stuff. Create output dataframe
    return df_output

df_output = analysis(df)
I launch the cluster with:

aws emr create-cluster \
--release-label emr-5.5.0 \
--name "Analysis" \
--applications Name=Hadoop Name=Hive Name=Spark Name=Ganglia \
--ec2-attributes KeyName=EMRB,InstanceProfile=EMR_EC2_DefaultRole \
--service-role EMR_DefaultRole \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=r3.xlarge InstanceGroupType=CORE,InstanceCount=4,InstanceType=r3.xlarge \
--region us-west-2 \
--log-uri s3://emr-logs/ \
--bootstrap-actions Name="Install Python Packages",Path="s3://emr-bootstraps/install_python_packages_custom.bash",Args=["numpy pandas boto3 tqdm"] \
--auto-terminate
I can see from the logs that reading in the csv files goes fine, but then the job ends with an error. The following lines are in the stderr file:

18/07/16 12:02:26 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
18/07/16 12:02:26 ERROR ApplicationMaster: User application exited with status 143
18/07/16 12:02:26 INFO ApplicationMaster: Final app status: FAILED, exitCode: 143, (reason: User application exited with status 143)
18/07/16 12:02:26 INFO SparkContext: Invoking stop() from shutdown hook
18/07/16 12:02:26 INFO SparkUI: Stopped Spark web UI at http://172.31.36.42:36169
18/07/16 12:02:26 INFO TaskSetManager: Starting task 908.0 in stage 1494.0 (TID 88112, ip-172-31-35-59.us-west-2.compute.internal, executor 27, partition 908, RACK_LOCAL, 7278 bytes)
18/07/16 12:02:26 INFO TaskSetManager: Finished task 874.0 in stage 1494.0 (TID 88078) in 16482 ms on ip-172-31-35-59.us-west-2.compute.internal (executor 27) (879/4805)
18/07/16 12:02:26 INFO BlockManagerInfo: Added broadcast_2328_piece0 in memory on ip-172-31-36-42.us-west-2.compute.internal:34133 (size: 28.8 KB, free: 2.8 GB)
18/07/16 12:02:26 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(20, ip-172-31-36-42.us-west-2.compute.internal, 34133, None),broadcast_2328_piece0,StorageLevel(memory, 1 replicas),29537,0))
18/07/16 12:02:26 INFO BlockManagerInfo: Added broadcast_2328_piece0 in memory on ip-172-31-47-55.us-west-2.compute.internal:45758 (size: 28.8 KB, free: 2.8 GB)
18/07/16 12:02:26 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, ip-172-31-47-55.us-west-2.compute.internal, 45758, None),broadcast_2328_piece0,StorageLevel(memory, 1 replicas),29537,0))
18/07/16 12:02:26 INFO DAGScheduler: Job 1494 failed: toPandas at analysis_script.py:267, took 479.895614 s
18/07/16 12:02:26 INFO DAGScheduler: ShuffleMapStage 1494 (toPandas at analysis_script.py:267) failed in 478.993 s due to Stage cancelled because SparkContext was shut down
18/07/16 12:02:26 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerSQLExecutionEnd(0,1531742546839)
18/07/16 12:02:26 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@28e5b10c)
18/07/16 12:02:26 INFO DAGScheduler: ShuffleMapStage 1495 (toPandas at analysis_script.py:267) failed in 479.270 s due to Stage cancelled because SparkContext was shut down
18/07/16 12:02:26 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@6b68c419)
18/07/16 12:02:26 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(1494,1531742546841,JobFailed(org.apache.spark.SparkException: Job 1494 cancelled because SparkContext was shut down))
18/07/16 12:02:26 INFO YarnAllocator: Driver requested a total number of 0 executor(s).
18/07/16 12:02:26 INFO YarnClusterSchedulerBackend: Shutting down all executors
18/07/16 12:02:26 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/07/16 12:02:26 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices(serviceOption=None, services=List(),started=false)
18/07/16 12:02:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 

I can't find much useful information about exit code 143. Does anyone know why this error occurs? Thanks.

Spark passes the exit code through when it is greater than 128, which is often the case with JVM errors. Exit code 143 means the JVM received a SIGTERM, essentially a Unix kill signal (128 + signal number 15). Further details on Spark exit codes are covered in the references linked from the original answer.
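As a quick illustration of the 128-plus-signal convention (a minimal sketch, not part of the original answer), Python's standard signal module can decode such a status:

import signal

exit_code = 143  # the status reported by YARN in the stderr log

# Exit codes above 128 conventionally mean "killed by signal (code - 128)".
if exit_code > 128:
    sig = signal.Signals(exit_code - 128)
    print(f"JVM was killed by {sig.name} (signal {sig.value})")
    # -> JVM was killed by SIGTERM (signal 15)
else:
    print(f"Process exited with status {exit_code}")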


Since you didn't kill the job yourself, I'd start suspecting that something external did. Given that exactly 8 minutes pass between the job starting and the SIGTERM being issued, it seems likely that EMR itself is enforcing a maximum job run time / cluster age. Try checking your EMR settings to see whether any such timeout is configured. In my case there was one (on AWS Glue, but the same concept applies).
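One way to check those settings from Python (a minimal sketch using boto3, which the bootstrap action already installs; the cluster ID below is a placeholder, not from the question):

import boto3

emr = boto3.client("emr", region_name="us-west-2")

# Hypothetical cluster ID; substitute the ID of the failed cluster.
cluster = emr.describe_cluster(ClusterId="j-XXXXXXXXXXXXX")["Cluster"]

# Was the cluster created with --auto-terminate?
print("AutoTerminate:", cluster["AutoTerminate"])

# Compare the cluster timeline against the time of the SIGTERM,
# and see what reason EMR recorded for the last state change.
timeline = cluster["Status"]["Timeline"]
print("Created:", timeline.get("CreationDateTime"))
print("Ended:", timeline.get("EndDateTime"))
print("Reason:", cluster["Status"]["StateChangeReason"].get("Message"))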

Comments:

Have you tried a more recent EMR release, or looked at the Spark UI? ... Please show the "do bunch of stuff" code, because my guess is that the call to toPandas is killing your executors: they run out of memory after roughly 500 seconds.

(Asker) Essentially I run aggregation functions over all the columns of this large Spark dataframe, which shrinks the size, and then I call toPandas.

Shrinks it down to what size, though? Is that size bigger than the executor memory?

(Asker) I don't understand why toPandas would be the issue. The code is something like df.agg("sum").toPandas(). Isn't it the agg function rather than toPandas that causes the problem?

Because toPandas downloads every RDD partition into a single machine. The agg function by itself is a lazy operation.
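To make that last point concrete, here is a minimal sketch of the pattern being discussed (the aggregation is a placeholder, since the actual "do bunch of stuff" code was never shown): agg only builds a lazy plan, while toPandas is the action that runs the whole job and pulls the result onto the driver.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Same source as in the question: many CSV files on S3.
df = spark.read.csv("s3n://csv_files/*", header=True)

# Lazy: this only records a sum over every column in the query plan;
# no data is read and no executor does any work yet.
agg_df = df.agg(*[F.sum(F.col(c)).alias(c) for c in df.columns])

# Action: toPandas() triggers the full job (reading the CSVs, shuffling,
# aggregating) and then collects the result to the driver as a pandas
# DataFrame. The aggregated result is a single row, so the transfer is
# cheap; the memory pressure comes from the job it triggers, and any
# failure surfaces at this line even if the cause is the upstream work.
pdf = agg_df.toPandas()
print(pdf.head())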