"Slave lost" error in PySpark
I am using Spark 1.6. I am running a simple df.show(2) call and getting the following error:
An error occurred while calling o143.showString.
: org.apache.spark.SparkException: Job aborted due to stage
failure: Task 6 in stage 6.0 failed 4 times, most recent failure:
Lost task 6.3 in stage 6.0
ExecutorLostFailure (executor 2 exited caused by one of the
running tasks) Reason: Slave lost
When I persist the DataFrame, the Spark UI shows a very high shuffle write; the job takes a long time and still fails with the same error.
From some searching, I found that these are likely out-of-memory problems.
Following this link, I repartitioned the data into 1000 partitions, but it still did not help much.
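Concretely, a minimal sketch of what I tried (df is the DataFrame from the failing show(2) call; the exact order of persist and repartition is approximate):

# Sketch of the steps described above: repartition to 1000 partitions,
# persist, then the show(2) call that keeps failing.
df = df.repartition(1000)
df.persist()
df.show(2)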
I set the SparkConf as:
conf = (SparkConf()
        .set("spark.driver.maxResultSize", "150g")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
My server-side memory can be as high as 200GB.
Do you have any good ideas for handling this, or pointers to relevant links? PySpark-specific advice would be very helpful.
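For reference, a minimal sketch of the executor-side memory settings that typically govern this kind of container loss on YARN. spark.executor.memory and spark.yarn.executor.memoryOverhead are standard Spark 1.6 properties, but the values below are illustrative assumptions, not a verified fix:

from pyspark import SparkConf, SparkContext

# Sketch only: on top of the driver settings above, give each executor
# more JVM heap and more off-heap headroom from YARN. Too little
# memoryOverhead is a common cause of killed containers ("Slave lost").
conf = (SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.executor.memory", "16g")                # executor JVM heap (assumed value)
        .set("spark.yarn.executor.memoryOverhead", "4096")) # off-heap headroom in MB (assumed value)
sc = SparkContext(conf=conf)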
Here is the error log from YARN:
Application application_1477088172315_0118 failed 2 times due to
AM Container for appattempt_1477088172315_0118_000006 exited
with exitCode: 10
For more detailed output, check application tracking page: Then,
click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1477088172315_0118_06_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
Here is the error message from the notebook:
Py4JJavaError: An error occurred while calling o71.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (): ExecutorLostFailure (executor 26 exited caused by one of the running tasks) Reason: Slave lost
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:212)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Thank you.

Can you provide the Hadoop container logs of the dead executors? Stdout and stderr would be useful.
@Mariusz Just added the error logs from the Spark UI.
This is the YARN log; what about the logs of the executor process, stdout and stderr?
@Mariusz Right now my notebook is stuck there. This happens when I do df = df1.join(df2) and then df.show(). I can still run df1.show() and df2.show() on their own.
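A minimal sketch of the failing pattern from the last comment, assuming df1 and df2 are the two DataFrames involved. Note that no join key appears in the comment, and a key-less DataFrame join is a Cartesian join, which can inflate the shuffle enormously:

# Each side shows fine on its own:
df1.show()
df2.show()
# Joining with no key, as written in the comment, produces a Cartesian
# join; the subsequent show() is where the "Slave lost" failure appears.
df = df1.join(df2)
df.show()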