Apache Spark PySpark standalone: java.lang.IllegalStateException: unread block data
I'm fairly new to PySpark. I've been trying to run a script that works fine in local mode on a 1000-row subset of the data, but throws an error in standalone mode on the full data set, which is about 1GB. I expected more data to mean more problems, but I'm having trouble understanding what is causing this particular issue. Here are the details of my standalone cluster:
- 3 executors
- 20GB of memory each
- spark.driver.maxResultSize=1GB (added this because I thought it might be the issue, but it did not solve the problem)

The script throws the error at the stage where I convert the Spark DataFrame to a pandas DataFrame in order to parallelize some operations:
data = data.toPandas()
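For context, the surrounding code would look roughly like the following minimal sketch; the master URL, input path, and read options are hypothetical, not taken from the original post:

from pyspark.sql import SparkSession

# Hypothetical reconstruction of the setup: the master URL and input
# path are placeholders, not values from the original post.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")
    .appName("toPandas-example")
    .getOrCreate()
)

data = spark.read.csv("hdfs:///path/to/data.csv", header=True, inferSchema=True)

# toPandas() collects the entire distributed DataFrame into the driver's
# memory as a single pandas DataFrame, so the full dataset (plus
# serialization overhead) must fit on the driver.
data = data.toPandas()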
The full stack trace is:
16/07/11 09:49:54 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:109)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:258)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:310)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:256)
at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:588)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:577)
at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:170)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:104)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
So, my question is: what is causing this issue?

For anyone who may find this post useful: the problem turned out to be not giving the workers/slaves more memory, but giving the driver more memory, as @kartikkanapur mentioned in the comments. To fix it, I set:

spark.driver.maxResultSize 3g
spark.driver.memory 8g
spark.executor.memory 4g

This is possibly overkill, but it gets the job done now.

From the comments:

- "While your cluster has 20GB of memory, have you explicitly set spark.driver.memory and spark.executor.memory to more than 1GB? They both default to 1GB; you could try setting larger values."
- "It seems that is what allowed it to work - thanks!"
- "Glad that worked. I answered a similar question in more detail; my approach was to pass these as arguments to the pyspark command: pyspark --driver-memory 5G --executor-memory 10G"
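As a sketch of how the answer's settings could be applied programmatically when a new session is created from a script (the master URL is again a placeholder): spark.executor.memory and spark.driver.maxResultSize take effect at session creation, but spark.driver.memory does not, because in client mode the driver JVM is already running by the time the Python code executes.

from pyspark.sql import SparkSession

# spark.executor.memory and spark.driver.maxResultSize can be set when
# the session is created from a fresh script. spark.driver.memory cannot
# be raised from inside an already-running driver, so set it via
# --driver-memory or spark-defaults.conf instead.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")          # hypothetical master URL
    .config("spark.driver.maxResultSize", "3g")  # values from the answer above
    .config("spark.executor.memory", "4g")
    .getOrCreate()
)

The driver heap itself is then sized on the command line, in line with the last comment, e.g. pyspark --driver-memory 8g to match the answer's spark.driver.memory setting.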