Python PySpark: timeout on rdd.toLocalIterator()
I am hitting a timeout while trying to iterate over an RDD created from a DataFrame with .toLocalIterator(). The dataset is large. PySpark 2.0 and Python 2.7.
----> 2 for dataRow in dataFrame.select(['uid', fieldName]).rdd.toLocalIterator():
3 if isinstance(dataRow[fieldName], DenseVector):
...
/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/rdd.py in _load_from_socket(port, serializer)
--> 142 for item in serializer.load_stream(rf):
/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py in load_stream(self, stream)
--> 139 yield self._read_with_length(stream)
/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py in _read_with_length(self, stream)
--> 156 length = read_int(stream)
/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py in read_int(stream)
--> 543 length = stream.read(4)
/home/pcardoso/.conda/envs/libV2/lib/python2.7/socket.pyc in read(self, size)
--> 384 data = self._sock.recv(left)
timeout: timed out
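The stack trace bottoms out in a plain socket recv: the driver's read_int() does a blocking read of 4 length bytes, and if the next partition takes longer to arrive than the socket's timeout allows, the read raises socket.timeout. Here is a minimal, PySpark-free sketch of that same failure mode (the port, delay, and timeout values are made up for illustration; this is not PySpark's actual code):

```python
import socket
import threading
import time

def slow_server(listener, delay):
    """Accept one connection, then wait `delay` seconds before sending,
    simulating an executor that is slow to produce the next partition."""
    conn, _ = listener.accept()
    time.sleep(delay)
    # 4-byte length prefix followed by a payload, like read_int() expects
    conn.sendall(b"\x00\x00\x00\x04data")
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

t = threading.Thread(target=slow_server, args=(listener, 1.0))
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.settimeout(0.2)  # short timeout, shorter than the server's delay
try:
    client.recv(4)       # same blocking read as stream.read(4) in read_int()
    timed_out = False
except socket.timeout:   # this is the "timeout: timed out" in the traceback
    timed_out = True
finally:
    client.close()
    t.join()
    listener.close()

print(timed_out)  # True: the data arrives too late for the short timeout
```

The point of the sketch is that the error is not about the data being unreadable, only about it arriving slower than the driver-side socket is willing to wait.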
Well, as you said: it is big. If you are running in yarn-client mode this is not unusual. You could try yarn-cluster mode and tune the Spark parameters; see the configuration/tuning pages.

I got the same error using mesos and Spark 2. Which parameter would I use? There are a lot of them!
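As a hedged sketch of which of those many parameters are usually meant here: spark.network.timeout and spark.executor.heartbeatInterval are real Spark configuration keys commonly raised for network timeouts, but whether they cure this particular toLocalIterator() timeout depends on the Spark version — if memory serves, a hard-coded socket timeout in PySpark's local-iterator path was a known bug fixed in later 2.0.x/2.1.x releases, so upgrading may be the real fix. The values below are illustrative guesses, not tuned recommendations:

```python
# Sketch: timeout-related Spark settings often raised for network timeouts.
# The values are placeholders, not recommendations tuned for this workload.
conf = {
    "spark.network.timeout": "600s",           # default is 120s
    "spark.executor.heartbeatInterval": "60s"  # must stay below the network timeout
}

# Build the equivalent spark-submit flags (job script name is hypothetical).
flags = " ".join(
    "--conf {0}={1}".format(key, value) for key, value in sorted(conf.items())
)
command = "spark-submit {0} my_job.py".format(flags)
print(command)
```

The same keys can be set programmatically via SparkConf or SparkSession.builder.config instead of on the command line.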