
Python PySpark: getting a timeout on rdd.toLocalIterator()

Tags: python, python-2.7, pyspark

I'm hitting a timeout while trying to iterate over an RDD created from a DataFrame with .toLocalIterator().

The dataset is large.

PySpark 2.0 and Python 2.7:

----> 2     for dataRow in dataFrame.select(['uid', fieldName]).rdd.toLocalIterator():
      3         if isinstance(dataRow[fieldName], DenseVector):
...

/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/rdd.py in _load_from_socket(port, serializer)
--> 142         for item in serializer.load_stream(rf):

/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py in load_stream(self, stream)
--> 139                 yield self._read_with_length(stream)

/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py in _read_with_length(self, stream)
--> 156         length = read_int(stream)

/srv/software/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py in read_int(stream)
--> 543     length = stream.read(4)

/home/pcardoso/.conda/envs/libV2/lib/python2.7/socket.pyc in read(self, size)
--> 384                     data = self._sock.recv(left)

timeout: timed out
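
For context: in Spark 2.0.0 the driver-side socket behind toLocalIterator() had a short read timeout, so any partition that takes a while to compute can trip it (this appears to be the behavior tracked as SPARK-18281, and the timeout was removed in later releases; upgrading is the most direct fix). A commonly suggested workaround is to materialize the DataFrame before iterating, so the iterator only streams partitions that are already computed. A minimal sketch, reusing the dataFrame and fieldName names from the traceback above (both names, and the DenseVector import path, are assumptions):

    from pyspark.ml.linalg import DenseVector  # may be pyspark.mllib.linalg in older pipelines

    # Cache the projection and force every partition to be computed up front,
    # so toLocalIterator() is not left waiting on a slow partition.
    df = dataFrame.select(['uid', fieldName]).persist()
    df.count()  # materializes all partitions into the cache

    for dataRow in df.rdd.toLocalIterator():
        if isinstance(dataRow[fieldName], DenseVector):
            pass  # process one row at a time on the driver

    df.unpersist()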

Well, as you said: it's big. If you're running in yarn-client mode, this is normal. You could try yarn-cluster instead and tune the Spark parameters; see the configuration/tuning pages.

I got the same error using mesos and Spark 2. Which parameter would I use? There are a lot of them!
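
If the goal is to experiment with the timeout-related settings the comment alludes to, one knob that does exist is spark.network.timeout (default 120s); whether it governs this particular driver-side socket in 2.0 is an assumption. A minimal sketch of raising it when building the session:

    from pyspark.sql import SparkSession

    # Raise the general network timeout; 600s is an arbitrary illustrative value.
    spark = (SparkSession.builder
             .appName("toLocalIterator-job")           # hypothetical app name
             .config("spark.network.timeout", "600s")
             .getOrCreate())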