
Python Spark 2.0 toPandas method


I have a Spark dataframe that looks like this:

topics.show(2)
+-----+--------------------+--------------------+--------------------+
|topic|         termIndices|         termWeights|        topics_words|
+-----+--------------------+--------------------+--------------------+
|    0|[0, 39, 68, 43, 5...|[0.06362107696025...|[, management, sa...|
|    1|[3, 1, 8, 6, 4, 1...|[0.03164821806301...|[objectives, lear...|
+-----+--------------------+--------------------+--------------------+
only showing top 2 rows
However, when I try to convert it to a pandas dataframe using the following method, which worked in 1.6, I get an error:

topics.toPandas()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-165-4c1231b68769> in <module>()
----> 1 topics.toPandas()

/Users/i854319/spark2/python/pyspark/sql/dataframe.pyc in toPandas(self)
   1440         """
   1441         import pandas as pd
-> 1442         return pd.DataFrame.from_records(self.collect(), columns=self.columns)
   1443 
   1444     ##########################################################################################

/Users/i854319/spark2/python/pyspark/sql/dataframe.pyc in collect(self)
    307         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
    308         """
--> 309         with SCCallSiteSync(self._sc) as css:
    310             port = self._jdf.collectToPython()
    311         return list(_load_from_socket(port, BatchedSerializer(PickleSerializer())))

/Users/i854319/spark2/python/pyspark/traceback_utils.pyc in __enter__(self)
     70     def __enter__(self):
     71         if SCCallSiteSync._spark_stack_depth == 0:
---> 72             self._context._jsc.setCallSite(self._call_site)
     73         SCCallSiteSync._spark_stack_depth += 1
     74 

AttributeError: 'NoneType' object has no attribute 'setCallSite'
So I'm not sure whether this method is broken in Spark 2.0.2 or whether something is wrong with my setup.
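The AttributeError itself is ordinary Python behavior rather than a pandas problem: `toPandas()` eventually dereferences the DataFrame's SparkContext, and if that reference has gone stale (the context was stopped, or a second session replaced it) the JVM-side handle `_jsc` is `None`. A minimal stand-in, requiring no Spark at all, reproduces the exact message:

```python
# The traceback bottoms out in self._context._jsc.setCallSite(...), where
# _jsc has become None. Any attribute access on None raises the same error:
jsc = None  # stands in for the JVM handle after the SparkContext is gone

try:
    jsc.setCallSite("demo")
except AttributeError as exc:
    message = str(exc)

print(message)  # 'NoneType' object has no attribute 'setCallSite'
```

So the error indicates the DataFrame is bound to a dead or detached context, not that `toPandas` itself is buggy.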

Possible duplicate of my question:

There is an open issue about this:

The poster there suggests forcing the DataFrame's backing session to sync with the live Spark context:

df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
df._sc = spark._sc

This worked for us; hopefully it works in other cases as well.
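Why re-pointing those internal attributes helps can be sketched without Spark at all. The classes below are hypothetical stand-ins (not real PySpark internals): they model a DataFrame still holding a stale context, whose JVM handle is `None`, while a live context exists elsewhere:

```python
# Illustrative stand-ins (NOT real PySpark classes) for the failure mode:
# the DataFrame keeps a reference to a dead context even though a live one
# is available, and the fix is simply re-pointing that reference.

class FakeContext:
    """Stands in for SparkContext; _jsc is the JVM-side handle."""
    def __init__(self, alive):
        self._jsc = object() if alive else None  # None once the context is stopped

class FakeDataFrame:
    """Stands in for a PySpark DataFrame bound to some context."""
    def __init__(self, sc):
        self._sc = sc

    def collect_like(self):
        # Mirrors the real code path that touches self._sc._jsc.setCallSite(...)
        if self._sc._jsc is None:
            raise AttributeError("'NoneType' object has no attribute 'setCallSite'")
        return ["row1", "row2"]

stale, live = FakeContext(alive=False), FakeContext(alive=True)
df = FakeDataFrame(stale)  # the DataFrame still points at the dead context

df._sc = live              # the workaround: re-sync to the live context
print(df.collect_like())   # now succeeds
```

The real workaround above does the same thing on the actual attributes (`sql_ctx.sparkSession._jsparkSession` and `_sc`), which is why it only matters that a live session exists somewhere in the process.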

Did you ever find a way to solve this? No, it's still failing. Do you have a solution? I ran into the same problem on pyspark 2.2.1 and had to switch back to 2.2.0; after that everything worked fine. @ Thank you! That was exactly the problem.