Python: union of two Spark DataFrames

python apache-spark

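I'm trying to take the union of two Spark DataFrames in Python. One of them is sometimes empty, so I added an if test that returns the full DataFrame in that case. Here is a small example that raises an error: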

>>> from pyspark.sql.types import *
>>> fulldataframe = [StructField("FIELDNAME_1",StringType(), True),StructField("FIELDNAME_2", StringType(), True),StructField("FIELDNAME_3", StringType(), True)]
>>> schema = StructType([])
>>>
>>> dataframeempty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> resultunion = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> if (fulldataframe.isEmpty()):
...     resultunion = dataframeempty
... elif (dataframeempty.isEmpty()):
...     resultunion = fulldataframe
... else:
...     resultunion=fulldataframe.union(dataframeempty)
...


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'isEmpty'
>>>

Can anyone tell me what is going wrong?

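Checking for emptiness with count() can take a long time, because count() has to go through every row. A cheaper check in Scala: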

dataframe.rdd.isEmpty()
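The same check works from PySpark, since a Python DataFrame also exposes .rdd and PySpark's RDD has isEmpty(). Note that the AttributeError in the question comes from fulldataframe being a plain Python list of StructField objects (a schema definition), not a DataFrame, so it has no isEmpty() method; both inputs need to be real DataFrames, for example created with createDataFrame(rows, StructType([...])). Below is a minimal sketch of the union-with-empty-check in Python. The helper name union_non_empty is hypothetical, and it assumes all non-empty inputs share the same schema, because DataFrame.union() matches columns by position.

```python
from functools import reduce


def union_non_empty(first, *rest):
    """Return the union of the given Spark DataFrames, skipping empty ones.

    Hypothetical helper (not part of PySpark). Assumes every non-empty
    input has the same schema, because DataFrame.union() matches columns
    by position rather than by name.
    """
    dfs = (first,) + rest
    # df.rdd.isEmpty() stops as soon as it finds one row, so it is usually
    # much cheaper than comparing df.count() with 0.
    non_empty = [df for df in dfs if not df.rdd.isEmpty()]
    if not non_empty:
        # All inputs are empty: hand back the first one so the caller still
        # gets a DataFrame with a known schema.
        return first
    # Combine the remaining DataFrames pairwise, left to right.
    return reduce(lambda left, right: left.union(right), non_empty)
```

With the question's variables rebuilt as real DataFrames, the whole if/elif/else block reduces to resultunion = union_non_empty(fulldataframe, dataframeempty). If every input is empty, the first one is returned unchanged, so the result still carries that input's schema.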