Converting a PySpark DataFrame with a date column to pandas raises an AttributeError


I have the following DataFrame (PySpark):

When I try to convert the DataFrame to pandas with

    res2 = res.toPandas()

I get the following error: AttributeError: Can only use .dt accessor with datetimelike values

Full traceback:

    AttributeError                            Traceback (most recent call last)
<ipython-input-29-471067d510fa> in <module>
----> 1 res2 = res.toPandas()

/opt/anaconda/lib/python3.7/site-packages/pyspark/sql/dataframe.py in toPandas(self)
   2123                         table = pyarrow.Table.from_batches(batches)
   2124                         pdf = table.to_pandas()
-> 2125                         pdf = _check_dataframe_convert_date(pdf, self.schema)
   2126                         return _check_dataframe_localize_timestamps(pdf, timezone)
   2127                     else:

/opt/anaconda/lib/python3.7/site-packages/pyspark/sql/types.py in _check_dataframe_convert_date(pdf, schema)
   1705     """
   1706     for field in schema:
-> 1707         pdf[field.name] = _check_series_convert_date(pdf[field.name], field.dataType)
   1708     return pdf
   1709 

/opt/anaconda/lib/python3.7/site-packages/pyspark/sql/types.py in _check_series_convert_date(series, data_type)
   1690     """
   1691     if type(data_type) == DateType:
-> 1692         return series.dt.date
   1693     else:
   1694         return series

/opt/anaconda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5061         if (name in self._internal_names_set or name in self._metadata or
   5062                 name in self._accessors):
-> 5063             return object.__getattribute__(self, name)
   5064         else:
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):

/opt/anaconda/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
    169             # we're accessing the attribute of the class, i.e., Dataset.geo
    170             return self._accessor
--> 171         accessor_obj = self._accessor(obj)
    172         # Replace the property with the accessor object. Inspired by:
    173         # http://www.pydanny.com/cached-property.html

/opt/anaconda/lib/python3.7/site-packages/pandas/core/indexes/accessors.py in __new__(cls, data)
    322             pass  # we raise an attribute error anyway
    323 
--> 324         raise AttributeError("Can only use .dt accessor with datetimelike "
    325                              "values")

AttributeError: Can only use .dt accessor with datetimelike values
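For context, here is a minimal sketch that reproduces this kind of failure. The column name DATE and the sample rows are placeholders for illustration (the original data is not shown above), whether the error actually triggers depends on the installed pandas/pyarrow versions, and the config key below is the Spark 2.x name for enabling Arrow-backed toPandas():

    import datetime
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, DateType

    spark = SparkSession.builder.getOrCreate()
    # The Arrow-based toPandas() path in the traceback is only taken when
    # Arrow conversion is enabled.
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")

    # Hypothetical stand-in for the original DataFrame: one DateType column.
    schema = StructType([StructField("DATE", DateType())])
    res = spark.createDataFrame(
        [(datetime.date(2019, 1, 1),), (datetime.date(2019, 1, 2),)], schema)

    # For DateType fields pyspark calls series.dt.date on the converted pandas
    # column; if that column arrives with object dtype, the .dt accessor raises
    # the AttributeError shown above.
    res2 = res.toPandas()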

Is there a way to fix this, maybe by converting the original DataFrame?

As a workaround, you could consider converting the DATE column to a timestamp, which maps more naturally onto pandas' datetime type (see the to_timestamp snippet at the end of this post).
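A sketch of that idea using a plain cast (the column name DATE is an assumption taken from the snippets below; to_timestamp, as used in the comments, achieves the same thing):

    from pyspark.sql.functions import col

    # Casting DateType to TimestampType avoids the DateType branch that
    # calls series.dt.date during toPandas().
    res2 = res.withColumn('DATE', col('DATE').cast('timestamp')).toPandas()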


Can you print the output of res.show()?

@cs95 Added the output to the original post, thanks.

Hmm, I can't see what the problem is. Maybe you could try converting the date column to a timestamp and retry:

    from pyspark.sql.functions import to_timestamp; res2 = res.withColumn('DATE', to_timestamp(res.DATE, 'yyyy-MM-dd')).toPandas()

@cs95 That worked! Thanks (I can accept this as an answer).
    # Cast the DATE column to a timestamp before collecting, so pyspark's
    # date-conversion step no longer hits series.dt.date on a non-datetime column.
    from pyspark.sql.functions import to_timestamp
    res2 = res.withColumn('DATE', to_timestamp(res.DATE, 'yyyy-MM-dd')).toPandas()
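If plain date objects are needed on the pandas side after this workaround, the column can be converted back there; a small follow-up sketch, assuming the call above succeeded:

    # DATE is now a datetime64[ns] column, so the .dt accessor works and can
    # strip the time-of-day part again.
    res2['DATE'] = res2['DATE'].dt.date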