Python pandas: 'DataFrame' object has no attribute 'write' when trying to save it locally as a Parquet file

python pandas

Can you help me solve this error?

I am trying to run the following code to save a DataFrame locally as a partitioned Parquet file:

dfcitas.write.format("parquet")\
    .mode("overwrite")\
    .partitionBy("NPatente")\
    .save("/datos/dfcitas-particionado.parquet")

This error appears:

AttributeError                            Traceback (most recent call last)
<ipython-input-164-9b06b4833bf9> in <module>()
----> 1 dfcitas.write.format("parquet")    .mode("overwrite")    .partitionBy("NPatente")    .save("/datos/dfcitas-particionado.parquet")

/opt/conda/lib/python2.7/site-packages/pandas/core/generic.pyc in __getattr__(self, name)
   3612             if name in self._info_axis:
   3613                 return self[name]
-> 3614             return object.__getattribute__(self, name)
   3615 
   3616     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'write'

I would really appreciate your help with this.

Thanks, everyone!

Please check here on writing a pandas DataFrame to

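Can you add the steps that create the dfcitas DataFrame?

Replace .save() with to_parquet().

@Shu, I did add the code that creates the dfcitas DataFrame at the end of the question.

@above_c_level, I replaced .save() with to_parquet() and the same error still appears.

You have a pandas DataFrame object here and are trying to perform a PySpark DataFrame operation on it. The .write attribute belongs to Spark DataFrames, not pandas ones, so the whole .write.format(...).mode(...).partitionBy(...).save(...) chain has to go, not just the final .save(). Instead, you should call:

dfcitas.to_parquet('df.parquet.gzip', partition_cols=["NPatente"], compression='gzip')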

# Import the pandas library
import pandas as pd

# Read the compressed cite75_99.txt.bz2 file and save it to a DataFrame
df = pd.read_csv('/datos/cite75_99.txt.bz2', compression='bz2', header=0, sep=',', quotechar='"')

# Group by the CITING column and count the CITED entries for each patent in a new DataFrame
dfcitas = df.groupby('CITING')['CITED'].count().reset_index()

# Rename the columns of the new DataFrame
dfcitas.columns = ['NPatente','ncitas']

# Print the DataFrame
print(dfcitas)

df.to_parquet('df.parquet.gzip',
              compression='gzip')