Python: 'ordered' error when sorting a DataFrame and removing duplicates


python pandas dataframe


I have been using the following code:

UsersFullUnique = UsersFullLoc
UsersFullUnique.Placetype = UsersFullUnique.Placetype.astype('category', 
                                    categories=['Continent', 'Country', 'State', 'County','Town','POI', 'Suburb', 'LocalAdmin', 'Island', 'Estate', 'Colloquial', 'HistoricalTown', 'HistoricalCounty', 'LandFeature', 'Supername'], 
                                    ordered=True)

UsersFullUnique = UsersFullUnique.sort_values('Placetype').groupby('ID', as_index=False).first()
UsersFullUnique.head(8)
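The code above is meant to keep, for each ID, the row whose Placetype comes earliest in the custom order. Here is a minimal sketch of that sort-then-take-first idiom on a tiny hypothetical frame. It assumes a pandas version where pd.CategoricalDtype exists (0.21 or later). In those versions, passing categories= and ordered= directly to astype was deprecated and later removed. The category list is shortened for the example.

```python
import pandas as pd

# Ordered dtype: earlier categories rank "lower" (shortened list for the sketch).
place_order = pd.CategoricalDtype(
    categories=["Continent", "Country", "State", "County", "Town", "POI"],
    ordered=True,
)

# Hypothetical toy data: user 1 has two locations, user 2 has one.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Locationname": ["New York", "United States", "Texas"],
    "Placetype": ["Town", "Country", "State"],
})
df["Placetype"] = df["Placetype"].astype(place_order)

# Sort by category rank, then keep the first row of each ID.
unique = df.sort_values("Placetype").groupby("ID", as_index=False).first()
print(unique)
```

If rows can contain missing values, note that groupby(...).first() takes the first non-null value per column, so it can combine values from different rows. df.sort_values("Placetype").drop_duplicates("ID") keeps each ID's first row intact instead.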

to sort the following DataFrame:

ID          Unnamed: 0  WOE_ID  Locationname_x  Name_Type   Language_x  Username    Friends Followers   Status_count    Favorites   Account_age                     ISO Locationname    Language    Placetype   Parent_ID
100000045   3363940     2459115 New York City   V           ENG         UsersDude   35.0    10.0         0.0             0           Mon Dec 18 11:19:42 CST 2009   US  New York         ENG        Town        2347591

This gives me an error:

TypeError: _astype() got an unexpected keyword argument 'ordered'

Strangely, I have used this same code on two other datasets that have an index:

Unnamed: 0  WOE_ID  Locationname_x  Name_Type   Language_x  ID  Username    Friends Followers   Status_count    Favorites   Account_age ISO Locationname_y  Language_y  Placetype   Parent_ID

and a second, similar dataset. Both contain nearly the same kind of information, and neither raised an error.

Does anyone know a possible solution?

This may be a bug similar to a known issue. The immediate workaround is to convert the column only if it is not already a category:

if UsersFullUnique.Placetype.dtype != 'category': 
    UsersFullUnique.Placetype = UsersFullUnique.Placetype.astype('category', 
                                    categories=[...], 
                                    ordered=True)
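On newer pandas versions, the same guard is usually written with a pd.CategoricalDtype object. The isinstance check replaces the string comparison against 'category'. Below is a minimal sketch, assuming pandas 1.x or later; the helper name to_place_order and the shortened category list are hypothetical.

```python
import pandas as pd

# Shortened category list for the sketch; use the full list from the question in practice.
PLACE_ORDER = pd.CategoricalDtype(
    categories=["Continent", "Country", "State", "County", "Town"],
    ordered=True,
)

def to_place_order(s: pd.Series) -> pd.Series:
    """Return s with the ordered Placetype dtype (hypothetical helper)."""
    if isinstance(s.dtype, pd.CategoricalDtype) and s.dtype == PLACE_ORDER:
        return s  # already converted, nothing to do
    # astype with a CategoricalDtype accepts plain strings and existing categoricals alike.
    return s.astype(PLACE_ORDER)

raw = pd.Series(["Town", "Country", "State"])
once = to_place_order(raw)
twice = to_place_order(once)  # calling it again does not raise
print(list(twice.cat.codes))
```

With a dtype object, re-running the conversion is harmless. If the column is already categorical with different categories, astype recodes it, and any value not in PLACE_ORDER becomes NaN.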

More generally, the line

UsersFullUnique=UsersFullLoc

does not make a copy. It just gives the same object two names, so any change made to the new DataFrame is also made to the old one.

If you actually need a copy for some reason, use:

UsersFullUnique = UsersFullLoc.copy(deep=True)
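The difference between plain assignment and a deep copy can be seen on a toy frame. The names users_loc and users_unique are hypothetical stand-ins for UsersFullLoc and UsersFullUnique. This also shows one plausible way a column ends up categorical before the conversion runs: an earlier run changed it through the alias.

```python
import pandas as pd

# Plain assignment: both names refer to the same DataFrame object.
users_loc = pd.DataFrame({"Placetype": ["Town", "State"]})
users_unique = users_loc
users_unique["Placetype"] = users_unique["Placetype"].astype("category")
print(users_unique is users_loc)     # True
print(users_loc["Placetype"].dtype)  # category: the "original" changed as well

# Deep copy: changes to the copy leave the source untouched.
users_loc2 = pd.DataFrame({"Placetype": ["Town", "State"]})
users_copy = users_loc2.copy(deep=True)
users_copy["Placetype"] = users_copy["Placetype"].astype("category")
print(isinstance(users_loc2["Placetype"].dtype, pd.CategoricalDtype))  # False
```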


Can you also call

.astype('category', …)

on the other datasets?

Yes. The only differences in the code are the two variable names in

UsersFullUnique=UsersFullLoc

and every later mention of UsersFullUnique. Could that be causing the error?

Try first resetting the column to strings, then setting it to category.

Sorry, I put the code to work after your answer, but it takes a while to finish (large dataset) and I've been busy for the past two days. It works, thanks!
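The suggestion in the comments to reset the column to strings before re-categorizing can be sketched as follows. This assumes pandas 1.x or later and uses a shortened, illustrative category list. The starting column is deliberately already categorical to mimic a left-over conversion.

```python
import pandas as pd

# Shortened, illustrative category list.
place_order = pd.CategoricalDtype(
    categories=["Continent", "Country", "State", "County", "Town"],
    ordered=True,
)

# Column that is already categorical but unordered (e.g. left over from an earlier run).
df = pd.DataFrame({"Placetype": pd.Categorical(["Town", "State"])})

# Reset to plain strings first, then apply the ordered dtype.
df["Placetype"] = df["Placetype"].astype(str).astype(place_order)
print(df["Placetype"].dtype.ordered)  # True
print(df["Placetype"].min())          # State (ranks before Town)
```

Going through strings discards whatever categories the column had before, so the result depends only on the values and on place_order.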