Python: Can't find column names when using the group by function on a DataFrame
I have a pandas DataFrame that looks like this:
Country City POI Type
0 NL Amsterdam KFC restaurant
1 NL Amsterdam KFC cafe
2 NL Arnhem McDonalds fast food
3 NL Arnhem McDonalds ice cream
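I need to group the rows and combine the Type values so that the other columns contain no duplicates. In other words, I need output like this: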
Country City POI Type
0 NL Amsterdam KFC restaurant, cafe
1 NL Arnhem McDonalds fast food, ice cream
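I tried using the groupby function, but all the column names disappear and shape reports 0 columns. Maybe there is a better way to group these values.
Here is some sample code: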
import pandas as pd
import numpy as np
data = np.array([['','Country','City', 'POI', 'Type'],
[0,"NL","Amsterdam", 'KFC', 'cafe'],
[1,"NL","Amsterdam", 'KFC', 'restaurant'],
[2,"NL","Arnhem", 'McDonalds', 'fast-food'],
[3,"NL","Arnhem", 'McDonalds', 'ice cream']]
)
initial_df = pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:])
final_df = initial_df .groupby( [ "Country", "City", "POI", "Type"] ).count()
print(list(final_df.columns.values))
print(final_df.shape)
You can group by the other columns and combine the Type values with str.join:
res = df.groupby(['Country', 'City', 'POI'])['Type'].apply(', '.join).reset_index()
print(res)
Country City POI Type
0 NL Amsterdam KFC restaurant, cafe
1 NL Arnhem McDonalds fastfood, icecream
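For reference, here is a self-contained sketch of the same idea. It assumes the sample data from the question's tables, built from a dict rather than a NumPy array, and uses agg with as_index=False instead of apply plus reset_index. Treat it as one possible variant, not the only way to do it.

```python
import pandas as pd

# Sample data mirroring the tables in the question (an assumption for illustration).
df = pd.DataFrame({
    "Country": ["NL", "NL", "NL", "NL"],
    "City": ["Amsterdam", "Amsterdam", "Arnhem", "Arnhem"],
    "POI": ["KFC", "KFC", "McDonalds", "McDonalds"],
    "Type": ["restaurant", "cafe", "fast food", "ice cream"],
})

# Group by every column except Type, then join the Type strings of each group.
# as_index=False keeps the grouping keys as ordinary columns.
res = df.groupby(["Country", "City", "POI"], as_index=False).agg({"Type": ", ".join})
print(res)
```

The grouping keys stay as regular columns, so the result has the same four columns as the input and one row per (Country, City, POI) combination.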
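Your final_df has no columns because you asked pandas to group by all of the columns: every column becomes part of the index, so nothing is left for count() to count.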
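A minimal sketch of that effect, assuming the same sample data as in the question. The names by_all and sizes, and the column name "n", are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question (an assumption for illustration).
df = pd.DataFrame({
    "Country": ["NL", "NL", "NL", "NL"],
    "City": ["Amsterdam", "Amsterdam", "Arnhem", "Arnhem"],
    "POI": ["KFC", "KFC", "McDonalds", "McDonalds"],
    "Type": ["cafe", "restaurant", "fast-food", "ice cream"],
})

all_cols = ["Country", "City", "POI", "Type"]

# Grouping by every column moves all of them into the index,
# so count() has no remaining columns to report on.
by_all = df.groupby(all_cols).count()
print(by_all.shape)        # rows = number of groups, columns = 0
print(by_all.index.names)  # the "missing" column names live here

# If the goal is the number of rows per group, size() does not need a leftover column.
sizes = df.groupby(all_cols).size().reset_index(name="n")
print(sizes)
```

The column names are not lost, they have become the levels of the result's index. Calling reset_index() on the result turns them back into ordinary columns.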
If you only want to group by the "Type" column, do the following:
grouped = initial_df .groupby( ["Type"] )
Then apply the count() function to the grouped object. It counts the non-NaN elements in each column for each group.
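As a rough illustration of what count() returns here, the sketch below uses the question's sample data with one City value deliberately set to None. That missing value is an assumption added only to show that it is skipped.

```python
import pandas as pd

# Sample data from the question, with one City deliberately missing (illustrative assumption).
df = pd.DataFrame({
    "Country": ["NL", "NL", "NL", "NL"],
    "City": [None, "Amsterdam", "Arnhem", "Arnhem"],
    "POI": ["KFC", "KFC", "McDonalds", "McDonalds"],
    "Type": ["cafe", "restaurant", "fast-food", "ice cream"],
})

grouped = df.groupby("Type")

# One row per Type; each remaining column holds the number of non-missing values.
counts = grouped.count()
print(counts)
```

Because only "Type" is a grouping key, the other three columns survive as columns of the result, and the group with the missing City shows a count of zero in that column.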
What you want to do, though, is access each group.
You can do that like this:
for name, group in grouped:
    print(name)   # the key of this group (its Type value)
    print(group)  # the sub-DataFrame corresponding to that Type
Hope this helps.
Thanks jpp, this works.