Pandas: group all results without resorting


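Sorting in groupby does not work the way I expected. In the example below, I do not want the "USA" rows grouped together, because there is a "Russia" row between them.

When I use groupby, I get the following: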

df.groupby('country', sort=False).size()

country
india     1
USA       3
Russia    1
dtype: int64

Can I get a result like this instead?

country
india     1
USA       1
Russia    1
USA       2

Instead of using groupby directly, you can try the following code:

import pandas as pd

country = []  # country name of each consecutive run
count = []    # number of rows in each consecutive run
# The grouping key increases by 1 every time the country differs from the previous row,
# so each run of consecutive identical countries forms its own group.
for i, g in df.groupby([(df.country != df.country.shift()).cumsum()]):
    country.append(g.country.tolist()[0])  # first entry of the run is the country name
    count.append(len(g.country.tolist()))  # length of the run is the number of appearances

pd.DataFrame(data={'country': country, 'count': count})  # combine the two lists into a DataFrame

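This

df.groupby([(df.country != df.country.shift()).cumsum()])

groups the rows by a cumulative number that increases by one every time the value in the country column changes, so each run of consecutive identical countries gets its own unique number.

In the for loop, i is the unique cumulative number assigned to each run of a country, and g is the corresponding rows of the original DataFrame.

g.country.tolist()

returns the list of country names for each run (that is, for each i). Judging from the desired output, for the data in the question that should be ['india'], ['USA'], ['Russia'] and ['USA', 'USA'].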

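So the first item of each list is the country name, and its length is the number of appearances. This information can then be collected in lists and put into a DataFrame, which gives the desired output.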
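To make the grouping key more concrete, here is a minimal sketch of the intermediate values. The sample df is an assumption: the question's data is not shown, so it is reconstructed from the counts in the desired output. The names changed and run_id are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the counts shown in the question.
df = pd.DataFrame({'country': ['india', 'USA', 'Russia', 'USA', 'USA']})

# True wherever the country differs from the previous row.
# The first row is always True, because shift() leaves NaN there.
changed = df.country != df.country.shift()

# Running total of the changes: one id per run of consecutive equal values.
run_id = changed.cumsum()

print(changed.tolist())  # [True, True, True, True, False]
print(run_id.tolist())   # [1, 2, 3, 4, 4]
```

The last two rows share the id 4, which is why they end up in the same group, while the first "USA" row keeps its own id 2.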

You can also use list comprehensions instead of the for loop:

# GroupBy object keyed by the cumulative run number (despite the name, not a DataFrame)
cumulative_df = df.groupby([(df.country != df.country.shift()).cumsum()])
country = [g.country.tolist()[0] for i, g in cumulative_df]  # country name of each run
count = [len(g.country.tolist()) for i, g in cumulative_df]  # number of rows in each run
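If you would rather avoid the Python-level loop altogether, the same run key can probably be fed straight into an aggregation. This is only a sketch of an alternative, not the approach from the answer above. The sample df is again an assumption reconstructed from the desired output, and the names run_id and result are illustrative.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the counts shown in the question.
df = pd.DataFrame({'country': ['india', 'USA', 'Russia', 'USA', 'USA']})

# Same key as before: one id per run of consecutive identical countries.
# Renamed so the index name does not clash with the 'country' column.
run_id = (df['country'] != df['country'].shift()).cumsum().rename('run')

# Named aggregation: first value of each run is the name, size is the run length.
result = (
    df.groupby(run_id)['country']
      .agg(country='first', count='size')
      .reset_index(drop=True)
)
print(result)
```

For this sample, result has one row per run, with the second "USA" run counted separately from the first.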
