Python: group by two columns and count unique values from a third column


python pandas


I have the following df1:

id period color size rate
1    01    red   12   30
1    02    red   12   30
2    01    blue  12   35
3    03    blue  12   35
4    01    blue  12   35
4    02    blue  12   35
5    01    pink  10   40
6    01    pink  10   40

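I need to create a new df2 whose index is the combination of the three columns color, size and rate, then group by "period" and get the count of unique ids. My final df should have the following structure: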

index       period   count
red-12-30    01        1
red-12-30    02        1
blue-12-35   01        2
blue-12-35   03        1
blue-12-35   02        1
pink-10-40   01        2

Thanks in advance for your help.

Try .agg('-'.join) together with .groupby:

# df1 is the input frame from the question; the result is stored in df2
df2 = df1.groupby([df1[["color", "size", "rate"]].astype(str)\
            .agg("-".join, axis=1).rename('index'), "period"])\
                .agg(count=("id", "nunique"))\
                .reset_index()

print(df2)

        index  period  count
0  blue-12-35       1      2
1  blue-12-35       2      1
2  blue-12-35       3      1
3  pink-10-40       1      2
4   red-12-30       1      1
5   red-12-30       2      1
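
For reference, here is a self-contained sketch of the same approach, run on the data from the question. It assumes that period is stored as a string so that "01" keeps its leading zero, which the question does not state. The helper name key and the sort=False option are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Data from the question; period is kept as a string (an assumption)
# so that the leading zeros survive.
df1 = pd.DataFrame({
    "id":     [1, 1, 2, 3, 4, 4, 5, 6],
    "period": ["01", "02", "01", "03", "01", "02", "01", "01"],
    "color":  ["red", "red", "blue", "blue", "blue", "blue", "pink", "pink"],
    "size":   [12, 12, 12, 12, 12, 12, 10, 10],
    "rate":   [30, 30, 35, 35, 35, 35, 40, 40],
})

# Build the combined "color-size-rate" key as a Series named "index".
key = df1[["color", "size", "rate"]].astype(str).agg("-".join, axis=1).rename("index")

# Group by the key and period, then count the distinct ids in each group.
# sort=False keeps the groups in order of first appearance.
df2 = (
    df1.groupby([key, "period"], sort=False)
       .agg(count=("id", "nunique"))
       .reset_index()
)
print(df2)
```

With sort=False the rows come out in the order they first appear in df1, which is the row order shown in the question's expected result; without it, groupby sorts by the key and period.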


You can also do this with groupby:


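# count() would count rows, not distinct ids, so use nunique on the id column
df2 = df1.groupby(['color', 'size', 'rate', 'period'])['id'].nunique().reset_index(name='count')
# size and rate are numeric, so cast to str before joining
df2['index'] = df2[['color', 'size', 'rate']].astype(str).agg('-'.join, axis=1)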

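Thanks, Manakin, for the quick help. I need the exact structure of the final df because I will go on to use it to create a pivot_table. Any pointers?

You just need reset_index.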
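
As a rough sketch of that next step, the flat frame produced by reset_index can be passed straight to pivot_table. The layout below (one row per combined key, one column per period) is an assumption, since the thread does not say which layout is wanted. The small df2 is typed in by hand from the expected output in the question, and the name wide is made up for this example.

```python
import pandas as pd

# Flat result as produced by reset_index (values copied from the
# expected output in the question).
df2 = pd.DataFrame({
    "index":  ["red-12-30", "red-12-30", "blue-12-35",
               "blue-12-35", "blue-12-35", "pink-10-40"],
    "period": ["01", "02", "01", "03", "02", "01"],
    "count":  [1, 1, 2, 1, 1, 2],
})

# One row per key, one column per period; combinations that do not
# occur in the data are filled with 0.
wide = df2.pivot_table(index="index", columns="period", values="count",
                       aggfunc="sum", fill_value=0)
print(wide)
```

Because index, period and count are ordinary columns after reset_index, they can be named directly in the index, columns and values arguments.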

@savi @anky Thanks :) I forgot that you can rename a Series directly.