Pandas: new DataFrame column 'count' for each ID and less than date

pandas dataframe

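I want to add a new column named count that, for each ID, counts the number of entries whose date is earlier than the row's date. This is what my dataframe looks like: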

    date      ID  count
    20191101   1      1
    20191102   2      0
    20191030   1      0
    20191103   2      1
    20191105   2      2
    20191030   1      0

My dataframe has 15 columns and 90k rows.

IIUC, sort and cumsum are what you need:

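# assumes df already has the 'count' column shown in the question
df = df.sort_values(by='date')
# one row per (ID, date), summing the counts of duplicate dates
df1 = df.groupby(['ID', 'date'], as_index=False)['count'].sum()
# running total of 'count' within each ID
df1['cumulative_count'] = df1.groupby('ID', as_index=False)['count'].cumsum()
df1

   ID      date  count  cumulative_count
0   1  20191030      0                 0
1   1  20191101      1                 1
2   2  20191102      0                 0
3   2  20191103      1                 1
4   2  20191105      2                 3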

Probably something like this:

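import pandas as pd

# order rows by ID, then by date within each ID
df = df.sort_values(by=['ID', 'date'])
# helper column: every row counts as 1
df['count'] = 1
# running number of rows seen so far within each ID (starts at 1)
df['cumsum'] = df.groupby('ID')['count'].transform('cumsum')
# subtract 1 so the current row is not counted
df['final'] = df['cumsum'] - 1

       date  ID  count  cumsum  final
2  20191030   1      1       1      0
5  20191030   1      1       2      1
0  20191101   1      1       3      2
1  20191102   2      1       1      0
3  20191103   2      1       2      1
4  20191105   2      1       3      2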

final is the column you need; the rest are just helper columns that can be dropped.
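
As a hedged aside, the helper columns can probably be avoided with groupby().cumcount(), which numbers the rows of each group starting from 0. The sketch below rebuilds the sample data from the question (the name df_sorted is my own) and should give the same final values. Like the cumsum version, it counts earlier rows in sort order, so two rows with the same date and ID still get different values.

```python
import pandas as pd

# sample data from the question, with only the 'date' and 'ID' columns
df = pd.DataFrame({
    "date": [20191101, 20191102, 20191030, 20191103, 20191105, 20191030],
    "ID": [1, 2, 1, 2, 2, 1],
})

# order rows by ID, then by date within each ID
df_sorted = df.sort_values(by=["ID", "date"])

# cumcount numbers the rows of each ID group 0, 1, 2, ... in their current order
df_sorted["final"] = df_sorted.groupby("ID").cumcount()

print(df_sorted)
```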

Can you give an example of what the output would look like for the given sample? My dataframe has 'date' and 'ID' fields, and I want to add the 'count' column.
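
For a frame that has only date and ID, one possible sketch that reproduces the count column from the question's sample is a dense rank per ID, minus one. This assumes count should be the number of distinct earlier dates within the same ID, which is how the sample output reads (both 20191030 rows for ID 1 get 0, and 20191101 gets 1). That reading is an assumption, not something confirmed in the thread.

```python
import pandas as pd

# sample data from the question, with only the 'date' and 'ID' columns
df = pd.DataFrame({
    "date": [20191101, 20191102, 20191030, 20191103, 20191105, 20191030],
    "ID": [1, 2, 1, 2, 2, 1],
})

# dense rank of each date within its ID: 1 for the earliest distinct date,
# 2 for the next distinct date, and so on; tied dates share a rank
dense_rank = df.groupby("ID")["date"].rank(method="dense")

# rank - 1 = number of distinct earlier dates for the same ID
df["count"] = dense_rank.astype(int) - 1

print(df)
```

The original row order is kept, so no sort is needed. If rows rather than distinct dates should be counted, method="min" in place of "dense" would count every row with a strictly earlier date.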