Python 3.x: how to count how many times a column's value changes from any number to 10, grouped by day
Tags: python-3.x, pandas, numpy, pandas-groupby
I have a DataFrame with three columns: Time, A, and flag (the sample below shows only Time and flag).
First, I want to group the rows by day. Then, within each day, I want to check how many times the flag column changes to 10 and how long it stays at 10.
Input:
Time flag
0 2019-02-14 00:00:10 1
1 2019-02-14 00:00:16 3
2 2019-02-14 00:00:21 4
3 2019-02-14 00:00:27 10
4 2019-02-14 00:00:32 10
5 2019-02-15 00:00:37 1
6 2019-02-15 00:00:43 0
7 2019-02-15 00:00:48 10
8 2019-02-15 00:00:54 10
9 2019-02-15 00:00:59 10
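To make the answers below reproducible, the input table can be rebuilt as a DataFrame. This is only a sketch: the values are copied from the table above, and column A is omitted because the sample does not show it.

```python
import pandas as pd

# Sample rows copied from the question's input table (column A is not shown there).
df = pd.DataFrame({
    "Time": [
        "2019-02-14 00:00:10", "2019-02-14 00:00:16", "2019-02-14 00:00:21",
        "2019-02-14 00:00:27", "2019-02-14 00:00:32", "2019-02-15 00:00:37",
        "2019-02-15 00:00:43", "2019-02-15 00:00:48", "2019-02-15 00:00:54",
        "2019-02-15 00:00:59",
    ],
    "flag": [1, 3, 4, 10, 10, 1, 0, 10, 10, 10],
})

# Parse the strings so that .dt.date can be used for grouping by day.
df["Time"] = pd.to_datetime(df["Time"])
print(df.dtypes)
```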
Expected output:
group_start_time 1 group_end_time count_change_to_10 minimum_duration_of_each_group_value_remains_10 Maximum_duration_of_each_group_value_remains_10
2019-02-14 00:00:10 2019-02-14 00:00:32 1 2 2 2
2019-02-15 00:00:37 2019-02-15 00:00:59 1 3 3 3
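I believe you need named aggregation:
import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
# mask of rows where flag equals 10
m = df['flag'].eq(10)
# label consecutive runs of the mask, keeping only the rows where flag is 10
g = m.ne(m.shift()).cumsum()[m]
# length of the run each matching row belongs to (NaN for other rows)
df['count'] = g.map(g.value_counts())
df = df.groupby(df['Time'].dt.date).agg(
    group_start_time_1=('Time', 'first'),
    group_end_time_1=('Time', 'last'),
    count_change_to_10=('count', 'nunique'),
    minimum_duration_of_each_group_value_remains_10=('count', 'min'),
    Maximum_duration_of_each_group_value_remains_10=('count', 'max'))
print(df)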
            group_start_time_1    group_end_time_1  count_change_to_10  \
Time
2019-02-14 2019-02-14 00:00:10 2019-02-14 00:00:32                   1
2019-02-15 2019-02-15 00:00:37 2019-02-15 00:00:59                   1

            minimum_duration_of_each_group_value_remains_10  \
Time
2019-02-14                                              2.0
2019-02-15                                              3.0

            Maximum_duration_of_each_group_value_remains_10
Time
2019-02-14                                              2.0
2019-02-15                                              3.0
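The run-labelling step in the answer can be hard to follow at first glance, so here is a small sketch that prints the intermediate values for the flag column of the sample data. The names run_id and run_len are illustrative and do not appear in the original answer.

```python
import pandas as pd

# flag values from the question's sample data
flag = pd.Series([1, 3, 4, 10, 10, 1, 0, 10, 10, 10])

m = flag.eq(10)                     # True where flag is 10
run_id = m.ne(m.shift()).cumsum()   # id increases every time the mask flips
g = run_id[m]                       # keep the ids only for rows equal to 10
run_len = g.map(g.value_counts())   # size of the run each kept row belongs to

print(run_id.tolist())   # [1, 1, 1, 2, 2, 3, 3, 4, 4, 4]
print(g.tolist())        # [2, 2, 4, 4, 4]
print(run_len.tolist())  # [2, 2, 3, 3, 3]
```

Each row where flag is 10 ends up carrying the length of its own run, which is why taking 'min' and 'max' per day gives the shortest and longest stretch of 10s, measured in rows rather than in seconds.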
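What is the difference between count_change_to_10 and duration_of_each_group_value_remains_10? Is it possible to change the data so that the counts differ between rows?
@jezrael, sorry, I fixed a small mistake: count_change_to_10 is the number of times the value changes to 10, which is 1 change in my data.
EDIT: Solution for older pandas versions (named aggregation requires pandas 0.25 or later):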
import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])
m = df['flag'].eq(10)
# consecutive groups, kept only where the mask is True
g = m.ne(m.shift()).cumsum()[m]
# run length, assigned only to the rows matched by the mask
df['count'] = g.map(g.value_counts())
df = df.groupby(df['Time'].dt.date).agg({'Time': ['first', 'last'],
                                         'count': ['nunique', 'min', 'max']})
# flatten the two-level column index, e.g. ('Time', 'first') -> 'Time_first'
df.columns = df.columns.map('_'.join)
d = {'Time_first': 'group_start_time_1',
     'Time_last': 'group_end_time_1',
     'count_nunique': 'count_change_to_10',
     'count_min': 'minimum_duration_of_each_group_value_remains_10',
     'count_max': 'Maximum_duration_of_each_group_value_remains_10'}
# both duration columns are cast to int (this fails if a day has no run of 10)
cols = ['minimum_duration_of_each_group_value_remains_10',
        'Maximum_duration_of_each_group_value_remains_10']
df = df.rename(columns=d)
df[cols] = df[cols].astype(int)
df = df.reset_index()
print(df)
         Time  group_start_time_1    group_end_time_1  count_change_to_10  \
0  2019-02-14 2019-02-14 00:00:10 2019-02-14 00:00:32                   1
1  2019-02-15 2019-02-15 00:00:37 2019-02-15 00:00:59                   1

   minimum_duration_of_each_group_value_remains_10  \
0                                                2
1                                                3

   Maximum_duration_of_each_group_value_remains_10
0                                                2
1                                                3
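One caveat worth checking against your own data: applying 'nunique' to the run lengths counts distinct lengths, not distinct runs, so a day with two separate runs of the same length would be reported as 1. The sketch below is one possible variant, not part of the original answer, that counts distinct run ids instead. The data and the column names run, by_length and by_run are hypothetical.

```python
import pandas as pd

# Hypothetical data: a single day with two separate runs of 10, each of length 2.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2019-02-14 00:00:00", "2019-02-14 00:00:05", "2019-02-14 00:00:10",
        "2019-02-14 00:00:15", "2019-02-14 00:00:20", "2019-02-14 00:00:25",
        "2019-02-14 00:00:30",
    ]),
    "flag": [1, 10, 10, 3, 10, 10, 2],
})

m = df["flag"].eq(10)
run_id = m.ne(m.shift()).cumsum()
df["run"] = run_id.where(m)                            # run id, NaN outside runs of 10
df["count"] = df["run"].map(df["run"].value_counts())  # run length per matching row

out = df.groupby(df["Time"].dt.date).agg(
    by_length=("count", "nunique"),  # distinct run lengths, as in the answer
    by_run=("run", "nunique"),       # distinct runs of 10
)
print(out)
```

Here by_length is 1 while by_run is 2. Note that this variant also counts a run that is already at 10 when the day starts, and a run that spans midnight is counted on both days, so it may still need adjusting depending on what "changes to 10" should mean.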