Python 3.x: how to count how many times a column's value changes from any number to 10, grouping the data by day

Tags: python-3.x, pandas, numpy, pandas-groupby

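I have a DataFrame with three columns: Time, A, and flag.

  • First group the rows by day with groupby, then, within each day, count how many times the flag column changes to 10 and how long it stays at 10.
  • Input: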

                      Time  flag
    0  2019-02-14 00:00:10     1
    1  2019-02-14 00:00:16     3
    2  2019-02-14 00:00:21     4
    3  2019-02-14 00:00:27    10
    4  2019-02-14 00:00:32    10
    5  2019-02-15 00:00:37     1
    6  2019-02-15 00:00:43     0
    7  2019-02-15 00:00:48    10
    8  2019-02-15 00:00:54    10
    9  2019-02-15 00:00:59    10
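
To experiment with the input above, the table can be rebuilt as a DataFrame. This is only a sketch: the question mentions a column A that the sample does not show, so it is left out here, and `df` is simply the name the answer's code uses.

```python
import pandas as pd

# Rows copied from the question's input table (column A is not shown there,
# so it is omitted in this sketch).
df = pd.DataFrame({
    "Time": [
        "2019-02-14 00:00:10", "2019-02-14 00:00:16", "2019-02-14 00:00:21",
        "2019-02-14 00:00:27", "2019-02-14 00:00:32", "2019-02-15 00:00:37",
        "2019-02-15 00:00:43", "2019-02-15 00:00:48", "2019-02-15 00:00:54",
        "2019-02-15 00:00:59",
    ],
    "flag": [1, 3, 4, 10, 10, 1, 0, 10, 10, 10],
})

# Parse the timestamps so that .dt.date can be used for grouping by day.
df["Time"] = pd.to_datetime(df["Time"])
```
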
    
    Output:

        group_start_time 1   group_end_time   count_change_to_10    minimum_duration_of_each_group_value_remains_10    Maximum_duration_of_each_group_value_remains_10
    
    2019-02-14 00:00:10    2019-02-14 00:00:32       1              2              2              2 
    
    2019-02-15 00:00:37    2019-02-15 00:00:59       1              3               3              3
    
    I believe you need named aggregation:

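    import pandas as pd

    df['Time'] = pd.to_datetime(df['Time'])

    # True where flag equals 10
    m = df['flag'].eq(10)

    # label each consecutive run, keeping only the rows where flag is 10
    g = m.ne(m.shift()).cumsum()[m]
    # length of the run each matched row belongs to (NaN for other rows)
    df['count'] = g.map(g.value_counts())

    # one row per day, using named aggregation
    df = df.groupby(df['Time'].dt.date).agg(group_start_time_1=('Time','first'),
                                            group_end_time_1=('Time','last'),
                                            count_change_to_10=('count','nunique'),
                                            minimum_duration_of_each_group_value_remains_10=('count', 'min'),
                                            Maximum_duration_of_each_group_value_remains_10=('count', 'max'))
    print(df)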
                group_start_time_1    group_end_time_1  count_change_to_10  \
    Time                                                                     
    2019-02-14 2019-02-14 00:00:10 2019-02-14 00:00:32                   1   
    2019-02-15 2019-02-15 00:00:37 2019-02-15 00:00:59                   1   
    
                minimum_duration_of_each_group_value_remains_10  \
    Time                                                          
    2019-02-14                                              2.0   
    2019-02-15                                              3.0   
    
                Maximum_duration_of_each_group_value_remains_10  
    Time                                                         
    2019-02-14                                              2.0  
    2019-02-15                                              3.0  
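
The run-labelling step in the code above can be hard to follow at a glance, so here is a step-by-step sketch on the flag values from the question. The names `run_id` and `run_length` are only illustrative; the answer folds these steps into `g` and `df['count']`.

```python
import pandas as pd

# flag values from the question's input, in row order
flag = pd.Series([1, 3, 4, 10, 10, 1, 0, 10, 10, 10])

# True where flag equals 10: F F F T T F F T T T
m = flag.eq(10)

# A new id starts whenever the mask differs from the previous row.
# ids: 1 1 1 2 2 3 3 4 4 4
run_id = m.ne(m.shift()).cumsum()

# Keep only the rows that sit inside a run of 10s (index 3, 4, 7, 8, 9).
g = run_id[m]

# Map each kept row to the number of rows in its run: 2 2 3 3 3
run_length = g.map(g.value_counts())
print(run_length)
```

Note that the "duration" obtained this way is a number of rows, not an elapsed time.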
    

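    What is the difference between count_change_to_10 and duration_of_each_group_value_remains_10? Is it possible to change the data so that each row has a different count?
    @jezrael, sorry, I fixed a small mistake. count_change_to_10 means the number of times the value changes to 10; in my data that is 1 change.

    EDIT: a solution for pandas versions without named aggregation (named aggregation was added in pandas 0.25):
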
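    import pandas as pd

    df['Time'] = pd.to_datetime(df['Time'])

    # True where flag equals 10
    m = df['flag'].eq(10)
    # label each consecutive run, keeping only the rows where flag is 10
    g = m.ne(m.shift()).cumsum()[m]
    # length of the run each matched row belongs to (NaN for other rows)
    df['count'] = g.map(g.value_counts())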
    
    df = df.groupby(df['Time'].dt.date).agg({'Time':['first','last'],
                                             'count':['nunique','min','max']})
    df.columns = df.columns.map('_'.join)
    
    d = {'Time_first':'group_start_time_1',
         'Time_last':'group_end_time_1',
         'count_nunique':'count_change_to_10',
         'count_min':'minimum_duration_of_each_group_value_remains_10',
         'count_max':'Maximum_duration_of_each_group_value_remains_10'}
    
    # convert both duration columns to integers
    cols = ['minimum_duration_of_each_group_value_remains_10',
            'Maximum_duration_of_each_group_value_remains_10']
    df = df.rename(columns=d)
    df[cols] = df[cols].astype(int)
    df = df.reset_index()

    print(df)
             Time  group_start_time_1    group_end_time_1  count_change_to_10  \
    0  2019-02-14 2019-02-14 00:00:10 2019-02-14 00:00:32                   1   
    1  2019-02-15 2019-02-15 00:00:37 2019-02-15 00:00:59                   1   

       minimum_duration_of_each_group_value_remains_10  \
    0                                                2   
    1                                                3   

       Maximum_duration_of_each_group_value_remains_10  
    0                                                2  
    1                                                3
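
One caveat with counting via `('count', 'nunique')`: it counts distinct run lengths, so two separate runs of the same length on one day are counted once. Also, `astype(int)` fails when a day has no 10 at all, because the durations are NaN. The following is a possible variant, not the answer's method. The sample data, the `run` column, and the choice of 0 for days without any 10 are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: day 1 has two runs of 10 with the same length,
# day 2 has no 10 at all.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2019-02-14 00:00:10", "2019-02-14 00:00:16", "2019-02-14 00:00:21",
        "2019-02-14 00:00:27", "2019-02-14 00:00:32", "2019-02-14 00:00:37",
        "2019-02-15 00:00:43", "2019-02-15 00:00:48",
    ]),
    "flag": [1, 10, 10, 3, 10, 10, 1, 0],
})

m = df["flag"].eq(10)
# Run id for rows inside a run of 10s, NaN elsewhere.
df["run"] = m.ne(m.shift()).cumsum().where(m)
# Number of rows in the run each row belongs to (NaN outside runs).
df["count"] = df["run"].map(df["run"].value_counts())

out = df.groupby(df["Time"].dt.date).agg(
    group_start_time_1=("Time", "first"),
    group_end_time_1=("Time", "last"),
    # count distinct run ids instead of distinct run lengths
    count_change_to_10=("run", "nunique"),
    minimum_duration_of_each_group_value_remains_10=("count", "min"),
    Maximum_duration_of_each_group_value_remains_10=("count", "max"),
)

# Assumption: report 0 for days that never reach 10, then cast to int.
dur_cols = [
    "minimum_duration_of_each_group_value_remains_10",
    "Maximum_duration_of_each_group_value_remains_10",
]
out[dur_cols] = out[dur_cols].fillna(0).astype(int)
print(out)
```

With this data the first day reports two changes to 10, whereas counting distinct lengths would report one. A run that continues across midnight is still counted on both days, which may or may not match the intended definition.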