Python: aggregating differences within groups at different levels of the data

This is my DataFrame:

import pandas as pd

data = [[1, 'A', 'a', '2020-01-01'],
        [1, 'A', 'b', '2020-01-02'],
        [1, 'B', 'a', '2020-01-03'],
        [2, 'A', 'a', '2020-01-04'],
        [2, 'A', 'b', '2020-01-05'],
        [2, 'A', 'b', '2020-01-06']]

df_1 = pd.DataFrame(data=data, columns=['id', 'main', 'sub_steps', 'date'])
df_1['date'] = pd.to_datetime(df_1['date'])
I want to group by the id column and compute the time difference whenever main or sub_steps changes.

Desired result:

   id   main sub_steps       date sub_steps date_main_diff date_subStep_diff
0   1    A           a 2020-01-01    [a, b]         0 days            0 days
1   1    A           b 2020-01-02    [a, b]         1 days            0 days
2   1    B           a 2020-01-03       [a]         0 days            0 days
3   2    A           a 2020-01-04 [a, b, b]         0 days            0 days
4   2    A           b 2020-01-05 [a, b, b]         1 days            0 days
5   2    A           b 2020-01-06 [a, b, b]         2 days            1 days
All I could come up with is:

(df_1.merge(df_1.groupby(['id', 'main'])
            .agg({'sub_steps': list,
                  'date': df_1.date - df_1.date.shift(1)})
            , on=['id', 'main']))
It raises an error:
TypeError: 'NaTType' object is not callable


The date-difference columns are the only problem; apart from them, I get the result I want.

We can just use transform and diff:

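df = df_1.copy()  # work on a copy of the question's DataFrame
# Attach the full list of sub_steps of each (id, main) group to every row of that group
df['sub_steps1'] = df.groupby(['id', 'main'])['sub_steps'].transform(lambda x: [x.tolist()] * len(x))
# Row-to-row gaps within each (id, main) group (0 days for a group's first row), then accumulated
df['date_main_diff'] = df.groupby(['id', 'main'])['date'].diff().fillna(pd.Timedelta('0 days'))
df['date_main_diff'] = df.groupby(['id', 'main'])['date_main_diff'].transform(lambda x: x.cumsum())
# The same two steps one level deeper, within each (id, main, sub_steps) group
df['date_subStep_diff'] = df.groupby(['id', 'main', 'sub_steps'])['date'].diff().fillna(pd.Timedelta('0 days'))
df['date_subStep_diff'] = df.groupby(['id', 'main', 'sub_steps'])['date_subStep_diff'].transform(lambda x: x.cumsum())
df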
       id main sub_steps       date sub_steps1 date_main_diff date_subStep_diff
    0   1    A         a 2020-01-01     [a, b]         0 days            0 days
    1   1    A         b 2020-01-02     [a, b]         1 days            0 days
    2   1    B         a 2020-01-03        [a]         0 days            0 days
    3   2    A         a 2020-01-04  [a, b, b]         0 days            0 days
    4   2    A         b 2020-01-05  [a, b, b]         1 days            0 days
    5   2    A         b 2020-01-06  [a, b, b]         2 days            1 days
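
Since the cumulative sum of the row-to-row gaps inside a group telescopes to "this row's date minus the group's first date", the two date columns can probably also be computed in a single step each. The following is only a sketch of that idea, not part of the answer above; the helper name elapsed_since_first is made up for illustration.

```python
import pandas as pd

data = [[1, 'A', 'a', '2020-01-01'],
        [1, 'A', 'b', '2020-01-02'],
        [1, 'B', 'a', '2020-01-03'],
        [2, 'A', 'a', '2020-01-04'],
        [2, 'A', 'b', '2020-01-05'],
        [2, 'A', 'b', '2020-01-06']]
df = pd.DataFrame(data, columns=['id', 'main', 'sub_steps', 'date'])
df['date'] = pd.to_datetime(df['date'])


def elapsed_since_first(frame, keys):
    """Hypothetical helper: time elapsed since the first row of each group.

    Equivalent to diff() + fillna(0 days) + cumsum() within the group,
    because the summed gaps collapse to date - first date of the group.
    """
    first_date = frame.groupby(keys)['date'].transform('first')
    return frame['date'] - first_date


df['date_main_diff'] = elapsed_since_first(df, ['id', 'main'])
df['date_subStep_diff'] = elapsed_since_first(df, ['id', 'main', 'sub_steps'])
print(df)
```

On the sample data this should give the same date_main_diff and date_subStep_diff values as the table above, while avoiding the intermediate fillna and cumsum steps.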