Pandas: pivoting a date column

I have the following DataFrame:

user_id    step    date
1         start    2018-04-17 15:27:07
1         step1    2018-04-17 15:28:07
1         end      2018-04-17 15:29:07
2         start    2018-05-17 15:28:07
2         step1    2018-05-17 15:29:07
2         end      2018-05-17 15:30:07
I need to convert it into the following table (time in minutes):

user_id   start                  end                   time (end-start)
1         2018-04-17 15:27:07    2018-04-17 15:29:07   2
2         2018-05-17 15:28:07    2018-05-17 15:30:07   2

I'm stuck at this point and would really appreciate any help.

You can pivot the frame and then compute the time delta:

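# Assumes df['date'] is already datetime64, e.g. via pd.to_datetime(df['date'])
new_df = df.pivot(index='user_id', columns='step', values='date').drop(columns='step1').reset_index()
new_df.columns.name = None
# Difference in minutes; astype('timedelta64[m]') is rejected by recent pandas versions
new_df['time (end-start)'] = (new_df['end'] - new_df['start']).dt.total_seconds() / 60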


    user_id end                 start               time (end-start)
0   1       2018-04-17 15:29:07 2018-04-17 15:27:07 2.0
1   2       2018-05-17 15:30:07 2018-05-17 15:28:07 2.0
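
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pivot approach. The sample frame is rebuilt from the question's data; the variable name `wide` and the explicit `start`, `end` column order are my own choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed up front
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "step": ["start", "step1", "end", "start", "step1", "end"],
    "date": pd.to_datetime([
        "2018-04-17 15:27:07", "2018-04-17 15:28:07", "2018-04-17 15:29:07",
        "2018-05-17 15:28:07", "2018-05-17 15:29:07", "2018-05-17 15:30:07",
    ]),
})

# One row per user, one column per step
wide = df.pivot(index="user_id", columns="step", values="date")

# Keep only the two steps of interest, in the order the question asks for
wide = wide[["start", "end"]].reset_index()
wide.columns.name = None

# Elapsed time in minutes as a float
wide["time (end-start)"] = (wide["end"] - wide["start"]).dt.total_seconds() / 60

print(wide)
```

Selecting `[["start", "end"]]` instead of dropping `step1` also keeps working if more intermediate steps show up later.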
Edit: for a DataFrame with duplicate entries, such as this one:

    user_id step    date
0   1   start   2018-04-17 15:27:07
1   1   step1   2018-04-17 15:28:07
2   1   end     2018-04-17 15:29:07
3   1   end     2018-04-17 15:32:07
4   2   start   2018-05-17 15:26:07
5   2   start   2018-05-17 15:28:07
6   2   step1   2018-05-17 15:29:07
7   2   end     2018-05-17 15:30:07

# aggfunc='first' keeps the first row per (user_id, step); use 'min' if the rows are not sorted by date
new_df = df.pivot_table(index='user_id', columns='step', values='date', aggfunc='first').drop(columns='step1').reset_index()

new_df.columns.name = None

new_df['time (end-start)'] = (new_df['end'] - new_df['start']).dt.total_seconds() / 60
You get:

    user_id end                 start               time (end-start)
0   1       2018-04-17 15:29:07 2018-04-17 15:27:07 2.0
1   2       2018-05-17 15:30:07 2018-05-17 15:26:07 4.0

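Thanks a lot. Unfortunately, it didn't work. I have a large dataset with duplicate values; for example, user 1 may have two end dates, and we want to use the earliest date for both start and end.

If you have duplicate values, you need to use pivot_table with an appropriate aggfunc instead of pivot.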
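
To make the point about duplicates concrete, here is a small sketch, assuming the duplicated sample data from the edit. It shows that plain `pivot` raises a `ValueError` when a (user_id, step) pair occurs more than once, and that `pivot_table` with `aggfunc="min"` picks the earliest date regardless of row order. Using `"min"` instead of `"first"` is my substitution, and the names `dup`, `pivot_failed` and `earliest` are illustrative only.

```python
import pandas as pd

# Duplicated sample data from the edit: user 1 has two 'end' rows, user 2 two 'start' rows
dup = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "step": ["start", "step1", "end", "end", "start", "start", "step1", "end"],
    "date": pd.to_datetime([
        "2018-04-17 15:27:07", "2018-04-17 15:28:07",
        "2018-04-17 15:29:07", "2018-04-17 15:32:07",
        "2018-05-17 15:26:07", "2018-05-17 15:28:07",
        "2018-05-17 15:29:07", "2018-05-17 15:30:07",
    ]),
})

# pivot cannot reshape when an (index, column) pair is duplicated
try:
    dup.pivot(index="user_id", columns="step", values="date")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates; 'min' selects the earliest timestamp
earliest = dup.pivot_table(index="user_id", columns="step", values="date", aggfunc="min")
earliest = earliest[["start", "end"]].reset_index()
earliest.columns.name = None
earliest["time (end-start)"] = (
    earliest["end"] - earliest["start"]
).dt.total_seconds() / 60

print(pivot_failed)
print(earliest)
```

On this data `"min"` and `"first"` give the same result because the rows are already in date order; on an unsorted frame only `"min"` is guaranteed to return the earliest date.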