How to do hourly calculations on a DataFrame using timestamps in Python?


I have a DataFrame like this:

Account     timestamp          no_of_transactions      transaction_value
   A     2016-07-26 13:43:29           2                      50
   B     2016-07-27 14:44:29           3                      40
   A     2016-07-26 13:33:29           1                      15
   A     2016-07-27 13:56:29           4                      30
   B     2016-07-26 14:33:29           7                      80
   C     2016-07-27 13:23:29           5                      10
   C     2016-07-27 13:06:29           3                      10
   A     2016-07-26 14:43:29           4                      22
   B     2016-07-27 13:43:29           1                      11
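The snippets in the answer rely on `timestamp` being a datetime64 column (they use `.dt.floor` and `pd.Grouper`), not plain strings. A minimal sketch that rebuilds the sample table above under that assumption:

```python
import pandas as pd

# Rebuild the question's sample table; timestamps are parsed to datetime64
df = pd.DataFrame({
    'Account': list('ABAABCCAB'),
    'timestamp': pd.to_datetime([
        '2016-07-26 13:43:29', '2016-07-27 14:44:29', '2016-07-26 13:33:29',
        '2016-07-27 13:56:29', '2016-07-26 14:33:29', '2016-07-27 13:23:29',
        '2016-07-27 13:06:29', '2016-07-26 14:43:29', '2016-07-27 13:43:29',
    ]),
    'no_of_transactions': [2, 3, 1, 4, 7, 5, 3, 4, 1],
    'transaction_value': [50, 40, 15, 30, 80, 10, 10, 22, 11],
})
print(df.dtypes)
```

If the data comes from a CSV file instead, passing `parse_dates=['timestamp']` to `pd.read_csv` gives the same column type.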
I want to calculate the hourly velocity of no_of_transactions and transaction_value for each account.

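For example, the minimum timestamp for account A is 2016-07-26 13:33:29. I want the sums from 2016-07-26 13:33:29 to 2016-07-26 14:33:29. Then I find the next available timestamp, in this case 2016-07-26 14:43:29, and compute the same over the following hour, and so on. The same applies to every account.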

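After getting the sums for each 1-hour window, how do I assign them back to the original DataFrame? For example, after adding the two new columns, the df should look like this: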

A        2016-07-26 13:43:29      2          50            5            116
B        2016-07-27 14:44:29      3          40            3            40 
A        2016-07-26 13:33:29      1          15            5            116
A        2016-07-27 13:56:29      4          30            4            30
That is, each row simply gets the totals of the window its timestamp falls into.

How can I do this efficiently, so that it doesn't take long to run?
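Unlike fixed hourly bins, the rule as stated in the question opens a new window at the first timestamp at or after the end of the previous one, so each window start depends on the previous window. A minimal sketch of that rule, assuming half-open windows `[start, start + 1h)` (the question does not say whether the end is inclusive); `WINDOW`, `window_starts` and `out` are hypothetical names. The loop runs once per row within each account, and `groupby().transform('sum')` then writes each window's totals back onto every row:

```python
import pandas as pd

df = pd.DataFrame({
    'Account': list('ABAABCCAB'),
    'timestamp': pd.to_datetime([
        '2016-07-26 13:43:29', '2016-07-27 14:44:29', '2016-07-26 13:33:29',
        '2016-07-27 13:56:29', '2016-07-26 14:33:29', '2016-07-27 13:23:29',
        '2016-07-27 13:06:29', '2016-07-26 14:43:29', '2016-07-27 13:43:29',
    ]),
    'no_of_transactions': [2, 3, 1, 4, 7, 5, 3, 4, 1],
    'transaction_value': [50, 40, 15, 30, 80, 10, 10, 22, 11],
})

# Assumption: a window covers [start, start + 1 hour)
WINDOW = pd.Timedelta(hours=1)

def window_starts(ts: pd.Series) -> pd.Series:
    """Label each timestamp of one account (sorted ascending) with the start
    of its window; a new window opens at the first timestamp that is at or
    after the previous window's end."""
    starts, current = [], None
    for t in ts:
        if current is None or t >= current + WINDOW:
            current = t
        starts.append(current)
    return pd.Series(starts, index=ts.index)

# Sort so every account's timestamps are visited in ascending order
out = df.sort_values(['Account', 'timestamp'])
out['window_start'] = out.groupby('Account')['timestamp'].transform(window_starts)

# Broadcast each window's totals to all rows of that window
sums = (out.groupby(['Account', 'window_start'])[['no_of_transactions', 'transaction_value']]
           .transform('sum')
           .add_prefix('sum_'))
out = out.join(sums).sort_index()  # restore the original row order
print(out)
```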

I think you need `groupby` with an aggregate `sum`.

Another solution, with hour precision:

df1 = df.groupby(['Account', df['timestamp'].dt.floor('h')])[['no_of_transactions', 'transaction_value']].sum().reset_index()
print (df1)
  Account           timestamp  no_of_transactions  transaction_value
0       A 2016-07-26 13:00:00                   3                 65
1       A 2016-07-26 14:00:00                   4                 22
2       A 2016-07-27 13:00:00                   4                 30
3       B 2016-07-26 14:00:00                   7                 80
4       B 2016-07-27 13:00:00                   1                 11
5       B 2016-07-27 14:00:00                   3                 40
6       C 2016-07-27 13:00:00                   8                 20
Edit based on a comment:

df2 = df.groupby(['Account', pd.Grouper(key='timestamp', freq='2h')]).sum().reset_index()
print (df2)
  Account           timestamp  no_of_transactions  transaction_value
0       A 2016-07-26 12:00:00                   3                 65
1       A 2016-07-26 14:00:00                   4                 22
2       A 2016-07-27 12:00:00                   4                 30
3       B 2016-07-26 14:00:00                   7                 80
4       B 2016-07-27 12:00:00                   1                 11
5       B 2016-07-27 14:00:00                   3                 40
6       C 2016-07-27 12:00:00                   8                 20
Edit:

First create a DataFrame of hourly datetimes for each account with `date_range`:

def f(x):
    # depending on the data, adding the extra final hour may not be necessary
    rng = pd.date_range(x.index.min(), x.index.max() + pd.Timedelta(1, unit='h'), freq='h')
    return pd.Series(rng)

df2 = (df.set_index('timestamp')
        .groupby('Account')
        .apply(f)
        .reset_index(level=1, drop=True)
        .reset_index(name='timestamp1'))
print (df2)
   Account          timestamp1
0        A 2016-07-26 13:33:29
1        A 2016-07-26 14:33:29
2        A 2016-07-26 15:33:29
3        A 2016-07-26 16:33:29
4        A 2016-07-26 17:33:29
5        A 2016-07-26 18:33:29
6        A 2016-07-26 19:33:29
7        A 2016-07-26 20:33:29
8        A 2016-07-26 21:33:29
9        A 2016-07-26 22:33:29
10       A 2016-07-26 23:33:29
11       A 2016-07-27 00:33:29
12       A 2016-07-27 01:33:29
13       A 2016-07-27 02:33:29
14       A 2016-07-27 03:33:29
15       A 2016-07-27 04:33:29
16       A 2016-07-27 05:33:29
17       A 2016-07-27 06:33:29
18       A 2016-07-27 07:33:29
19       A 2016-07-27 08:33:29
20       A 2016-07-27 09:33:29
21       A 2016-07-27 10:33:29
22       A 2016-07-27 11:33:29
23       A 2016-07-27 12:33:29
24       A 2016-07-27 13:33:29
25       A 2016-07-27 14:33:29
26       B 2016-07-26 14:33:29
...
...
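`df3` is used in the next step, but the code that creates it is not shown. A minimal sketch, assuming each original row is paired with the latest `timestamp1` at or before its `timestamp` within the same account via `pd.merge_asof`. This is an assumption, though it reproduces the `timestamp1` values printed further down. `grid` is a hypothetical stand-in for `df2`, built without `groupby.apply` and without the extra final hour:

```python
import pandas as pd

df = pd.DataFrame({
    'Account': list('ABAABCCAB'),
    'timestamp': pd.to_datetime([
        '2016-07-26 13:43:29', '2016-07-27 14:44:29', '2016-07-26 13:33:29',
        '2016-07-27 13:56:29', '2016-07-26 14:33:29', '2016-07-27 13:23:29',
        '2016-07-27 13:06:29', '2016-07-26 14:43:29', '2016-07-27 13:43:29',
    ]),
    'no_of_transactions': [2, 3, 1, 4, 7, 5, 3, 4, 1],
    'transaction_value': [50, 40, 15, 30, 80, 10, 10, 22, 11],
})

# Hourly grid per account, starting at that account's first timestamp
grid = pd.concat(
    [pd.DataFrame({'Account': acc,
                   'timestamp1': pd.date_range(ts.min(), ts.max(), freq='h')})
     for acc, ts in df.groupby('Account')['timestamp']],
    ignore_index=True)

# merge_asof needs both sides sorted on their time keys; 'backward' picks the
# last grid point <= timestamp, matched only within the same Account
df3 = pd.merge_asof(df.sort_values('timestamp'),
                    grid.sort_values('timestamp1'),
                    left_on='timestamp', right_on='timestamp1',
                    by='Account', direction='backward')
print(df3)
```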
Then add `timestamp1` to the original `df` (the result is `df3`) and aggregate with `sum`:

df4 = df3.groupby(['Account','timestamp1'], as_index=False)[['no_of_transactions','transaction_value']].sum()
print (df4)
  Account          timestamp1  no_of_transactions  transaction_value
0       A 2016-07-26 13:33:29                   3                 65
1       A 2016-07-26 14:33:29                   4                 22
2       A 2016-07-27 13:33:29                   4                 30
3       B 2016-07-26 14:33:29                   7                 80
4       B 2016-07-27 13:33:29                   1                 11
5       B 2016-07-27 14:33:29                   3                 40
6       C 2016-07-27 13:06:29                   8                 20
If you want to add the sums as new columns of the DataFrame, use `transform` and `join` them back (a left join on the index):

df5 = df3.join(df3.groupby(['Account','timestamp1'])[['no_of_transactions','transaction_value']].transform('sum').add_prefix('sum_'))
print (df5)
  Account           timestamp  no_of_transactions  transaction_value  \
0       A 2016-07-26 13:33:29                   1                 15   
1       A 2016-07-26 13:43:29                   2                 50   
2       B 2016-07-26 14:33:29                   7                 80   
3       A 2016-07-26 14:43:29                   4                 22   
4       C 2016-07-27 13:06:29                   3                 10   
5       C 2016-07-27 13:23:29                   5                 10   
6       B 2016-07-27 13:43:29                   1                 11   
7       A 2016-07-27 13:56:29                   4                 30   
8       B 2016-07-27 14:44:29                   3                 40   

           timestamp1  sum_no_of_transactions  sum_transaction_value  
0 2016-07-26 13:33:29                       3                     65  
1 2016-07-26 13:33:29                       3                     65  
2 2016-07-26 14:33:29                       7                     80  
3 2016-07-26 14:33:29                       4                     22  
4 2016-07-27 13:06:29                       8                     20  
5 2016-07-27 13:06:29                       8                     20  
6 2016-07-27 13:33:29                       1                     11  
7 2016-07-27 13:33:29                       4                     30  
8 2016-07-27 14:33:29                       3                     40 


Comment: You need to use a dictionary to store a single entry for each account.

Comment: What is the expected output?
# merge the per-window sums back onto the original df, keeping its row order
cols = ['Account','timestamp','sum_no_of_transactions','sum_transaction_value']
df = df.merge(df5[cols], on=['Account','timestamp'], how='left')
print (df)
  Account           timestamp  no_of_transactions  transaction_value  \
0       A 2016-07-26 13:43:29                   2                 50   
1       B 2016-07-27 14:44:29                   3                 40   
2       A 2016-07-26 13:33:29                   1                 15   
3       A 2016-07-27 13:56:29                   4                 30   
4       B 2016-07-26 14:33:29                   7                 80   
5       C 2016-07-27 13:23:29                   5                 10   
6       C 2016-07-27 13:06:29                   3                 10   
7       A 2016-07-26 14:43:29                   4                 22   
8       B 2016-07-27 13:43:29                   1                 11   

   sum_no_of_transactions  sum_transaction_value  
0                       3                     65  
1                       3                     40  
2                       3                     65  
3                       4                     30  
4                       7                     80  
5                       8                     20  
6                       8                     20  
7                       4                     22  
8                       1                     11