Python: calculating the difference between rows with multiple index columns

I have a DataFrame in which one column represents time and the other columns represent the remaining parts of the key.

import pandas as pd

df = pd.DataFrame(data=[(t, l1, l2, t * t * (1 + l2 + l1))
                        for t in range(3) 
                        for l1 in [3, 4] 
                        for l2 in [10, 100]], 
                  columns=['t', 'l1', 'l2', 'x'])

    t   l1  l2  x
0   0   3   10  0
1   0   3   100 0
2   0   4   10  0
3   0   4   100 0
4   1   3   10  14
5   1   3   100 104
6   1   4   10  15
7   1   4   100 105
8   2   3   10  56
9   2   3   100 416
10  2   4   10  60
11  2   4   100 420
For each row, I want the difference between its 'x' value and the 'x' value of the row with the preceding 't' and the same 'l1' and 'l2':

    t   l1  l2  x   t.1 delta_x
0   0   3   10  0   1   NaN
1   0   3   100 0   1   NaN
2   0   4   10  0   1   NaN
3   0   4   100 0   1   NaN
4   1   3   10  14  2   14.0
5   1   3   100 104 2   104.0
6   1   4   10  15  2   15.0
7   1   4   100 105 2   105.0
8   2   3   10  56  3   42.0
9   2   3   100 416 3   312.0
10  2   4   10  60  3   45.0
11  2   4   100 420 3   315.0
I can produce this frame with the following code:

df['t.1'] = df.t + 1
df['delta_x'] = df.x - df.merge(df, left_on=['t', 'l1', 'l2'], 
                                right_on=['t.1', 'l1', 'l2'], 
                                how='left', 
                                suffixes=['','.1'])['x.1']
Is there a cleaner or more efficient way to do this?

Try:

def diff(x):
    return x - x.shift()

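# group_keys=False keeps the original row index so the result aligns with df
# (recent pandas versions otherwise prepend l1 and l2 to the result's index).
df['delta_x'] = df.groupby(['l1', 'l2'], group_keys=False)['x'].apply(diff)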
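This implements MaxU's comment below in my answer. You have to use groupby on the l1 and l2 columns, because you want to compare differences in the x column for each (l1, l2) pair as the value of the t column changes.

By default, diff computes the difference between the values at (t=1) and (t=0) within each (l1, l2) group and returns the result. So if you want the difference in x between (t=2) and (t=0), just use diff(periods=2).

Finally, use the transform method so that the differences computed inside each group are returned aligned with the original rows.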

In [3]: df['delta_x'] = df.groupby(['l1', 'l2'])['x'].transform(lambda x: x.diff())

In [4]: df
Out[4]: 
    t  l1   l2    x  delta_x
0   0   3   10    0      NaN
1   0   3  100    0      NaN
2   0   4   10    0      NaN
3   0   4  100    0      NaN
4   1   3   10   14     14.0
5   1   3  100  104    104.0
6   1   4   10   15     15.0
7   1   4  100  105    105.0
8   2   3   10   56     42.0
9   2   3  100  416    312.0
10  2   4   10   60     45.0
11  2   4  100  420    315.0
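The periods argument mentioned above can be illustrated with a minimal sketch on the question's data. The column name delta_x_2 is made up for this example and is not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(data=[(t, l1, l2, t * t * (1 + l2 + l1))
                        for t in range(3)
                        for l1 in [3, 4]
                        for l2 in [10, 100]],
                  columns=['t', 'l1', 'l2', 'x'])

# Within each (l1, l2) group, subtract the value two rows earlier,
# i.e. x at t=2 minus x at t=0 for this data.
# 'delta_x_2' is a hypothetical column name chosen for illustration.
df['delta_x_2'] = (df.groupby(['l1', 'l2'])['x']
                     .transform(lambda x: x.diff(periods=2)))

print(df[df['t'] == 2])
```

Rows with t=0 and t=1 have no row two steps back in their group, so they get NaN.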
Timings:

In [5]: %timeit df['delta_x'] = df.groupby(['l1', 'l2'])['x'].transform(lambda x: x.diff())
1000 loops, best of 3: 1.55 ms per loop

In [17]: %timeit df['delta_x'] = df.x - df.merge(df, left_on=['t', 'l1', 'l2'], right_on=['t.1', 'l1', 'l2'],how='left',suffixes=['','.1'])['x.1']
100 loops, best of 3: 3.33 ms per loop
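One caveat worth keeping in mind: diff works on row position within each group, whereas the merge in the question matches on the value of t. The two agree here because the rows are already ordered by t and no time step is missing. The sketch below assumes each (t, l1, l2) combination occurs exactly once and shows one way to handle unsorted input. The shuffled frame is constructed only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(data=[(t, l1, l2, t * t * (1 + l2 + l1))
                        for t in range(3)
                        for l1 in [3, 4]
                        for l2 in [10, 100]],
                  columns=['t', 'l1', 'l2', 'x'])

# Simulate unsorted input; random_state is fixed only for repeatability.
shuffled = df.sample(frac=1, random_state=0)

# Sort by t first so "previous row in the group" means "previous t".
# The assignment aligns on index labels, so the row order of `shuffled`
# does not matter when the result is written back.
shuffled['delta_x'] = (shuffled.sort_values('t')
                               .groupby(['l1', 'l2'])['x']
                               .diff())

result = shuffled.sort_index()
print(result)
```

If a time step were missing from a group, this version would take the difference across the gap, while the merge-based version would produce NaN for that row.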
My first thought was:

df.groupby(['l1', 'l2'])['x'].diff()
Interestingly, Nickil's approach seems to be faster:

import pandas as pd
import timeit

df = pd.DataFrame(data=[(t, l1, l2, t * t * (1 + l2 + l1))
                        for t in range(3)
                        for l1 in [3, 4]
                        for l2 in [10, 100]],
                  columns=['t', 'l1', 'l2', 'x'])

N = 1000

t = timeit.timeit("df.groupby(['l1', 'l2'])['x'].transform(lambda x: x.diff())", setup='from __main__ import df', number=N)
print(t)  # 1.8100

def diff(x):
    return x - x.shift()

t = timeit.timeit("df.groupby(['l1', 'l2'])['x'].apply(diff)", setup='from __main__ import df, diff', number=N)
print(t)  # 2.6829

t = timeit.timeit("df.groupby(['l1', 'l2'])['x'].diff()", setup='from __main__ import df', number=N)
print(t)  # 2.5710
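Before comparing speed, it may help to confirm that the three variants return the same values. The following is a small check under the assumption that the rows are already ordered by t within each group. The variable names via_transform, via_apply and via_builtin are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(data=[(t, l1, l2, t * t * (1 + l2 + l1))
                        for t in range(3)
                        for l1 in [3, 4]
                        for l2 in [10, 100]],
                  columns=['t', 'l1', 'l2', 'x'])

def diff(x):
    return x - x.shift()

# group_keys=False keeps the original row labels on the apply() result.
grouped = df.groupby(['l1', 'l2'], group_keys=False)['x']

via_transform = grouped.transform(lambda x: x.diff())
via_apply = grouped.apply(diff).sort_index()
via_builtin = grouped.diff()

# NaN never compares equal to itself, so fill it with a sentinel before comparing.
same = (via_transform.fillna(-1).tolist()
        == via_apply.fillna(-1).tolist()
        == via_builtin.fillna(-1).tolist())
print(same)
```

With only 12 rows, the timings above are likely dominated by fixed per-call overhead, so the ranking may well change on larger frames or other pandas versions.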

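An approach that may be simpler, and easier for future developers to follow and modify, is to split the DataFrame by the values of (l1, l2), do the calculation on each series, and then recombine the results. If you do not need your own diff() function and the slow apply() method, just use the standard df.groupby(['l1', 'l2'])['x'].diff().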
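The split, compute, recombine idea can also be written out explicitly. This is only a sketch of one way to do it, not the answerer's code. The list name pieces is made up for the example.

```python
import pandas as pd

df = pd.DataFrame(data=[(t, l1, l2, t * t * (1 + l2 + l1))
                        for t in range(3)
                        for l1 in [3, 4]
                        for l2 in [10, 100]],
                  columns=['t', 'l1', 'l2', 'x'])

pieces = []
# Split: one sub-frame per (l1, l2) pair.
for (l1, l2), group in df.groupby(['l1', 'l2']):
    # Compute: order by time, then take row-to-row differences of x.
    pieces.append(group.sort_values('t')['x'].diff())

# Recombine: concat keeps the original index labels, so the assignment
# lines each difference up with its source row.
df['delta_x'] = pd.concat(pieces)
print(df)
```

The loop makes each step easy to modify, for example to apply a different calculation per group, at the cost of being slower than the built-in groupby diff on large data.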