Python: compare each row with all previous rows_Python_Pandas - Fatal编程技术网

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'numb': [2,4,6,2,4,9]})
print(df)
   numb
0     2
1     4
2     6
3     2
4     4
5     9
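For each row, I want to count how many of the preceding rows have a larger numb (i.e. how many earlier values the current numb is less than), like this: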

   numb  lesser_count
0     2             0
1     4             0
2     6             0
3     2             2
4     4             1
5     9             0
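To pin down the definition: for row i, lesser_count is the number of earlier values strictly greater than numb[i]. Equal values do not count, which is why row 4 (value 4) gets 1 rather than 2. A plain-Python brute-force sketch of that definition, handy as a correctness check for the faster answers; `lesser_count_reference` is a hypothetical helper name:

```python
def lesser_count_reference(values):
    """For each position i, count earlier values strictly greater than values[i].

    O(n^2) brute force; `lesser_count_reference` is a hypothetical name.
    """
    return [sum(prev > cur for prev in values[:i]) for i, cur in enumerate(values)]


print(lesser_count_reference([2, 4, 6, 2, 4, 9]))  # [0, 0, 0, 2, 1, 0]
```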

I came across a similar question, but couldn't work out how to compute this count.

Let's try NumPy broadcasting:

import numpy as np

a = df['numb'].to_numpy()

df['lesser_count'] = np.triu(a < a[:, None]).sum(0)
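Spelled out step by step on the example values: `a < a[:, None]` builds an n×n boolean matrix whose entry [i, j] is True when a[j] < a[i]; `np.triu` keeps only the entries with i ≤ j, and summing down each column j counts the earlier rows whose value exceeds a[j]. This sketch just replays the one-liner on a hard-coded array:

```python
import numpy as np

a = np.array([2, 4, 6, 2, 4, 9])  # the example's numb column

# mask[i, j] is True when a[j] < a[i], i.e. row i holds a larger value than row j
mask = a < a[:, None]

# np.triu zeroes everything below the diagonal, keeping pairs with i <= j
upper = np.triu(mask)

# Summing down column j counts earlier rows i whose value exceeds a[j];
# the diagonal is always False, so a row never counts itself.
counts = upper.sum(axis=0)

print(counts)                   # [0 0 0 2 1 0]
print(mask.shape, mask.nbytes)  # (6, 6) 36
```

The mask is n×n with one byte per boolean, so memory grows quadratically: for the 6,000-row frame used in the timings (`pd.concat([df]*1000)`) it is 6000² = 36,000,000 bytes (about 36 MB), and `np.triu` returns a second array of the same size.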

We can use numba to get a fast for loop:

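from numba import njit
import numpy as np

@njit
def count_lesser(arr):
    counts = np.empty(arr.shape[0], dtype=np.int64)
    for i, val in enumerate(arr):
        # how many of the earlier values arr[:i] are greater than val
        counts[i] = np.sum(val < arr[:i])
    return counts

df['lesser_count'] = count_lesser(df['numb'].to_numpy())
print(df)
   numb  lesser_count
0     2             0
1     4             0
2     6             0
3     2             2
4     4             1
5     9             0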
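Both the broadcasting answer and the loop above compare every pair of rows, so they do O(n²) work. A different technique, not used in any of the answers here, brings this down to O(n log n): walk the values in order and keep a Fenwick (binary indexed) tree counting how many earlier values fall at or below each rank. This is a plain-Python sketch under that assumption, and `lesser_prev_fenwick` is a hypothetical name:

```python
import bisect


def lesser_prev_fenwick(values):
    """Count, for each i, earlier values strictly greater than values[i].

    Sketch using a Fenwick (binary indexed) tree over value ranks:
    O(n log n) instead of O(n^2). Hypothetical helper, not from the answers.
    """
    ranks = sorted(set(values))      # coordinate compression: value -> rank
    size = len(ranks)
    tree = [0] * (size + 1)          # 1-based Fenwick tree of counts per rank

    def add(pos):
        # record one more value with rank `pos`
        while pos <= size:
            tree[pos] += 1
            pos += pos & -pos

    def prefix(pos):
        # number of values seen so far with rank <= pos
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    out = []
    for seen, v in enumerate(values):             # `seen` earlier values exist
        r = bisect.bisect_left(ranks, v) + 1      # 1-based rank of v
        out.append(seen - prefix(r))              # earlier values with rank > r
        add(r)
    return out


print(lesser_prev_fenwick([2, 4, 6, 2, 4, 9]))  # [0, 0, 0, 2, 1, 0]
```

To fill the column you could use `df['lesser_count'] = lesser_prev_fenwick(df['numb'].tolist())`. The values only need to be sortable; NaN would break the ranking, so drop or fill missing values first. In pure Python the per-element overhead is much higher than in the numba loops, so the better complexity mainly pays off on large frames.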

Since this calls for either an iterative solution or a memory-inefficient NumPy-based approach, numba is a good choice:

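import numpy as np
from numba import njit, int64

# The explicit signature compiles only for int64 arrays; other dtypes raise a TypeError.
@njit('int64[:](int64[:])')
def lesser_prev(a):
    out = np.empty(len(a), dtype=int64)
    for i in range(len(a)):
        curr = a[i]
        count = 0
        # count earlier values strictly greater than the current one
        for j in range(i):
            if curr < a[j]:
                count += 1
        out[i] = count
    return out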
Checking performance:

df_ = pd.concat([df]*1000).reset_index()

%timeit lesser_prev(df_.numb.to_numpy())
# 15.2 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit count_lesser(df_['numb'].to_numpy())
# 15.3 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
a = df_['numb'].to_numpy()
np.triu(a<a[:,None]).sum(0)
# 280 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
df['lesser_count'] = lesser_prev(df.numb.to_numpy())

print(df)
   numb  lesser_count
0     2             0
1     4             0
2     6             0
3     2             2
4     4             1
5     9             0
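In the timings above the broadcasting version is the slowest, and its n×n mask is also what makes it memory-hungry. If you want to stay in plain NumPy, one option (a variation, not one of the answers) is to process the rows in blocks so that only a block-sized slice of the mask exists at a time. A minimal sketch; `lesser_count_chunked` and its `block` parameter are hypothetical names:

```python
import numpy as np


def lesser_count_chunked(a, block=1024):
    """For each j, count earlier i < j with a[i] > a[j].

    Same result as np.triu(a < a[:, None]).sum(0), but builds at most a
    (block x n) boolean mask at a time instead of an (n x n) one.
    """
    a = np.asarray(a)
    n = len(a)
    out = np.empty(n, dtype=np.int64)
    for start in range(0, n, block):
        stop = min(start + block, n)
        cur = a[start:stop]      # rows whose counts we fill in this pass
        prev = a[:stop]          # every row that can precede them
        # greater[k, i]: row i holds a larger value than row start + k
        greater = prev[None, :] > cur[:, None]
        # earlier[k, i]: row i comes strictly before row start + k
        earlier = np.arange(stop)[None, :] < np.arange(start, stop)[:, None]
        out[start:stop] = (greater & earlier).sum(axis=1)
    return out


print(lesser_count_chunked([2, 4, 6, 2, 4, 9], block=2))  # [0 0 0 2 1 0]
```

This still does O(n²) comparisons, so it bounds memory rather than time: `block` trades peak mask size (roughly proportional to block × n bytes) against the number of Python-level iterations.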