Python: Is there a faster way to add a column to a dataset that is computed from other columns?

Tags: python, pandas, jupyter-notebook

I have written a piece of code (see below) that performs some calculations on a dataset and adds the result to the dataset as a new column:

ratio_list = []

for s, p, f in zip(A["s"], A["p"], A["f"]):
    # rows in the same (s, p) group with a strictly earlier f
    earlier = A[(A["s"] == s) & (A["p"] == p) & (A["f"] < f)]
    m = earlier[["a", "t"]].product(axis=1).sum()
    n = earlier["a"].sum()

    # no earlier rows (or a sums to 0): use 0 instead of dividing by zero
    if n == 0:
        ratio_list.append(0)
    else:
        ratio_list.append(m / n)

A["ratio"] = ratio_list


Group by the s and p columns and use this custom function (ratio, defined after the sample data below):


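Can you add some sample data to this question?
Use df.apply…..
A quick, shareable, testable dataset, together with the expected result.
@Shrey I wouldn't say the apply method is fast, for example:
Agreed, I even removed it from my own code recently. But in some use cases apply works very well. Looking at the question here, I assumed df.apply might be the simplest solution for him (shortest: 6 seconds):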
,s,p,f,a,t,ratio
0,101,2018,2018-01-06,2.0,10.0,13.0
1,101,2018,2018-01-06,2.0,12.0,13.0
2,101,2018,2018-01-03,4.0,14.0,0.0
3,101,2018,2018-01-03,16.0,12.0,0.0
4,101,2018,2018-01-03,12.0,14.0,0.0
5,101,2018,2018-01-06,4.0,10.0,13.0
6,101,2018,2018-01-06,14.0,23.0,13.0
7,101,2018,2018-01-08,4.0,10.0,15.222222222222221
8,101,2018,2018-01-08,20.0,14.0,15.222222222222221
9,101,2018,2018-01-08,21.0,23.0,15.222222222222221
10,101,2018,2018-01-08,21.0,23.0,15.222222222222221
11,101,2018,2018-01-09,4.0,8.0,17.566666666666666
12,101,2018,2018-01-09,10.0,14.0,17.566666666666666
13,101,2018,2018-01-13,13.0,23.0,17.01492537313433
14,101,2018,2018-01-13,9.0,23.0,17.01492537313433
15,103,2018,2018-01-31,20.0,15.0,0.0
16,103,2018,2018-01-31,2.0,15.0,0.0
17,103,2018,2018-01-31,20.0,15.0,0.0
18,103,2018,2018-01-31,20.0,15.0,0.0
19,103,2018,2018-01-31,20.0,15.0,0.0
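To try the code in this thread against the sample above, the CSV text can be read into a DataFrame. The following is a minimal sketch, assuming the rows are pasted into a string (the variable name csv_text is only illustrative) and that the first, unnamed column is the row index. The f column is left as ISO-formatted strings, which compare in chronological order.

```python
import io

import pandas as pd

# Sample rows copied from the question; the last column is the expected result.
csv_text = """,s,p,f,a,t,ratio
0,101,2018,2018-01-06,2.0,10.0,13.0
1,101,2018,2018-01-06,2.0,12.0,13.0
2,101,2018,2018-01-03,4.0,14.0,0.0
3,101,2018,2018-01-03,16.0,12.0,0.0
4,101,2018,2018-01-03,12.0,14.0,0.0
5,101,2018,2018-01-06,4.0,10.0,13.0
6,101,2018,2018-01-06,14.0,23.0,13.0
7,101,2018,2018-01-08,4.0,10.0,15.222222222222221
8,101,2018,2018-01-08,20.0,14.0,15.222222222222221
9,101,2018,2018-01-08,21.0,23.0,15.222222222222221
10,101,2018,2018-01-08,21.0,23.0,15.222222222222221
11,101,2018,2018-01-09,4.0,8.0,17.566666666666666
12,101,2018,2018-01-09,10.0,14.0,17.566666666666666
13,101,2018,2018-01-13,13.0,23.0,17.01492537313433
14,101,2018,2018-01-13,9.0,23.0,17.01492537313433
15,103,2018,2018-01-31,20.0,15.0,0.0
16,103,2018,2018-01-31,2.0,15.0,0.0
17,103,2018,2018-01-31,20.0,15.0,0.0
18,103,2018,2018-01-31,20.0,15.0,0.0
19,103,2018,2018-01-31,20.0,15.0,0.0
"""

# index_col=0 turns the unnamed first column into the row index
A = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(A.head())
```
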
import numpy as np

def ratio(x):
    # 2D mask: ma[i, j] is True when row j has an earlier f than row i
    ma = x['f'].values < x['f'].values[:, None]
    # for pandas 0.24+
    # ma = x['f'].to_numpy() < x['f'].to_numpy()[:, None]
    # keep the a and t values where the mask is True, 0 elsewhere
    a = np.where(ma, x['a'], 0)
    t = np.where(ma, x['t'], 0)

    # multiply and sum along each row of the mask
    m = (a * t).sum(axis=1)
    n = a.sum(axis=1)

    # the ratio is 0 where there are no earlier rows (n == 0)
    x['ratio1'] = np.where(n == 0, 0, m/n)
    return x
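The mask is the core of the function, so here is a small standalone sketch of the same broadcasting step. The three-element arrays are made up for illustration only and are not part of the sample data.

```python
import numpy as np

# Illustrative values: three rows of one (s, p) group
f = np.array(["2018-01-06", "2018-01-03", "2018-01-08"], dtype="datetime64[D]")
a = np.array([2.0, 4.0, 20.0])
t = np.array([10.0, 14.0, 14.0])

# f[:, None] has shape (3, 1), so the comparison broadcasts to (3, 3):
# ma[i, j] is True when f[j] is strictly earlier than f[i]
ma = f < f[:, None]

# Row i keeps only the a (and a*t) values of rows with an earlier f
n = np.where(ma, a, 0).sum(axis=1)
m = np.where(ma, a * t, 0).sum(axis=1)

# Divide only where n is non-zero; the remaining entries stay 0
result = np.divide(m, n, out=np.zeros_like(m), where=n != 0)
print(ma)
print(result)
```

The mask holds one cell per pair of rows in a group, so memory and time grow with the square of the group size. That is usually fine for small groups but can become costly for very large ones.
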


# group_keys=False keeps the original row index on newer pandas versions
A = A.groupby(['s','p'], group_keys=False).apply(ratio)
print(A)
      s     p           f     a     t      ratio     ratio1
0   101  2018  2018-01-06   2.0  10.0  13.000000  13.000000
1   101  2018  2018-01-06   2.0  12.0  13.000000  13.000000
2   101  2018  2018-01-03   4.0  14.0   0.000000   0.000000
3   101  2018  2018-01-03  16.0  12.0   0.000000   0.000000
4   101  2018  2018-01-03  12.0  14.0   0.000000   0.000000
5   101  2018  2018-01-06   4.0  10.0  13.000000  13.000000
6   101  2018  2018-01-06  14.0  23.0  13.000000  13.000000
7   101  2018  2018-01-08   4.0  10.0  15.222222  15.222222
8   101  2018  2018-01-08  20.0  14.0  15.222222  15.222222
9   101  2018  2018-01-08  21.0  23.0  15.222222  15.222222
10  101  2018  2018-01-08  21.0  23.0  15.222222  15.222222
11  101  2018  2018-01-09   4.0   8.0  17.566667  17.566667
12  101  2018  2018-01-09  10.0  14.0  17.566667  17.566667
13  101  2018  2018-01-13  13.0  23.0  17.014925  17.014925
14  101  2018  2018-01-13   9.0  23.0  17.014925  17.014925
15  103  2018  2018-01-31  20.0  15.0   0.000000   0.000000
16  103  2018  2018-01-31   2.0  15.0   0.000000   0.000000
17  103  2018  2018-01-31  20.0  15.0   0.000000   0.000000
18  103  2018  2018-01-31  20.0  15.0   0.000000   0.000000
19  103  2018  2018-01-31  20.0  15.0   0.000000   0.000000
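If the groups are large, a different technique may scale better than the pairwise mask: aggregate per distinct (s, p, f), take running totals inside each (s, p) group, and subtract the current f's own totals to get the sums over strictly earlier f. This is a sketch of that alternative, not the method from the answer above. The function name add_ratio_cumsum and the column name ratio2 are hypothetical, and it assumes f sorts chronologically (ISO date strings or real dates) and that a is non-negative, so a zero total means there are no earlier rows.

```python
import numpy as np
import pandas as pd


def add_ratio_cumsum(df):
    """Hypothetical helper: same ratio, via grouped cumulative sums."""
    tmp = df.assign(at=df["a"] * df["t"])
    # Totals of a*t and a for every distinct (s, p, f); groupby sorts the keys
    per_f = tmp.groupby(["s", "p", "f"], as_index=False)[["at", "a"]].sum()
    # Running totals inside each (s, p) group, in increasing f order
    cum = per_f.groupby(["s", "p"])[["at", "a"]].cumsum()
    # Remove the current f's own totals to keep only strictly earlier f
    m = cum["at"] - per_f["at"]
    n = cum["a"] - per_f["a"]
    # Replace zero denominators by 1 before dividing, then force those to 0
    per_f["ratio2"] = np.where(n == 0, 0.0, m / n.where(n != 0, 1))
    out = df.merge(per_f[["s", "p", "f", "ratio2"]],
                   on=["s", "p", "f"], how="left")
    out.index = df.index  # a left merge keeps row order but resets the index
    return out


# First seven rows of the sample data from the question
demo = pd.DataFrame({
    "s": [101] * 7,
    "p": [2018] * 7,
    "f": ["2018-01-06", "2018-01-06", "2018-01-03", "2018-01-03",
          "2018-01-03", "2018-01-06", "2018-01-06"],
    "a": [2.0, 2.0, 4.0, 16.0, 12.0, 4.0, 14.0],
    "t": [10.0, 12.0, 14.0, 12.0, 14.0, 10.0, 23.0],
})
result = add_ratio_cumsum(demo)
print(result)
```

Because the running totals are built by floating-point addition, results may differ from the loop version in the last few digits, so comparing with np.isclose is safer than exact equality on real data.
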