Python: creating a DataFrame column using a function

I have a DataFrame named df that looks like this:

            dept          ratio higher  lower
      date  
01/01/1979     B    0.522576565      2      1
01/01/1979     A    0.940614079      2      2
01/01/1979     C    0.873957946      0      1
01/01/1979     B    0.087828824      0      2
01/01/1979     A    0.39754345       1      2
01/01/1979     A    0.475491609      1      2
01/01/1979     B    0.140605283      0      2
01/01/1979     A    0.071007362      0      2
01/01/1979     B    0.480720923      2      2
01/01/1979     A    0.673142643      1      2
01/01/1979     C    0.73554271       0      0

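I want to create a new column named compared. For each row, I want to count how many values in the dept column match that row's dept value, and then subtract 1. If the result is greater than or equal to 1, the compared column should hold:

compared = (higher - lower) / (number of rows whose dept matches this row's dept - 1)

If the count minus 1 is 0, compared should be 0.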

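For example, in the first row of df the value of dept is B. There are 4 B values in the dept column, and 4 - 1 is at least 1. So in the new compared column I want the higher value (2) minus the lower value (1), which is 1, divided by 4 - 1; that is, (2 - 1) / (4 - 1), or about 0.333333.

My expected output therefore looks like this: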
            dept          ratio higher  lower      compared
date    
01/01/1979     B    0.522576565      2      1   0.333333333
01/01/1979     A    0.940614079      2      2   0.000000000
01/01/1979     C    0.873957946      0      1  -1.000000000
01/01/1979     B    0.087828824      0      2  -0.666666667
01/01/1979     A    0.39754345       1      2  -0.250000000
01/01/1979     A    0.475491609      1      2  -0.250000000
01/01/1979     B    0.140605283      0      2  -0.666666667
01/01/1979     A    0.071007362      0      2  -0.500000000
01/01/1979     B    0.480720923      2      2   0.000000000
01/01/1979     A    0.673142643      1      2  -0.250000000
01/01/1979     C    0.73554271       0      0   0.000000000

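I have some code, but it is very slow:
    minDept = 1
    for staticidx, row in df.iterrows():
        dept = row['dept']
        # deptPivot is not shown here; presumably a pivot table of row counts per dept
        deptCount = deptPivot.loc[dept, "date"]  # if error then zero
        myLongs = df.loc[staticidx, "higher"]
        myShorts = df.loc[staticidx, "lower"]

        if deptCount > minDept:
            # use the values read above (the original referenced undefined names)
            df.loc[staticidx, "compared"] = (myLongs - myShorts) / (deptCount - 1)
        else:
            df.loc[staticidx, "compared"] = 0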

Is there a faster way to do this?

This is fairly simple:
# number of other rows that share each row's dept
counts = df.groupby('dept')['dept'].transform('count') - 1

df['compared'] = (df['higher'] - df['lower']) / counts

# To avoid dividing by zero when a dept appears only once, and to match the
# `counts > 0` condition, use this instead. Rows with counts == 0 stay NaN
# unless 'compared' is set to 0 beforehand.
# df.loc[counts>0,'compared'] = df['higher'].sub(df['lower']).loc[counts>0]/counts[counts>0]
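
For anyone who wants to run the guarded variant end to end, here is a minimal sketch. It assumes the sample data from the question (the date index and ratio column are left out for brevity), and the names `others` and `mask` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question's table (date index and ratio omitted).
df = pd.DataFrame({
    "dept":   list("BACBAABABAC"),
    "higher": [2, 2, 0, 0, 1, 1, 0, 0, 2, 1, 0],
    "lower":  [1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 0],
})

# For each row, the number of *other* rows that share its dept.
others = df.groupby("dept")["dept"].transform("count") - 1

# Start from 0 so a dept that appears only once keeps 0 rather than NaN or inf.
df["compared"] = 0.0
mask = others > 0
df.loc[mask, "compared"] = (df["higher"] - df["lower"])[mask] / others[mask]

print(df)
```

Initialising the column to 0 first is one way to cover the "count is 0" case from the question; calling `fillna(0)` after the masked assignment should work just as well.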

Output:
           dept     ratio  higher  lower  compared
date                                              
01/01/1979    B  0.522577       2      1  0.333333
01/01/1979    A  0.940614       2      2  0.000000
01/01/1979    C  0.873958       0      1 -1.000000
01/01/1979    B  0.087829       0      2 -0.666667
01/01/1979    A  0.397543       1      2 -0.250000
01/01/1979    A  0.475492       1      2 -0.250000
01/01/1979    B  0.140605       0      2 -0.666667
01/01/1979    A  0.071007       0      2 -0.500000
01/01/1979    B  0.480721       2      2  0.000000
01/01/1979    A  0.673143       1      2 -0.250000
01/01/1979    C  0.735543       0      0  0.000000

Unless there is a good reason not to, variable and function names should be lowercase with underscores.
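
As a rough illustration of that naming advice, the loop from the question could be rewritten as a small snake_case function. This is only a sketch: the function name `add_compared`, the parameter `min_dept`, and the four-row sample frame are hypothetical and do not come from the original post.

```python
import pandas as pd


def add_compared(df: pd.DataFrame, min_dept: int = 1) -> pd.DataFrame:
    """Return a copy of df with a 'compared' column (hypothetical helper)."""
    # How many rows share each row's dept.
    dept_count = df["dept"].map(df["dept"].value_counts())
    spread = df["higher"] - df["lower"]
    # Keep the denominator only where the dept has more than min_dept rows;
    # elsewhere it becomes NaN, which is turned into 0 below.
    denominator = (dept_count - 1).where(dept_count > min_dept)
    return df.assign(compared=(spread / denominator).fillna(0.0))


# Hypothetical four-row sample, not the data from the question.
sample = pd.DataFrame({
    "dept":   ["B", "A", "B", "C"],
    "higher": [2, 2, 0, 1],
    "lower":  [1, 2, 2, 0],
})
result = add_compared(sample)
print(result["compared"].tolist())  # [1.0, 0.0, -2.0, 0.0]
```

Returning a new frame through `assign` instead of mutating the input is a design choice made for this sketch, not something the original answer prescribes.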