Python: creating a dataframe column with a function
I have a dataframe called `df` that looks like:
dept ratio higher lower
date
01/01/1979 B 0.522576565 2 1
01/01/1979 A 0.940614079 2 2
01/01/1979 C 0.873957946 0 1
01/01/1979 B 0.087828824 0 2
01/01/1979 A 0.39754345 1 2
01/01/1979 A 0.475491609 1 2
01/01/1979 B 0.140605283 0 2
01/01/1979 A 0.071007362 0 2
01/01/1979 B 0.480720923 2 2
01/01/1979 A 0.673142643 1 2
01/01/1979 C 0.73554271 0 0
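To reproduce the example, here is a minimal sketch that rebuilds the sample frame. It assumes the `date` index holds plain strings rather than parsed dates, which the printout does not reveal.

```python
import pandas as pd

# Sample data from the question; dates kept as plain strings (assumption)
df = pd.DataFrame(
    {
        "date": ["01/01/1979"] * 11,
        "dept": ["B", "A", "C", "B", "A", "A", "B", "A", "B", "A", "C"],
        "ratio": [0.522576565, 0.940614079, 0.873957946, 0.087828824,
                  0.39754345, 0.475491609, 0.140605283, 0.071007362,
                  0.480720923, 0.673142643, 0.73554271],
        "higher": [2, 2, 0, 0, 1, 1, 0, 0, 2, 1, 0],
        "lower": [1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 0],
    }
).set_index("date")

print(df)
```

Every row has the same `date` label, so the index is not unique. This matters for any code that writes results back with `df.loc[label, ...]`.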
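I want to create a new column called `compared`. For each row, I want to count the values in the `dept` column that match that row's `dept` value, minus 1. If that count is greater than or equal to 1, I want the following returned in the `compared` column:
`compared` row value = (higher - lower) / (count of values in the dept column matching the row's dept value - 1)
If that count is 0, then 0 is returned in the `compared` column.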
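The rule can be written as a small per-row function. This is only a sketch: `compared_value` and its `dept_count` parameter (the total number of rows with that dept, including the row itself) are hypothetical names, not taken from the question.

```python
def compared_value(higher: float, lower: float, dept_count: int) -> float:
    """Apply the rule to a single row.

    dept_count: total rows with this row's dept, including the row itself
    (hypothetical parameter name, for illustration only).
    """
    others = dept_count - 1  # rows sharing the dept, excluding this one
    if others >= 1:
        return (higher - lower) / others
    return 0.0  # no other rows in the same dept


# First row of the example: dept B occurs 4 times, higher=2, lower=1
print(compared_value(2, 1, 4))  # → 0.3333333333333333
```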
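For example, for the first row in `df`, the value of `dept` is B. There are 4 values of B in the `dept` column, and 4 - 1 = 3 is greater than 1. So in the new `compared` column I want the `higher` value (2) minus the `lower` value (1), which equals 1, divided by 4 - 1, i.e. (2 - 1) / (4 - 1) = 0.333333333.
So my expected output looks like this: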
dept ratio higher lower compared
date
01/01/1979 B 0.522576565 2 1 0.333333333
01/01/1979 A 0.940614079 2 2 0.000000000
01/01/1979 C 0.873957946 0 1 -1.000000000
01/01/1979 B 0.087828824 0 2 -0.666666667
01/01/1979 A 0.39754345 1 2 -0.250000000
01/01/1979 A 0.475491609 1 2 -0.250000000
01/01/1979 B 0.140605283 0 2 -0.666666667
01/01/1979 A 0.071007362 0 2 -0.500000000
01/01/1979 B 0.480720923 2 2 0.000000000
01/01/1979 A 0.673142643 1 2 -0.250000000
01/01/1979 C 0.73554271 0 0 0.000000000
I have some code, but it is very slow:
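minDept = 1
# deptPivot is not defined in the question; assumed here to be the number of rows per dept
deptPivot = df['dept'].value_counts()
compared = []
for _, row in df.iterrows():
    deptCount = deptPivot.get(row['dept'], 0)  # if missing then zero
    myLongs = row['higher']
    myShorts = row['lower']
    if deptCount > minDept:
        compared.append((myLongs - myShorts) / (deptCount - 1))
    else:
        compared.append(0)
# every row shares the same date label, so df.loc[staticidx, ...] would
# overwrite all of them; assign the collected values in one go instead
df['compared'] = compared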
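Since the title asks about using a function, here is a sketch of the same row-wise logic passed to `DataFrame.apply` with `axis=1`, on a trimmed copy of the sample (only `dept`, `higher`, `lower`). `compared_for_row` and `dept_counts` are hypothetical names. It still calls Python once per row, so on its own it is unlikely to fix the speed problem.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dept": list("BACBAABABAC"),
        "higher": [2, 2, 0, 0, 1, 1, 0, 0, 2, 1, 0],
        "lower": [1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 0],
    },
    index=pd.Index(["01/01/1979"] * 11, name="date"),
)

# Rows per dept, looked up by each row's dept value
dept_counts = df["dept"].value_counts()


def compared_for_row(row: pd.Series) -> float:
    others = dept_counts[row["dept"]] - 1  # other rows in the same dept
    return (row["higher"] - row["lower"]) / others if others >= 1 else 0.0


# apply() builds the whole column, which is then assigned once,
# so the repeated date label does not cause overwrites
df["compared"] = df.apply(compared_for_row, axis=1)
print(df)
```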
Is there a faster way to do this?
This is fairly simple:
counts = df.groupby('dept')['dept'].transform('count')-1
df['compared'] = (df['higher']-df['lower'])/counts
# a dept that occurs only once gets counts == 0, where plain division
# yields inf or NaN instead of the required 0; to match the `counts>0`
# condition, use this instead:
# df['compared'] = (df['higher'] - df['lower']).div(counts.where(counts > 0)).fillna(0)
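A self-contained run of the vectorized approach with a zero-count guard applied. The one-row dept `D` is hypothetical, added only so that the `counts == 0` case is exercised.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dept": list("BACBAABABAC") + ["D"],  # "D": hypothetical one-row dept
        "higher": [2, 2, 0, 0, 1, 1, 0, 0, 2, 1, 0, 3],
        "lower": [1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 0, 1],
    },
    index=pd.Index(["01/01/1979"] * 12, name="date"),
)

# Number of *other* rows in the same dept, aligned row by row with df
counts = df.groupby("dept")["dept"].transform("count") - 1

# counts.where(counts > 0) turns zeros into NaN, the division then gives NaN
# for those rows, and fillna(0) returns 0 there as the question requires
df["compared"] = (df["higher"] - df["lower"]).div(counts.where(counts > 0)).fillna(0)
print(df)
```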
Output:
dept ratio higher lower compared
date
01/01/1979 B 0.522577 2 1 0.333333
01/01/1979 A 0.940614 2 2 0.000000
01/01/1979 C 0.873958 0 1 -1.000000
01/01/1979 B 0.087829 0 2 -0.666667
01/01/1979 A 0.397543 1 2 -0.250000
01/01/1979 A 0.475492 1 2 -0.250000
01/01/1979 B 0.140605 0 2 -0.666667
01/01/1979 A 0.071007 0 2 -0.500000
01/01/1979 B 0.480721 2 2 0.000000
01/01/1979 A 0.673143 1 2 -0.250000
01/01/1979 C 0.735543 0 0 0.000000
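Another way to get the same per-row counts, swapped in here for comparison rather than taken from the answer, is to `map` the result of `value_counts()` back onto the `dept` column. This sketch checks that both give the same counts on the sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dept": list("BACBAABABAC"),
        "higher": [2, 2, 0, 0, 1, 1, 0, 0, 2, 1, 0],
        "lower": [1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 0],
    },
    index=pd.Index(["01/01/1979"] * 11, name="date"),
)

# value_counts() has one entry per dept; map() looks up each row's dept in it
counts_map = df["dept"].map(df["dept"].value_counts()) - 1

# The groupby/transform version, for comparison
counts_transform = df.groupby("dept")["dept"].transform("count") - 1

# Zero guard: 0 wherever a dept has no other rows
df["compared"] = (df["higher"] - df["lower"]).div(counts_map.where(counts_map > 0)).fillna(0)
print(counts_map.tolist() == counts_transform.tolist())
```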
Unless there is a good reason not to, variable and function names should be lowercase, with words separated by underscores.