Python: using the previously calculated row in the apply method

Can I use the previously calculated answer from apply(axis=1) when evaluating the current row?

I have this df:

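import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3),columns=list('ABC'))
df['String_column'] = ["'a'", "'b'", "'c'", "'d'", "'e'"]  # non-numeric column to leave untouched
df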

    A           B            C         String_column
0   0.297925    -1.025012   1.307090   'a'
1   -1.527406   0.533451    -0.650252  'b'
2   -1.646425   0.738068    0.562747   'c'
3   -0.045872   0.088864    0.932650   'd'
4   -0.964226   0.542817    0.873731   'e'

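For each row, I'm trying to take the previous row's values multiplied by 0.5 and add them to the current values, without touching the string column, i.e. row[i] = row[i] + row[i-1] * 0.5, where row[i-1] is the already-updated previous row. This is the code I have so far: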

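def calc_by_previous_answer(row):
    # apply(axis=1) passes only the current row, so the previous one is not available here
    row = row * 0.5
    return row

# adding shift() here would not propagate the previously calculated answer
cols = ['A', 'B', 'C']  # skip the string column: "'a'" * 0.5 raises a TypeError
df[cols] = df[cols].apply(calc_by_previous_answer, axis=1)
df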
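
For reference, carrying the previous answer through apply itself is possible with a function that remembers what it last returned. This is a minimal sketch under stated assumptions: apply(axis=1) calls the function once per row in index order (pandas 1.1 and later no longer evaluate the first row twice, but the call order is an implementation detail rather than a documented guarantee), and the string column is excluded first. `make_propagator` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def make_propagator(factor=0.5):
    """Hypothetical helper: row i becomes row i + factor * (the value returned for row i-1)."""
    state = {'prev': None}

    def step(row):
        row = row.copy()  # pandas may reuse the Series object it passes in, so work on a copy
        if state['prev'] is not None:
            row = row + factor * state['prev']
        state['prev'] = row  # remember the computed answer for the next call
        return row

    return step


# Hypothetical toy frame, not the data from the question
df = pd.DataFrame({'A': [1.0, 1.0, 1.0], 'B': [2.0, 0.0, 0.0],
                   'String_column': ['a', 'b', 'c']})
cols = df.select_dtypes(np.number).columns
df[cols] = df[cols].apply(make_propagator(0.5), axis=1)
print(df)
```

Because this relies on hidden state and on apply's visiting order, an explicit loop is the more predictable choice.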

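There is no need to use apply; you can solve it as follows. Because you want to use the updated row values when computing the following rows, you need a for loop: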

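cols = ['A','B','C']
# row 0 has no previous row; assumes the default RangeIndex 0..n-1
for i in range(1, len(df)):
    # df.loc[i-1, cols] already holds the updated values, so each result feeds the next row
    df.loc[i, cols] = df.loc[i-1, cols] * 0.5 + df.loc[i, cols]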

Result:

            A           B          C String_column
0    0.297925   -1.025012   1.307090           'a'
1   -1.378443    0.020945   0.003293           'b'
2   -2.335647    0.748541   0.564393           'c'
3   -1.213695    0.463134   1.214847           'd'
4   -1.571074    0.774384   1.481154           'e'
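
Calling df.loc once per row adds overhead on large frames. A minimal sketch of the same recursive loop on a plain NumPy array: it still iterates over rows in Python, but each step updates all numeric columns at once without label lookups. `propagate_previous` is a hypothetical helper, not a pandas function, and the toy frame is made up.

```python
import numpy as np
import pandas as pd


def propagate_previous(values, factor=0.5):
    """Hypothetical helper: row i becomes row i + factor * (updated) row i-1."""
    out = np.array(values, dtype=float)  # np.array copies by default, so the input is untouched
    for i in range(1, len(out)):
        # out[i - 1] has already been updated, so the result is recursive
        out[i] += factor * out[i - 1]
    return out


# Hypothetical toy frame with a string column that must be left alone
df = pd.DataFrame({'A': [1.0, 1.0, 1.0], 'B': [2.0, 0.0, 0.0],
                   'String_column': ['a', 'b', 'c']})
cols = df.select_dtypes(np.number).columns
df[cols] = propagate_previous(df[cols].to_numpy())
print(df)
```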

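Edit: the apply and shift solutions below do not work for the recursive case. They add half of the original previous row instead of the already-updated one, which is why rows 2 to 4 differ from the loop result above; the iterrows loop at the end does propagate the updated values.

It isn't straightforward, but inside apply you can select the previous row's values by label with loc (row.name is the current row's index label). To operate only on the numeric columns, select them with select_dtypes: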

def calc_by_previous_answer(row):
    # row.name is the index label of the current row;
    # the first row has no previous row, so it is returned unchanged
    if row.name > 0:
        # df is only overwritten after apply finishes, so this reads the ORIGINAL previous row
        row = df.loc[row.name-1, c] * 0.5 + row
    return row

c = df.select_dtypes(np.number).columns
df[c] = df[c].apply(calc_by_previous_answer, axis=1)
print(df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.410128  1.004794  0.237621           'c'
3 -0.869085  0.457898  1.214023           'd'
4 -0.987162  0.587249  1.340056           'e'
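
The difference from the loop result shows up from row 2 onward. A minimal sketch on a hypothetical three-value Series of ones makes it easy to check by hand: the non-recursive update only ever sees the original previous value, while the recursive one reuses each newly computed value.

```python
import pandas as pd

# Hypothetical toy data, not from the question
s = pd.Series([1.0, 1.0, 1.0])

# Non-recursive: add half of the ORIGINAL previous value (what apply + loc and shift do)
non_recursive = s + s.shift(1).fillna(0) * 0.5

# Recursive: add half of the already UPDATED previous value (what the loop does)
recursive = s.copy()
for i in range(1, len(recursive)):
    recursive.iloc[i] += recursive.iloc[i - 1] * 0.5

print(non_recursive.tolist())  # [1.0, 1.5, 1.5]
print(recursive.tolist())      # [1.0, 1.5, 1.75]
```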

A vectorized alternative with shift gives the same (non-recursive) result:

c = df.select_dtypes(np.number).columns
df[c] = df[c].add(df[c].shift() * 0.5, fill_value=0)
print(df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.410128  1.004794  0.237621           'c'
3 -0.869085  0.457898  1.214023           'd'
4 -0.987162  0.587249  1.340056           'e'
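
For the recursive version, loop over the rows so that each step reads the already-updated previous row:
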
c = df.select_dtypes(np.number).columns
for idx, row in df.iterrows():
    # row 0 has no previous row; df.loc[idx-1, c] already holds the updated values
    if idx > 0:
        df.loc[idx, c] = df.loc[idx-1, c] * 0.5 + df.loc[idx, c]

print(df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.335647  0.748541  0.564393           'c'
3 -1.213695  0.463134  1.214847           'd'
4 -1.571074  0.774384  1.481154           'e'
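
The loop is recursive but runs row by row in Python. One vectorized alternative, not part of the original answers, is pandas' exponentially weighted mean: with adjust=False, ewm(alpha=a).mean() computes z[0] = x[0] and z[i] = (1 - a) * z[i-1] + a * x[i]. Setting a = 1 - 0.5, scaling the first row by a, and dividing the result by a gives y[0] = x[0] and y[i] = x[i] + 0.5 * y[i-1]. A sketch assuming 0 <= factor < 1 and no missing values; `recursive_add_previous` and the toy frame are hypothetical, and results may differ from the loop in the last floating-point digits.

```python
import numpy as np
import pandas as pd


def recursive_add_previous(frame, factor=0.5):
    """Hypothetical helper: y[0] = x[0], y[i] = x[i] + factor * y[i-1], per column.

    Assumes 0 <= factor < 1 and no missing values.
    """
    alpha = 1.0 - factor  # ewm weight on the current value
    scaled = frame.astype(float)  # astype returns a new frame, the input stays untouched
    # Pre-scale the first row so that dividing by alpha at the end restores it
    scaled.iloc[0] = scaled.iloc[0] * alpha
    # adjust=False: z[0] = x[0], z[i] = (1 - alpha) * z[i-1] + alpha * x[i]
    z = scaled.ewm(alpha=alpha, adjust=False).mean()
    return z / alpha


# Hypothetical toy frame, not the data from the question
df = pd.DataFrame({'A': [1.0, 1.0, 1.0], 'B': [2.0, 0.0, 0.0],
                   'String_column': ['a', 'b', 'c']})
cols = df.select_dtypes(np.number).columns
df[cols] = recursive_add_previous(df[cols])
print(df)
```

Unlike a closed form built from powers of 0.5, this stays numerically stable on long frames, because ewm applies the recurrence step by step internally.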