Python: updating identically named columns in a loop and performing other calculations
In a dataframe, I want to iterate over the identically named columns and keep a running sum until it exceeds the value in "val_n". I want four things:
1) exceed_when (the iteration number at which the "val_n" value is exceeded)
2) sum_col (the sum of the identically named columns)
3) at exceed_when, I want to replace the corresponding col value with (col - (sum_col - val_n))
4) after the exceed_when point, I want to replace the remaining col values with 0
The dataframe looks like:
id col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 col12 col13 col14 val_n
1 350 350 350 350 350 350 350 350 350 350 0 0 0 0 3105.61
2 50 50 55 105 50 0 50 100 50 50 50 50 1025 1066.86 3185.6
3 0 0 0 0 0 3495.1 0 0 0 0 0 0 0 3495.1 3477.76
Desired dataframe:
id  col1  col2  col3  col4  col5  col6     col7  col8  col9    col10  col11  col12  col13  col14    val_n    exceed_when  sum_col
1   350   350   350   350   350   350      350   350   305.61  0      0      0      0      0        3105.61  9            3500
2   50    50    55    105   50    0        50    100   50      50     50     50     1025   1066.86  3185.6                2751.86
3   0     0     0     0     0     3477.76  0     0     0       0      0      0      0      0        3477.76  6            6990.2
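To make the four requirements concrete, the first row can be checked by hand. The snippet below is only an illustrative sketch in plain Python; the variable names (`values`, `running`, `adjusted`) are made up for this check, and using 0 to mean "never reached" is an assumption.

```python
# Row 1 from the sample data: ten values of 350 followed by four zeros.
values = [350] * 10 + [0] * 4
val_n = 3105.61

running = 0.0        # running sum of the columns seen so far
exceed_when = 0      # assumption: 0 stands for "val_n was never reached"
adjusted = list(values)

for i, v in enumerate(values, start=1):
    running += v
    if exceed_when == 0 and running >= val_n:
        # requirement 1 and 3: remember the position and trim the overshoot
        exceed_when = i
        adjusted[i - 1] = round(v - (running - val_n), 2)
    elif exceed_when != 0:
        # requirement 4: everything after the crossing point becomes 0
        adjusted[i - 1] = 0

sum_col = sum(values)  # requirement 2: sum of the original values
print(exceed_when, sum_col, adjusted)
```

The running sum reaches 3150 at col9, which is 44.39 above val_n, so col9 becomes 350 - 44.39 = 305.61 and the later columns are zeroed, in line with the first row of the desired dataframe.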
This is what I tried:
def trans(row):
    row['sum_col'] = 0
    row['exceed_ind'] = 0
    for i in range(1, 15):
        row['sum_col'] += row['col' + str(i)]
        if ((row['exceed_ind'] == 0) &
                (row['sum_col'] >= row['val_n'])):
            row['exceed_ind'] = 1
            row['exceed_when'] = i
        else:
            continue
        if row['exceed_when'] == i:
            row['col' + str(i)] = (
                row['col' + str(i)] - (
                    row['sum_col'] - row['val_n']))
        elif row['exceed_when'] < i:
            row['col' + str(i)] = 0
        else:
            row['col' + str(i)] = row['col' + str(i)]
    return row
df1 = df.apply(trans, axis=1)
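Thanks.

As far as I know, the function given to .apply only receives a copy of each row, so all updates happen on that copy and not on the original dataframe itself. In this case, you have to iterate over the rows and update them through the index: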
df['sum_col'] = 0.0  # float, since the sums can be fractional
df['exceed_ind'] = 0
df['exceed_when'] = 0
for idx, row in df.iterrows():
    sum_col = 0
    exceed_ind = 0
    exceed_when = 0
    for i in range(1, 15):
        sum_col += row['col' + str(i)]
        if exceed_ind == 0 and sum_col >= row['val_n']:
            # first column at which the running sum reaches val_n
            exceed_ind = 1
            exceed_when = i
            df.loc[idx, 'exceed_ind'] = exceed_ind
            df.loc[idx, 'exceed_when'] = exceed_when
            df.loc[idx, 'col' + str(i)] = row['col' + str(i)] - (sum_col - row['val_n'])
        elif exceed_ind == 1 and exceed_when < i:
            # every column after the crossing point is set to 0
            df.loc[idx, 'col' + str(i)] = 0
    df.loc[idx, 'sum_col'] = sum_col
print(df)
Thanks, 大卫比拉! Your answer is correct. It also reminded me of what I had missed in my own code: initializing df['exceed_when'] = 0, and one if condition.
@Anku Glad I could help! Cheers :)
Output printed by the iterrows loop in the answer:
    col1  col2  col3  col4  col5     col6  col7  col8    col9  col10  col11  \
id
1    350   350   350   350   350   350.00   350   350  305.61      0      0
2     50    50    55   105    50     0.00    50   100   50.00     50     50
3      0     0     0     0     0  3477.76     0     0    0.00      0      0

    col12  col13    col14    val_n  sum_col  exceed_ind  exceed_when
id
1       0      0     0.00  3105.61  3500.00           1            9
2      50   1025  1066.86  3185.60  2751.86           0            0
3       0      0     0.00  3477.76  6990.20           1            6
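For larger frames, a row-by-row loop can get slow, so a vectorized variant may be worth trying. The following is a sketch and not part of the original answer: it swaps the loop for a NumPy cumulative sum, rebuilds the sample data from the question with `id` as the index, and keeps the same convention that `exceed_when` is 0 when `val_n` is never reached. The helper names (`hit`, `prev_hit`, `at_cross`, `out`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; `id` is used as the index.
cols = [f"col{i}" for i in range(1, 15)]
data = [
    [350, 350, 350, 350, 350, 350, 350, 350, 350, 350, 0, 0, 0, 0, 3105.61],
    [50, 50, 55, 105, 50, 0, 50, 100, 50, 50, 50, 50, 1025, 1066.86, 3185.6],
    [0, 0, 0, 0, 0, 3495.1, 0, 0, 0, 0, 0, 0, 0, 3495.1, 3477.76],
]
df = pd.DataFrame(data, index=pd.Index([1, 2, 3], name="id"),
                  columns=cols + ["val_n"])

vals = df[cols].to_numpy(dtype=float)                # shape (n_rows, 14)
limit = df["val_n"].to_numpy(dtype=float)[:, None]   # one limit per row
running = vals.cumsum(axis=1)                        # running sum along each row

# True from the first column where the running sum reaches val_n, and
# "sticky" afterwards, mirroring the exceed_ind flag of the loop version.
hit = (running >= limit).cumsum(axis=1) > 0
exceeded = hit.any(axis=1)
exceed_when = np.where(exceeded, hit.argmax(axis=1) + 1, 0)  # 1-based, 0 = never

# prev_hit marks columns strictly after the crossing column.
prev_hit = np.zeros_like(hit)
prev_hit[:, 1:] = hit[:, :-1]
at_cross = hit & ~prev_hit

# Crossing column: trim the overshoot; later columns: 0; earlier: unchanged.
new_vals = np.where(at_cross, vals - (running - limit),
                    np.where(prev_hit, 0.0, vals))

out = pd.DataFrame(new_vals, index=df.index, columns=cols)
out["val_n"] = df["val_n"]
out["sum_col"] = running[:, -1]          # sum of the original values
out["exceed_ind"] = exceeded.astype(int)
out["exceed_when"] = exceed_when
print(out.round(2))
```

Because the result is built as a new frame, all `col` columns come out as floats, so the printed formatting differs slightly from the loop version even though the values agree.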