Python Pandas: conditionally change values in a column using values from the same column


python python-3.x pandas dataframe


I need to replace values in a column named 'month' with values from the same column, based on another column, 'step_name'. Where df.step_name.str.contains('step1') is true, I want to use the 'month' value from the row where df.step_name.str.contains('step2') is true. I used df.loc[], but it just wiped out the month values on the 'step1' rows:
for i in set(df['id']):
    df.loc[(df.id.str.contains(i)) & (df.step_name.str.contains('step1')), 'month'] = df.loc[(df.id.str.contains(i)) & (df.step_name.str.contains('step2')), 'month']

Assume the source DataFrame contains:
   id step_name     month
0  10     step1   January
1  10     step2     March
2  12     step1  February
3  12     step2     April
4  14     step1       May

So in the rows with index 0 and 2 (step_name == 'step1'), the month column should be updated with the value from the next row (step_name == 'step2', same id).
To do this, run:
df.set_index('id', inplace=True)
df.update(df[df.step_name == 'step2']['month'])
df.reset_index(inplace=True)

The result is:
   id step_name  month
0  10     step1  March
1  10     step2  March
2  12     step1  April
3  12     step2  April
4  14     step1    May

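Note that update actually updates both rows with the respective id, but nothing changes for the rows with step_name == 'step2'.
In my opinion, my solution is cleaner than yours, which runs a separate update for each id.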
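
For comparison, here is a minimal sketch of another way to reach the same result without touching the index. It is not the method from the answer above: it builds a lookup Series and applies it with Series.map, and it assumes each id has at most one step2 row. The names step2_month and looked_up are illustrative only.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "id": [10, 10, 12, 12, 14],
    "step_name": ["step1", "step2", "step1", "step2", "step1"],
    "month": ["January", "March", "February", "April", "May"],
})

# Lookup table: id -> month of that id's step2 row.
# Assumption: each id has at most one step2 row, so this index is unique.
step2_month = df.loc[df["step_name"] == "step2"].set_index("id")["month"]

is_step1 = df["step_name"] == "step1"

# Look up the step2 month for every step1 row. Ids without a step2 row
# (id 14 here) come back as NaN and keep their original month.
looked_up = df.loc[is_step1, "id"].map(step2_month)
df.loc[is_step1, "month"] = looked_up.fillna(df.loc[is_step1, "month"])

print(df)
```

On the sample data this should give the same months as the result table above, with id 14 left at May because it has no step2 row.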
I see what you did there! It's subtle, and a nice bug.
First, I'll do a quick cleanup so we can see what's going on:
# Your code.
is_step1 = df.step_name.str.contains('step1')
is_step2 = df.step_name.str.contains('step2')

for i in set(df['id']):
    is_id = df.id.str.contains(i)
    df.loc[is_id & is_step1, 'month'] = df.loc[is_id & is_step2, 'month']

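You are using two masks that interact with each other. The right-hand side is a Series, so pandas aligns it with the left-hand side by index label. The step2 rows carry different labels than the step1 rows, so nothing matches and NaN is assigned: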
'''
mask1 mask2  => df[mask1] df[mask2]
1     0         value1    NaN        -> value1 = NaN
0     1         NaN       value2
0     0         NaN       NaN
0     0         NaN       NaN
'''
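
The NaN in the diagram can be reproduced on a two-row frame. This is a small sketch with made-up data (string ids are assumed, since the question calls .str.contains on the id column); rhs is an illustrative name.

```python
import pandas as pd

# Toy data: one step1 row and one step2 row for the same id.
df = pd.DataFrame({
    "id": ["10", "10"],
    "step_name": ["step1", "step2"],
    "month": ["January", "March"],
})

is_step1 = df["step_name"].str.contains("step1")
is_step2 = df["step_name"].str.contains("step2")

# rhs is a Series whose only index label is 1 (the step2 row).
rhs = df.loc[is_step2, "month"]

# The target row has index label 0. pandas aligns rhs to that label,
# finds no value there, and writes NaN instead of "March".
df.loc[is_step1, "month"] = rhs

print(df)
```

The step1 row ends up with a missing month, which matches the behaviour described in the question.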

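If you use an array instead, there is no index to align on, so pandas maps the array by position onto the values to be filled on the left-hand side of the assignment:
df.loc[is_id & is_step1, 'month'] = df.loc[is_id & is_step2, 'month'].values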

...and this is what happens:
'''
mask1 mask2  => df[mask1] df[mask2].values
1     0         value1    value2            -> value1 = value2
0     1         NaN       
0     0         NaN       
0     0         NaN       
'''
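
A runnable sketch of the array-based fix, on made-up data where every id has exactly one step1 row and one step2 row (an assumption; positional assignment needs both sides to have the same length). It uses .to_numpy(), which drops the index in the same way as .values, and compares ids with == rather than .str.contains for simplicity.

```python
import pandas as pd

# Toy data: each id has exactly one step1 and one step2 row.
df = pd.DataFrame({
    "id": ["10", "10", "12", "12"],
    "step_name": ["step1", "step2", "step1", "step2"],
    "month": ["January", "March", "February", "April"],
})

is_step1 = df["step_name"].str.contains("step1")
is_step2 = df["step_name"].str.contains("step2")

for i in df["id"].unique():
    is_id = df["id"] == i
    # .to_numpy() strips the index, so the value is assigned by position
    # rather than aligned by label.
    df.loc[is_id & is_step1, "month"] = df.loc[is_id & is_step2, "month"].to_numpy()

print(df)
```

If an id had a step1 row but no step2 row, the two sides would differ in length and the assignment would fail, so such ids would need to be skipped or handled separately.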

Now, as an example, if you want to swap the months of step1 and step2:
# N.B. I'm not saying this is best practice, but it works!
# Assumes every id has exactly one step1 row and one step2 row.
new_df = df.sort_values('id')

is_step1 = new_df.step_name.str.contains('step1')
is_step2 = new_df.step_name.str.contains('step2')

# Keep the step1 months before they are overwritten.
c = new_df.loc[is_step1, 'month'].values
new_df.loc[is_step1, 'month'] = new_df.loc[is_step2, 'month'].values
new_df.loc[is_step2, 'month'] = c

I believe Valdi_Bo's solution is the best one. Accept his answer.