Python: update a column if a condition is met

python pandas dataframe

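I have a DataFrame to process, and I am running some checks on it.

I am checking whether rows that repeat the same values under columns "A", "B" and "C" show the same number under column D, but with the opposite sign.

A     B    C    D      E
1111  AAA  123   0.01  comment to be replaced
2222  BBB  456   5     comment to be replaced
3333  CCC  789  10     don't do anything
1111  AAA  123  -0.01  comment to be replaced
2222  BBB  456  -5     comment to be replaced
3333  CCC  789  -9     don't do anything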

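We can group the DataFrame on columns A, B and C together with a Series holding the absolute values of column D, then transform column D with sum and check whether the result equals zero. A zero sum identifies pairs with the same magnitude but opposite signs: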
df['E'] = df.groupby(['A', 'B', 'C', df['D'].abs()])['D'].transform('sum').eq(0) 
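
To make the one-liner above easier to try out, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the name `is_pair` is only an illustrative choice so that the original comments in E are not overwritten.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "A": [1111, 2222, 3333, 1111, 2222, 3333],
    "B": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "C": [123, 456, 789, 123, 456, 789],
    "D": [0.01, 5, 10, -0.01, -5, -9],
    "E": ["comment to be replaced", "comment to be replaced", "don't do anything"] * 2,
})

# Group on A, B, C plus |D|; the group sum is zero only when
# equal and opposite values cancel each other out.
is_pair = df.groupby(["A", "B", "C", df["D"].abs()])["D"].transform("sum").eq(0)

print(is_pair.tolist())
# [True, True, False, True, True, False]
```

Rows 2 and 5 share A, B and C but have different magnitudes (10 and -9), so they land in different groups and are not flagged.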


This approach works if there are multiple pairs in E, or if there is 1 positive value and several negative ones, or vice versa.
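import pandas as pd
import numpy as np

# Split into non-negative and negative rows; reset_index() keeps the original index as a column
df_1 = df[df['D'] >= 0].copy().reset_index()
df_2 = df[df['D'] < 0].copy().reset_index()
# Flip the sign so that opposite values can be matched on D
df_2['D'] = -df_2['D']

# The inner merge keeps only rows that have a counterpart; collect both original indexes
indexes = df_1.merge(df_2, on=['A', 'B', 'C', 'D'], how='inner')[['index_x', 'index_y']].values.tolist()
# Flatten the list of [index_x, index_y] pairs
indexes = [item for sublist in indexes for item in sublist]

# Replace the comment only for the matched rows
df['E_new'] = np.where(df.index.isin(indexes), 'new comment', df['E'])

print(df)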

#       A    B    C      D                       E              E_new
# 0  1111  AAA  123   0.01  comment to be replaced        new comment
# 1  2222  BBB  456   5.00  comment to be replaced        new comment
# 2  3333  CCC  789  10.00       don't do anything  don't do anything
# 3  1111  AAA  123  -0.01  comment to be replaced        new comment
# 4  2222  BBB  456  -5.00  comment to be replaced        new comment
# 5  3333  CCC  789  -9.00       don't do anything  don't do anything
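
As a rough illustration of the "1 positive and several negatives" case mentioned above, the following sketch runs the same merge idea on a small made-up DataFrame. The data and the names `pos`, `neg`, `pairs` and `matched` are assumptions for this example, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one +5 and two -5 for the same A, B, C
df = pd.DataFrame({
    "A": [2222, 2222, 2222, 3333],
    "B": ["BBB", "BBB", "BBB", "CCC"],
    "C": [456, 456, 456, 789],
    "D": [5.0, -5.0, -5.0, 10.0],
    "E": ["old"] * 4,
})

pos = df[df["D"] >= 0].reset_index()   # original index kept in column "index"
neg = df[df["D"] < 0].reset_index()
neg["D"] = -neg["D"]                   # flip the sign so D can be used as a merge key

# Each positive row is paired with every matching negative row
pairs = pos.merge(neg, on=["A", "B", "C", "D"], how="inner")
matched = np.unique(pairs[["index_x", "index_y"]].to_numpy().ravel())

df["E_new"] = np.where(df.index.isin(matched), "new comment", df["E"])
print(df["E_new"].tolist())
# ['new comment', 'new comment', 'new comment', 'old']
```

Because the merge pairs every positive row with every matching negative row, all three rows of the 2222 group are flagged even though the counts are uneven.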

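Can there be more or fewer than two rows for each unique value of A, B, C?

Thanks for the comment @ShubhamSharma. Yes, if there are more or fewer, the comment would be different.

Please don't edit the question in a way that invalidates existing answers. It is better to ask a new one. Read the related article to learn more about good practices.

@DaniB Please consider posting a new question and rolling back your current edit, because your latest edit completely invalidates the existing answers.

What if I also want to update the comment under column "E" for the rows that meet the condition?

@DaniB We can use np.where. Please check df['E'] = np.where(df['E'], 'it work', 'it not work')

If it is clear that there can only be 1 positive and 1 negative value, this answer is very clever. However, it becomes unstable if there are uneven entries or multiple A, B, C combinations. For example, try adding 2222 BBB 456 -1 comment to be replaced to your sample.

Thanks for the comment @Andreas, but I think it still works even with uneven entries. Because we additionally group on the absolute value of column D here, the uneven entry will not find a pair, and in any case the result of the transform will not equal zero.

@Andreas That's a good point. My next goal is to find the pairs that, added together (meeting the condition), give the unmatched amount.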
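
The uneven-entry discussion in the comments can be checked with a small sketch. The helper name `flag_pairs` and both test frames are hypothetical, chosen only to illustrate the groupby approach combined with np.where.

```python
import numpy as np
import pandas as pd

def flag_pairs(frame):
    """Boolean mask: True where the (A, B, C, |D|) group sums to zero."""
    keys = ["A", "B", "C", frame["D"].abs()]
    return frame.groupby(keys)["D"].transform("sum").eq(0)

def make(d_values):
    # Hypothetical rows that all share the same A, B, C
    n = len(d_values)
    return pd.DataFrame({
        "A": [2222] * n, "B": ["BBB"] * n, "C": [456] * n,
        "D": d_values, "E": ["comment to be replaced"] * n,
    })

# Case 1: a +5/-5 pair plus an extra -1 row, as suggested in the comments
df1 = make([5.0, -5.0, -1.0])
df1["E"] = np.where(flag_pairs(df1), "new comment", df1["E"])
print(df1["E"].tolist())
# ['new comment', 'new comment', 'comment to be replaced']

# Case 2: two +5 rows and one -5 row share the same |D| group
df2 = make([5.0, 5.0, -5.0])
print(flag_pairs(df2).tolist())
# [False, False, False]
```

In the first case the extra -1 row forms its own group and leaves the pair untouched. In the second case the group sum is 5 rather than 0, so the groupby approach flags nothing, whereas the merge-based answer would flag all three rows.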