Python: creating a new column based on multiple conditions and multiple columns
Given the columns below (there are other columns beside them), I want to use these 3 columns to create a new column that defines the final status of each row.
status_1 status_2 status_3
a_accepted_with_comment a_revised c_approved
a_accepted_with_comment c_rejected nan
a_rejected a_approved nan
a_rejected nan nan
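Across the 3 columns, if the last column that has a value shows c_approved, the new column should give approved.
Across the 3 columns, if the last column that has a value shows c_rejected, the new column should give rejected.
Across the 3 columns, if the last column that has a value shows a_approved, the new column should give revised.
Across the 3 columns, if the last column that has a value shows a_rejected, the new column should give rejected.
The final table would look like this: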
status_1 status_2 status_3 final_status
a_accepted_with_comment a_revised c_approved approved
a_accepted_with_comment c_rejected nan rejected
a_rejected a_approved nan revised
a_rejected nan nan rejected
How can I create a new column with this many conditions in Python?
Thanks in advance.
Let's try ffill with np.select:
import numpy as np
# Forward-fill along each row so the last column holds the last non-NaN status
s = df.ffill(axis=1).iloc[:, -1]
c1 = s == 'c_approved'
c2 = s.isin(['c_rejected', 'a_rejected'])
c3 = s == 'a_approved'
df['new'] = np.select([c1, c2, c3], ['approved', 'rejected', 'revised'])
df
Out[210]:
status_1 status_2 status_3 new
0 a_accepted_with_comment a_revised c_approved approved
1 a_accepted_with_comment c_rejected NaN rejected
2 a_rejected a_approved NaN revised
3 a_rejected NaN NaN rejected
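One detail the snippet above leaves open is what happens to a row that matches none of the conditions: np.select falls back to its default of 0, which mixes an integer into a column of string labels, and newer NumPy releases may refuse to combine the two. A minimal, self-contained sketch that passes an explicit string default follows; the "unknown" label is an assumption made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "status_1": ["a_accepted_with_comment", "a_accepted_with_comment",
                 "a_rejected", "a_rejected"],
    "status_2": ["a_revised", "c_rejected", "a_approved", np.nan],
    "status_3": ["c_approved", np.nan, np.nan, np.nan],
})

# Last non-NaN status of each row, obtained by forward-filling along the row.
last_status = df.ffill(axis=1).iloc[:, -1]

conditions = [
    last_status == "c_approved",
    last_status.isin(["c_rejected", "a_rejected"]),
    last_status == "a_approved",
]
choices = ["approved", "rejected", "revised"]

# "unknown" is a hypothetical placeholder for rows that match no rule.
df["new"] = np.select(conditions, choices, default="unknown")
print(df)
```

Conditions are checked in order, so if two of them could ever overlap, the first matching one decides the label.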
You can use ffill and map, keeping track of each of your criteria and its result in a dictionary:
response_rules = {
"c_approved": "approved",
"c_rejected": "rejected",
"a_approved": "revised",
"a_rejected": "rejected"
}
df["final_status"] = df.ffill(axis=1)["status_3"].map(response_rules)
print(df)
status_1 status_2 status_3 final_status
0 a_accepted_with_comment a_revised c_approved approved
1 a_accepted_with_comment c_rejected NaN rejected
2 a_rejected a_approved NaN revised
3 a_rejected NaN NaN rejected
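If you have a lot of rules, a better design pattern might be to keep an easy-to-read/edit dictionary that maps each result to its criteria, and then invert it before calling .map: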
response_rules = {
"approved": ["c_approved"],
"rejected": ["c_rejected", "a_rejected"],
"revised": ["a_approved"]
}
# invert dictionary
inverted_rules = {vv: k for k, v in response_rules.items() for vv in v}
# same as before
df["final_status"] = df.ffill(axis=1)["status_3"].map(inverted_rules)
print(df)
status_1 status_2 status_3 final_status
0 a_accepted_with_comment a_revised c_approved approved
1 a_accepted_with_comment c_rejected NaN rejected
2 a_rejected a_approved NaN revised
3 a_rejected NaN NaN rejected
# Just so you can see:
print(inverted_rules)
{'a_approved': 'revised',
'a_rejected': 'rejected',
'c_approved': 'approved',
'c_rejected': 'rejected'}
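One thing to watch with .map is that any status missing from the dictionary comes back as NaN. The sketch below shows one way to surface and label such rows; the third row of the sample data and the "unknown" label are assumptions added for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

response_rules = {
    "approved": ["c_approved"],
    "rejected": ["c_rejected", "a_rejected"],
    "revised": ["a_approved"],
}
# Invert to {raw status: final status} so it can be passed to .map
inverted_rules = {code: result
                  for result, codes in response_rules.items()
                  for code in codes}

# Hypothetical data: the last row ends in "a_revised", which has no rule.
df = pd.DataFrame({
    "status_1": ["a_accepted_with_comment", "a_rejected", "a_accepted_with_comment"],
    "status_2": ["a_revised", "a_approved", "a_revised"],
    "status_3": ["c_approved", np.nan, np.nan],
})

df["final_status"] = df.ffill(axis=1)["status_3"].map(inverted_rules)

# Rows whose last status had no rule are NaN at this point.
unmapped = df[df["final_status"].isna()]
print(unmapped)

# Replace the gaps with an explicit placeholder label (an assumption).
df["final_status"] = df["final_status"].fillna("unknown")
print(df)
```

Inspecting the unmapped rows before filling them makes it easier to notice a status code that still needs a rule.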
Thanks for your answer! In my case, status_3 is not always the column that has to be read to get the final_status. It may come from status_2 or status_1. How can I handle that?
IIUC, the call to .ffill(axis=1) takes care of this. Essentially, we propagate the last valid non-NaN response from left to right, until the "status_3" column is entirely populated with the last valid entry of each row. So df.ffill(axis=1)["status_3"] outputs ["c_approved", "c_rejected", "a_approved", "a_rejected"], and this is what we operate on when we chain the call to .map(...).
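To make the row-wise fill concrete, here is a small sketch that prints the intermediate frame. Since the question mentions other columns beside the three status columns, it restricts the fill to a status_cols list so that unrelated columns cannot take part in it; the reviewer column and the status_cols name are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "status_1": ["a_accepted_with_comment", "a_accepted_with_comment",
                 "a_rejected", "a_rejected"],
    "status_2": ["a_revised", "c_rejected", "a_approved", np.nan],
    "status_3": ["c_approved", np.nan, np.nan, np.nan],
    "reviewer": ["r1", "r2", "r3", "r4"],  # hypothetical unrelated column
})

# Only these columns take part in the forward fill.
status_cols = ["status_1", "status_2", "status_3"]

# Each NaN is replaced by the nearest non-NaN value to its left in the same row.
filled = df[status_cols].ffill(axis=1)
print(filled)

# After the fill, the rightmost status column holds each row's last valid status.
last_status = filled["status_3"]
print(last_status.tolist())
```

The original df is left unchanged, because ffill returns a new frame; only the temporary filled frame has its gaps populated.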