Python: I need to compare two DataFrames for matches and mismatches, and for each mismatch identify which answer comes from the master df
I have two DataFrames in Python and want to compare them to find matches and mismatches. Importantly, for each mismatch I need to be able to tell which answer comes from the master answer sheet and which comes from the user's answers. I decided to use the pandas df.where function for this. It works, except that when there is a mismatch I cannot identify which answer comes from the master answer sheet and which comes from the user's answers.
# I have a DataFrame called df_master (master answer sheet)
import pandas as pd
df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
                          'B1': [0, 0, 0, 0, 1, 0],
                          'B2': [0, 1, 0, 0, 0, 0],
                          'B3': [0, 0, 1, 0, 0, 0],
                          'B4': [0, 0, 0, 1, 0, 0]})
print(df_master)
#    B0  B1  B2  B3  B4
# 0   1   0   0   0   0
# 1   0   0   1   0   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0
# I also have a DataFrame called df_answers (the user's answers)
df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
                           'B1': [1, 0, 0, 0, 1, 0],
                           'B2': [0, 0, 0, 0, 0, 0],
                           'B3': [0, 1, 1, 0, 0, 0],
                           'B4': [0, 0, 0, 1, 0, 0]})
print(df_answers)
#    B0  B1  B2  B3  B4
# 0   0   1   0   0   0
# 1   0   0   0   1   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0
# When I compare the two DataFrames, matching cells keep their value, and where there
# is no match I have used other=2. However, this is a problem because I cannot see which
# cell holds the correct answer. Is there a way to change the code so that the master's
# answer shows as 3 and the user's incorrect answer stays 2?
comparison = df_master.where(df_master.values==df_answers.values, other=2)
print(comparison)
# My Results
#    B0  B1  B2  B3  B4
# 0   2   2   0   0   0
# 1   0   0   2   2   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0
# Expected Results
#    B0  B1  B2  B3  B4
# 0   3   2   0   0   0
# 1   0   0   3   2   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0
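In your example, you can concatenate the two DataFrames as strings and then use replace on the result. P.S.: you can define the mapping yourself, e.g. {'00':'tware failed','01':'master failed'…}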
(df_answers.astype(str)+df_master.astype(str)).replace({'00':0,'01':3,'10':2,'11':1})
Out[129]:
   B0  B1  B2  B3  B4
0   3   2   0   0   0
1   0   0   3   2   0
2   0   0   0   1   0
3   0   0   0   0   1
4   0   1   0   0   0
5   1   0   0   0   0
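The string concatenation above encodes each cell as a pair (user's digit, master's digit) and then maps the four possible pairs to 0, 3, 2 and 1. For readers who would rather stay with integers, the same pairing can probably be expressed arithmetically. The sketch below is an alternative, not the answerer's method: it assumes both frames contain only 0 and 1 and share the same shape and column order, and the names pair and lookup are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
                          'B1': [0, 0, 0, 0, 1, 0],
                          'B2': [0, 1, 0, 0, 0, 0],
                          'B3': [0, 0, 1, 0, 0, 0],
                          'B4': [0, 0, 0, 1, 0, 0]})
df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
                           'B1': [1, 0, 0, 0, 1, 0],
                           'B2': [0, 0, 0, 0, 0, 0],
                           'B3': [0, 1, 1, 0, 0, 0],
                           'B4': [0, 0, 0, 1, 0, 0]})

# Encode each cell as 2 * user + master (assumes values are only 0 or 1):
#   0 -> both 0, 1 -> master only, 2 -> user only, 3 -> both 1
pair = 2 * df_answers.to_numpy() + df_master.to_numpy()

# Position i of lookup holds the output code for pair value i,
# mirroring the mapping {'00': 0, '01': 3, '10': 2, '11': 1}.
lookup = np.array([0, 3, 2, 1])

comparison = pd.DataFrame(lookup[pair],
                          index=df_master.index,
                          columns=df_master.columns)
print(comparison)
```

With the sample data this should give the same table as the expected results in the question: 3 marks a cell where only the master has a 1, and 2 marks a cell where only the user has a 1.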
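Thank you @WeNYoBen, your answer works very well. It took me a little while to understand because I am new to Python, but now I see how it works. I also took your suggestion to use label names, e.g. "01": "master failed". Thanks again for the quick and accurate reply.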
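A label-based variant like the one mentioned in this comment might look roughly as follows. Only the '01': 'master failed' label comes from the thread; the other three labels are placeholders invented for this sketch, since the full mapping was not shown.

```python
import pandas as pd

df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
                          'B1': [0, 0, 0, 0, 1, 0],
                          'B2': [0, 1, 0, 0, 0, 0],
                          'B3': [0, 0, 1, 0, 0, 0],
                          'B4': [0, 0, 0, 1, 0, 0]})
df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
                           'B1': [1, 0, 0, 0, 1, 0],
                           'B2': [0, 0, 0, 0, 0, 0],
                           'B3': [0, 1, 1, 0, 0, 0],
                           'B4': [0, 0, 0, 1, 0, 0]})

# Each cell becomes a two-character string:
# the user's digit first, the master's digit second.
pairs = df_answers.astype(str) + df_master.astype(str)

labels = {
    '00': 'both 0',         # placeholder label (assumption)
    '01': 'master failed',  # label quoted in the thread
    '10': 'user only',      # placeholder label (assumption)
    '11': 'match',          # placeholder label (assumption)
}

# Replace whole cell values with their labels.
labelled = pairs.replace(labels)
print(labelled)
```

Because the keys are whole cell values and none of the labels is itself a key, each cell is replaced exactly once.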