Python: how to merge with an outer join when the tables have common columns, no common columns, or unknown columns
Problem statement: if we don't have a common key (such as the extra Author key shown below), how do we perform an outer join?
df_a from json_1:
[
{
"bookid": "12345",
"bookname": "who am i"
}
]
df_b from json_2:
[
{
"bookid": "12345",
"bookname": "who am i",
"Author" : "asp"
}
]
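To reproduce the setup, here is a minimal sketch that turns the two JSON payloads into df_a and df_b. Inlining the JSON as strings (instead of reading json_1 / json_2 from disk) is an assumption made only for illustration.

```python
import json

import pandas as pd

# The two payloads from the question, inlined as strings for illustration
# (in practice these would be read from the json_1 / json_2 files).
json_1 = '[{"bookid": "12345", "bookname": "who am i"}]'
json_2 = '[{"bookid": "12345", "bookname": "who am i", "Author": "asp"}]'

# A JSON array of objects becomes one row per object; the keys become
# column names, so df_b ends up with the extra "Author" column.
# json.loads + DataFrame keeps "12345" as a string instead of guessing a dtype.
df_a = pd.DataFrame(json.loads(json_1))
df_b = pd.DataFrame(json.loads(json_2))
```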
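Now I want to find the differences between these two DataFrames key by key and value by value (because I need to write the output to an HTML table, with each column's comparison as a separate df).
What I tried: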
import numpy as np
import pandas as pd

df1 = pd.merge(df_a[['bookid']], df_b[['bookid']], left_index=True, right_index=True)
df1['diff'] = np.where((df1['bookid_x'] == df1['bookid_y']), 'No', 'Yes')
df2 = pd.merge(df_a[['bookname']], df_b[['bookname']], left_index=True, right_index=True)
df2['diff'] = np.where((df2['bookname_x'] == df2['bookname_y']), 'No', 'Yes')
df3 = ...  # What should I write here for the unknown Author column coming from df_b?
with open(r"c:\csv\booktest.html", 'w') as _file:
    _file.write(df1.to_html(index=False) + "<br>" + df2.to_html(index=False) + "<br>" + df3.to_html(index=False))
One way is to use .align() so that both DataFrames end up with the same columns.
After doing this, df_a and df_b look like this:
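A minimal sketch of that alignment step, assuming df_a and df_b were built from the JSON above and relying on the default join='outer' of DataFrame.align (both results get the union of the row and column labels; anything missing on one side becomes NaN):

```python
import pandas as pd

df_a = pd.DataFrame([{"bookid": "12345", "bookname": "who am i"}])
df_b = pd.DataFrame([{"bookid": "12345", "bookname": "who am i", "Author": "asp"}])

# align() returns a new pair of frames sharing the same index and columns.
# With the default join="outer", df_a gains an all-NaN "Author" column.
df_a, df_b = df_a.align(df_b)
```

Because both frames now carry identical labels, the per-column merges that follow no longer raise a KeyError for Author.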
print(df_a)
Author bookid bookname
0 NaN 12345 who am i
print(df_b)
Author bookid bookname
0 asp 12345 who am i
Now you can build the df3 you need:
df1 = pd.merge(df_a[['bookid']], df_b[['bookid']], left_index=True, right_index=True)
df1['diff'] = np.where((df1['bookid_x']==df1['bookid_y']), 'No', 'Yes')
df2 = pd.merge(df_a[['bookname']], df_b[['bookname']], left_index=True, right_index=True)
df2['diff'] = np.where((df2['bookname_x']==df2['bookname_y']), 'No', 'Yes')
df3 = pd.merge(df_a[['Author']], df_b[['Author']], left_index=True, right_index=True)
df3['diff'] = np.where((df3['Author_x']==df3['Author_y']), 'No', 'Yes')
print(df1)
print(df2)
print(df3)
Result:
bookid_x bookid_y diff
0 12345 12345 No
bookname_x bookname_y diff
0 who am i who am i No
Author_x Author_y diff
0 NaN asp Yes
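One caveat with the == comparison: NaN never equals NaN, so a row where a column is missing on both sides would also be flagged "Yes". A sketch that treats "missing on both sides" as equal, using a hypothetical two-row Author column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has no Author on either side.
df_a = pd.DataFrame({"Author": [np.nan, np.nan]})
df_b = pd.DataFrame({"Author": ["asp", np.nan]})

df3 = pd.merge(df_a[["Author"]], df_b[["Author"]], left_index=True, right_index=True)

# Plain == flags both rows, because NaN == NaN evaluates to False.
naive = np.where(df3["Author_x"] == df3["Author_y"], "No", "Yes")

# Count a row as unchanged if the values match OR both sides are missing.
both_missing = df3["Author_x"].isna() & df3["Author_y"].isna()
same = (df3["Author_x"] == df3["Author_y"]) | both_missing
df3["diff"] = np.where(same, "No", "Yes")
```

Here naive flags both rows, while df3["diff"] flags only the first one.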
Edit:
Of course, you can put the common statements into a loop over every column in the df:
for col in df_b.columns:
    df_temp = pd.merge(df_a[[col]], df_b[[col]], left_index=True, right_index=True)
    df_temp['diff'] = np.where((df_temp[col + '_x'] == df_temp[col + '_y']), 'No', 'Yes')
    print(df_temp)
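As mentioned in the comments, the default "_x"/"_y" labels can be replaced through the suffixes parameter of pd.merge. A sketch of the same loop with that change; the "_current"/"_future" names and the frames dict that collects the results are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame([{"bookid": "12345", "bookname": "who am i"}])
df_b = pd.DataFrame([{"bookid": "12345", "bookname": "who am i", "Author": "asp"}])
df_a, df_b = df_a.align(df_b)  # give both frames the same columns first

frames = {}  # hypothetical container: one comparison frame per column
for col in df_b.columns:
    # suffixes replaces the default "_x"/"_y" on the overlapping column name.
    df_temp = pd.merge(
        df_a[[col]], df_b[[col]],
        left_index=True, right_index=True,
        suffixes=("_current", "_future"),
    )
    df_temp["diff"] = np.where(
        df_temp[col + "_current"] == df_temp[col + "_future"], "No", "Yes"
    )
    frames[col] = df_temp
```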
Or, more efficiently, you can merge the two dfs once (on all columns), then compare each pair of columns and export it to HTML inside the column loop:
df_temp = pd.merge(df_a, df_b, left_index=True, right_index=True)
with open(r"booktest.html", 'w') as _file:
    for col in df_a.columns:
        df_temp[col + '_diff'] = np.where((df_temp[col + '_x'] == df_temp[col + '_y']), 'No', 'Yes')
        _file.write(df_temp[[col + '_x', col + '_y', col + '_diff']].to_html(index=False) + "<br>")
print(df_temp)
Result:
bookid_x bookid_y diff
0 12345 12345 No
bookname_x bookname_y diff
0 who am i who am i No
Author_x Author_y diff
0 NaN asp Yes
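A different technique, swapped in here rather than taken from the answer: pandas 1.1 and later provide DataFrame.compare, which does the pairwise comparison for two identically labeled frames and, unlike ==, treats NaN on both sides as equal. A sketch, assuming the frames are aligned first; the changed dict is a hypothetical name:

```python
import pandas as pd

df_a = pd.DataFrame([{"bookid": "12345", "bookname": "who am i"}])
df_b = pd.DataFrame([{"bookid": "12345", "bookname": "who am i", "Author": "asp"}])

# compare() only accepts identically labeled frames, so align first.
df_a, df_b = df_a.align(df_b)

# keep_shape=True keeps every row and column. Equal cells are blanked to NaN
# and differing cells are kept, under two-level columns (col, "self"/"other").
cmp = df_a.compare(df_b, keep_shape=True)

# A row differs in a column if either its "self" or "other" cell survived.
changed = {col: cmp[col].notna().any(axis=1) for col in df_a.columns}
```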
Edit 2:
Fixed the alignment of the HTML table cells, as requested in the comments:
text_align = '<style>.dataframe td { text-align: right; }</style>'
with open(r"booktest.html", 'w') as _file:
    for col in df_a.columns:
        df_temp = pd.DataFrame()
        df_temp[col + '_current'] = df_a[col]
        df_temp[col + '_future'] = df_b[col]
        df_temp[col + '_diff'] = np.where((df_a[col] == df_b[col]), 'No', 'Yes')
        _file.write(text_align + df_temp.to_html(index=False) + "<br>")
        print(df_temp)
Result:
bookid_x bookid_y diff
0 12345 12345 No
bookname_x bookname_y diff
0 who am i who am i No
Author_x Author_y diff
0 NaN asp Yes
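@asp I have posted the result I got in HTML. Are you sure you aligned the dfs?
@asp You can set the suffixes to whatever you need instead of "_x" or "_y". If you want to align the HTML fields, I think the option is to use a custom HTML style and add it to the .to_html output. See the edit.
@asp Thanks!! Glad I could help.
@asp If you want to drop the columns that contain only NaNs, you can use df_temp.dropna(how='all', axis=1, inplace=True). This drops every column whose values are all NaN.
@asp See Edit 3.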
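A small sketch of that dropna call on a hypothetical comparison frame, where the Notes_x/Notes_y columns are invented to be empty on both sides. It assigns the result instead of using inplace=True, which gives the same outcome:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the Notes columns are missing everywhere.
df_temp = pd.DataFrame({
    "Author_x": [np.nan, "lee"],
    "Author_y": ["asp", "lee"],
    "Notes_x": [np.nan, np.nan],
    "Notes_y": [np.nan, np.nan],
})

# axis=1 drops columns instead of rows; how="all" removes a column only
# when every one of its values is NaN, so the partly filled Author_x stays.
df_temp = df_temp.dropna(how="all", axis=1)
```

Note that a column which is entirely NaN on one side only (such as Author_x in the single-row example above) would also be dropped, so it is worth checking whether that is the intended behavior before applying it to the comparison frames.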