Python 在pandas中与regex合并
我有两个表A和表B,需要通过常规表达式连接:Python 在pandas中与regex合并,python,regex,pandas,merge,Python,Regex,Pandas,Merge,我有两个表A和表B,需要通过常规表达式连接: import pandas as pd import numpy as np a_df = pd.DataFrame({'key': ['A','B','C','D','E','F'], 'cola1': ['c1A','c1B', 'c1C', 'c1D', 'c1E', 'c1F'], 'cola2': ['c2A','c2B', 'c2C', 'c2D', 'c2
import pandas as pd
import numpy as np
a_df = pd.DataFrame({'key': ['A','B','C','D','E','F'],
'cola1': ['c1A','c1B', 'c1C', 'c1D', 'c1E', 'c1F'],
'cola2': ['c2A','c2B', 'c2C', 'c2D', 'c2E', 'c2F']})
b_df = pd.DataFrame({'key': ['A|B|C','E|F'],
'colb1': ['cb1ABC','cb1EF'],
'colb2': ['cb2ABC','c2EF']})
生成的数据帧应如下所示:
a_b_merge = pd.DataFrame({'key_a': ['A','B','C','D','E','F'],
'cola1': ['c1A','c1B', 'c1C', 'c1D', 'c1E', 'c1F'],
'cola2': ['c2A','c2B', 'c2C', 'c2D', 'c2E', 'c2F'],
'key_b': ['A|B|C', 'A|B|C', 'A|B|C', np.nan, 'E|F', 'E|F'],
'colb1': ['cb1ABC','cb1ABC', 'cb1ABC', np.nan, 'c1E', 'c1F'],
'colb2': ['cb2ABC','cb2ABC', 'cb2ABC', np.nan, 'c2EF', 'c2EF']})
SELECT * FROM table_a a
LEFT JOIN table_b b
ON b.key LIKE CONCAT('%', a.key, '%')
我的SQL解决方案如下所示:
a_b_merge = pd.DataFrame({'key_a': ['A','B','C','D','E','F'],
'cola1': ['c1A','c1B', 'c1C', 'c1D', 'c1E', 'c1F'],
'cola2': ['c2A','c2B', 'c2C', 'c2D', 'c2E', 'c2F'],
'key_b': ['A|B|C', 'A|B|C', 'A|B|C', np.nan, 'E|F', 'E|F'],
'colb1': ['cb1ABC','cb1ABC', 'cb1ABC', np.nan, 'c1E', 'c1F'],
'colb2': ['cb2ABC','cb2ABC', 'cb2ABC', np.nan, 'c2EF', 'c2EF']})
SELECT * FROM table_a a
LEFT JOIN table_b b
ON b.key LIKE CONCAT('%', a.key, '%')
使用pandas
merge()
有什么方法可以实现这一点吗?如果不是,哪一个是最优雅的解决方案?您可以创建一个包含b_df.key提取的临时列,合并两个数据帧,然后删除临时列:
a_df.merge(
b_df.assign(temporary=lambda x: x.key.str.split("|")).explode("temporary"),
left_on="key",
right_on="temporary",
how="outer",
suffixes=("_a", "_b"),
).iloc[:, :-1]
key_a cola1 cola2 key_b colb1 colb2
0 A c1A c2A A|B|C cb1ABC cb2ABC
1 B c1B c2B A|B|C cb1ABC cb2ABC
2 C c1C c2C A|B|C cb1ABC cb2ABC
3 D c1D c2D NaN NaN NaN
4 E c1E c2E E|F cb1EF c2EF
5 F c1F c2F E|F cb1EF c2EF
您可以创建包含b_df.key提取的临时列,合并两个数据帧,然后删除临时列:
a_df.merge(
b_df.assign(temporary=lambda x: x.key.str.split("|")).explode("temporary"),
left_on="key",
right_on="temporary",
how="outer",
suffixes=("_a", "_b"),
).iloc[:, :-1]
key_a cola1 cola2 key_b colb1 colb2
0 A c1A c2A A|B|C cb1ABC cb2ABC
1 B c1B c2B A|B|C cb1ABC cb2ABC
2 C c1C c2C A|B|C cb1ABC cb2ABC
3 D c1D c2D NaN NaN NaN
4 E c1E c2E E|F cb1EF c2EF
5 F c1F c2F E|F cb1EF c2EF