Python: partial expression matching between DataFrames


python pandas


I am trying to perform partial string matching between columns of two DataFrames, for example:

df_A:

and

df_B:

Expected output:

df_match:

matched Items_A
0       purse
1       string
0       hat
1       glue
2       gum
3       cherry
3       cherry
3       cherry pie

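Note that the numbers in the matched column are the labels of the matching column in df_B, which can be 1, 2, or 3. If there is no match, the label is 0.

I was able to do this with regular-expression matching inside several nested loops, but I would like to know whether pandas offers a way to do it more efficiently.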

Reshape df_B to get the following:

   level_0  level_1       0
0        0        1  string
1        0        2     gum
2        0        3  cherry
3        1        1    glue

Rename the df_B columns
Get the list of unique words in df_B
Create a new column in df_A that holds the matching word from df_B
Merge and filter
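
The reshape, rename, and unique-word steps above can be sketched as follows. The original df_B is not shown on this page, so the input here is an assumption reconstructed from the reshaped table (column names 1, 2, 3 are the labels); `long_B` is a hypothetical name used only for this illustration.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the reshaped table: the columns 1, 2, 3 are the labels.
df_B = pd.DataFrame({
    1: ["string", "glue"],
    2: ["gum", np.nan],
    3: ["cherry", np.nan],
})

# Wide to long: one row per (row index, label, word).
# dropna() makes the removal of empty cells explicit.
long_B = df_B.stack().dropna().reset_index()

# "level_1" holds the original column names (the labels); column 0 holds the words.
long_B = long_B.rename(columns={"level_1": "matched", 0: "Items_A"})

# Unique words to look for in df_A.
items = long_B["Items_A"].unique()

print(long_B)
print(items)
```

Keeping the reshaped frame under a separate name leaves the original wide df_B available for inspection.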

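Comments:

Is it possible to have multiple matches? What should happen in that case?

Multiple matches should get the same label. I updated the example based on your comment.

This works if there is an exact match, but not for partial matches. I will modify the example to reflect that.

Done. "cherry pie" now matches "cherry" in the example.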
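
To illustrate the point made in the comments, here is a minimal sketch of how a substring test gives every row that contains a word the same label, including the partial match "cherry pie". The sample rows are taken from the expected output; `word`, `label`, and `hits` are names introduced only for this example.

```python
import re
import pandas as pd

# A few rows from the expected output in the question.
items_A = pd.Series(["cherry", "cherry", "cherry pie", "hat"], name="Items_A")

# One word from df_B and its label (the column it came from).
word, label = "cherry", 3

# Substring test; re.escape keeps the word literal if it contains regex metacharacters.
hits = items_A.str.contains(re.escape(word), regex=True)

# Every matching row gets the same label; everything else gets 0.
matched = hits.map({True: label, False: 0})
print(pd.DataFrame({"matched": matched, "Items_A": items_A}))
```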

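import re

# Reshape df_B to long form: one row per (row index, label, word)
df_B = df_B.stack().reset_index()

# The original column names of df_B are the labels; the stacked values are the words
df_B = df_B.rename(columns={"level_1": "matched", 0: "Items_A"})

# Unique words to search for
items = df_B["Items_A"].unique()

def partial_match(x, items):
    """Return the first word in items that occurs inside x, or None."""
    for item in items:
        # re.escape keeps words that contain regex metacharacters literal
        if re.search(re.escape(item), x):
            return item
    return None

df_A["matching_item"] = df_A["Items_A"].apply(lambda x: partial_match(x, items))

# Left merge keeps every row of df_A; rows without a match get NaN in "matched"
df_A = df_A.merge(df_B, how="left", left_on="matching_item", right_on="Items_A", suffixes=('', '_y'))

# Keep the two output columns and label unmatched rows 0, as in the expected output
df_A = df_A.loc[:, ["Items_A", "matched"]]
df_A["matched"] = df_A["matched"].fillna(0).astype(int)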
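
As a possible alternative to the row-wise `apply`, the lookup can also be written with pandas string methods. This is a sketch of a different technique (a single alternation pattern with `Series.str.extract`, then a dictionary lookup), not the method from the answer above. The input data is an assumption reconstructed from the expected output and the reshaped table, and `word_to_label` is a hypothetical helper mapping.

```python
import re
import pandas as pd

# Assumed inputs, reconstructed from the tables in the question.
df_A = pd.DataFrame({"Items_A": ["purse", "string", "hat", "glue",
                                 "gum", "cherry", "cherry", "cherry pie"]})
word_to_label = {"string": 1, "glue": 1, "gum": 2, "cherry": 3}

# Build one pattern that matches any of the words.
# Longest words first, so a longer word is preferred over one that is its prefix.
words = sorted(word_to_label, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(w) for w in words) + ")"

# Extract the first word found in each row (NaN where nothing matches).
found = df_A["Items_A"].str.extract(pattern, expand=False)

# Translate words to labels; rows without a match get label 0.
df_A["matched"] = found.map(word_to_label).fillna(0).astype(int)

print(df_A[["matched", "Items_A"]])
```

This avoids both the Python-level loop over words and the merge. If a row contains several different words, only the first one found from the left is used.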