Match two dataframes by substring in Python


I have two large dataframes (1000 rows), and I need to match them by substring, for example:

df1:

df2:

Expected output:

Id    Title                    Keyword
1     The house of pump        house
2     Where is Andijan         andijan
3     The Joker                joker
4     Good bars in Andijan     andijan
5     What a beautiful house   house
So far I have written a very inefficient way to do the matching, and on dataframes of the real size it takes a very long time to run:

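import numpy as np
df1['Keyword'] = None  # no match yet; a later keyword overwrites an earlier one
for keyword in df2.to_dict(orient='records'):
    df1['Keyword'] = np.where(df1['Title'].str.contains(keyword['Keyword'], case=False), keyword['Keyword'], df1['Keyword'])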

I believe there is a friendlier, more efficient way to do this that would finish in a reasonable amount of time.
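A self-contained, reproducible version of the loop above, as a sketch. The df1 and df2 tables are missing from the question, so the sample data here is an assumption rebuilt from the expected output (df2 is assumed to hold only the three keywords shown). case=False is also an assumption: without it, 'andijan' would not match 'Andijan'.

```python
import numpy as np
import pandas as pd

# Assumed sample data, rebuilt from the expected output
# (the original df1/df2 tables are missing from the question).
df1 = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'Title': ['The house of pump', 'Where is Andijan', 'The Joker',
              'Good bars in Andijan', 'What a beautiful house'],
})
df2 = pd.DataFrame({'Keyword': ['house', 'andijan', 'joker']})

# The question's approach: one full pass over every title per keyword,
# so the work grows with (number of titles) x (number of keywords).
df1['Keyword'] = None
for keyword in df2.to_dict(orient='records'):
    hit = df1['Title'].str.contains(keyword['Keyword'], case=False)
    df1['Keyword'] = np.where(hit, keyword['Keyword'], df1['Keyword'])

print(df1)
```

With this data the Keyword column matches the expected output. With ~65,000 keywords, the loop runs 65,000 separate str.contains passes, which is why it is slow.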

Let's try findall:

import re
# one alternation of all keywords; .str[0] keeps the first match in each title
df1['new'] = df1.Title.str.findall('|'.join(df2.Keyword.tolist()), flags=re.IGNORECASE).str[0]
df1
   Id                   Title      new
0   1       The house of pump    house
1   2        Where is Andijan  Andijan
2   3               The Joker    Joker
3   4    Good bars in Andijan  Andijan
4   5  What a beautiful house    house
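A possible refinement of this answer, sketched as a small helper (the name first_keyword and the sample data are hypothetical). Escaping each keyword with re.escape keeps characters such as '+' or '.' literal; an unescaped 'c++' would make the pattern fail to compile. Sorting keywords longest-first lets a longer keyword such as 'houseboat' win over 'house' when both could match at the same position.

```python
import re
import pandas as pd

def first_keyword(titles: pd.Series, keywords: pd.Series) -> pd.Series:
    """Return the first keyword found in each title (NaN if none).

    Keywords are escaped so regex metacharacters match literally, and
    sorted longest-first so the longest alternative is tried first.
    """
    ordered = sorted(keywords.dropna().unique(), key=len, reverse=True)
    pattern = '|'.join(map(re.escape, ordered))
    return titles.str.findall(pattern, flags=re.IGNORECASE).str[0]

# Hypothetical sample data, for illustration only.
df1 = pd.DataFrame({'Title': ['The house of pump', 'Life on a houseboat',
                              'C++ tips', 'Nothing here']})
df2 = pd.DataFrame({'Keyword': ['house', 'houseboat', 'c++']})
df1['new'] = first_keyword(df1['Title'], df2['Keyword'])
print(df1)
```

As in the answer's output, findall returns the text as it appears in the title ('C++', 'Andijan'), not the keyword's own casing, and titles without a match get NaN.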

Building further on @BENY's solution, so that multiple keywords can be returned per title:

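import re
import pandas as pd

regex = '|'.join(df2['Keyword'])
matches = df1['Title'].str.findall(regex, flags=re.IGNORECASE)
# one row per (title, match) pair; titles without a match are dropped
keywords_explode = pd.DataFrame(matches.explode().dropna().rename('Keyword'))
df1.merge(keywords_explode, left_index=True, right_index=True)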
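If one row per title is preferred over one row per match, the lists returned by findall can be joined instead of exploded. A minimal sketch on hypothetical sample data; the 'Keywords' column name is an assumption.

```python
import re
import pandas as pd

# Hypothetical sample data; column names follow the question's tables.
df1 = pd.DataFrame({
    'Id': [1, 2, 3],
    'Title': ['The house of pump', 'A house in Andijan', 'No match here'],
})
df2 = pd.DataFrame({'Keyword': ['house', 'andijan', 'joker']})

pattern = '|'.join(map(re.escape, df2['Keyword']))
matches = df1['Title'].str.findall(pattern, flags=re.IGNORECASE)

# Keep one row per title: join all matches into a single string
# (an empty string means no keyword was found).
df1['Keywords'] = matches.str.join(', ')
print(df1)
```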

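Nice. I believe in this case we can drop '.tolist()', since 'join' gives the same result on a pandas Series. In terms of nice pandas syntax, I love this solution! Performance-wise, however, it still runs for quite a long time without finishing. For reference, there are about 65,000 keywords. Any idea how to make it more efficient?

I ended up choosing this solution. To get it to run in a reasonable time, I reduced the keyword count to ~1000 and ran my process in batches.

That's probably the best approach.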
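On the performance question: Python's re module tries the branches of an alternation one after another, so with ~65,000 keywords the cost still grows roughly with title length times keyword count. If every keyword is a single whole word (as in the example), a different technique avoids this: split titles into words and look each word up in a set. This is a swapped-in approach, not the one from the answers above, and it is only a sketch: it finds whole-word matches only, so substrings inside longer words and multi-word keywords are missed. The function name match_whole_words and the sample data are hypothetical.

```python
import pandas as pd

def match_whole_words(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """One row per (title, keyword) pair, matching whole words only.

    Assumes every keyword is a single word; unlike the regex answers,
    keywords inside longer words and multi-word keywords are NOT found.
    """
    keyword_set = set(df2['Keyword'].str.lower())
    # Split each title into lowercase word tokens, one row per token
    # (explode repeats the title's index for each of its tokens).
    tokens = (
        df1['Title'].str.lower()
        .str.findall(r'\w+')
        .explode()
        .rename('Keyword')
    )
    # Set membership is a hash lookup, so the cost grows with the number
    # of words in the titles rather than with titles x keywords.
    hits = tokens[tokens.isin(keyword_set)]
    return df1.merge(hits.to_frame(), left_index=True, right_index=True)

# Hypothetical sample data, rebuilt from the question's expected output.
df1 = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'Title': ['The house of pump', 'Where is Andijan', 'The Joker',
              'Good bars in Andijan', 'What a beautiful house'],
})
df2 = pd.DataFrame({'Keyword': ['house', 'andijan', 'joker']})
result = match_whole_words(df1, df2)
print(result)
```

Because the tokens are lowercased, the Keyword column comes out in lowercase, matching the expected output in the question.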