Python: matching keys from two different DataFrames


I have two DataFrames.

df1,
    Name    Stage   Description                                 key
0   Sri      1      Sri is one of the good singer in this two   one
1   NaN      2      Thanks for reading                          two has
2   Ram      1      Ram is two of the good cricket player       three
3   ganesh   1      one driver                                  four
4   NaN      2      good buddies                                NaN


df2,
    values
    member of four
    one of three friends
    sri is a cricketer
    Rahul has two brothers
If a key is present in df2["values"], I want to replace df1["key"] with that df2 value.

I tried, df1["key"]=df2[df2["values"].str.contains("|".join(df2["values"].tolist()),na=False)]
but the output I get is in the same order.

I want:

output_df,
    Name    Stage   Description                                 key
0   Sri      1      Sri is one of the good singer in this two   one of three friends
1   NaN      2      Thanks for reading                          Rahul has two brothers
2   Ram      1      Ram is two of the good cricket player       one of three friends
3   ganesh   1      one driver                                  member of four
4   NaN      2      good buddies                                NaN

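Do you want `two has` to match `has two`?

Yes, that is what I want. df1[key] can appear anywhere in df2[values].

Hmmm, then none of the old solutions can be used... :( because they need a single-word key or multiple words in the same order... some fuzzy matching is needed. Unfortunately, this is a completely new situation.

What are the sizes of `df1` and `df2`?

Thanks for the solution. @pirsquared, could you please check?

In the function I provided, `setify`, the `split` method can take an argument that specifies what to split on: `setify=lambda x:set(x.split(','))`

Thanks, I will try it. Can you check?

I would use an array of sets and use broadcasting to compare them.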
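As a small illustration of why sets make the word order irrelevant here, the following sketch compares word sets with the subset operator `<=`. The variable names are made up for this example; the strings come from the question's sample data.

```python
# Split each string into a set of words; order and repetition are discarded.
key_words = set("two has".split())
value_words = set("Rahul has two brothers".split())
other_words = set("member of four".split())

# `<=` on sets is the subset test: every word of the key must be in the value.
print(key_words <= value_words)   # True
print(key_words <= other_words)   # False

# Passing a separator to `split` changes what counts as a "word".
comma_words = set("a,b".split(","))
print(comma_words == {"a", "b"})  # True
```

Because only membership is tested, `two has` and `has two` produce the same set and therefore the same result.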
import numpy as np
import pandas as pd

# Turn a string into the set of its whitespace-separated words
setify = lambda x: set(x.split())
v = df2['values'].values.astype(str)
k = df1['key'].values.astype(str)
i = df1.index

# These are the sets
a = np.array([setify(x) for x in k.tolist()])
b = np.array([setify(x) for x in v.tolist()])

# This is the broadcasting: matches[r, c] is True when every
# word of key r appears in value c (subset test)
matches = (a[:, None] <= b)

# Additional test that any match exists at all
any_ = matches.any(1)
# Test that the key wasn't null in the first place
nul_ = df1['key'].notnull().values
mask = any_ & nul_

# Use argmax to find where the first set match is.  There
# may be more than one match.  I chose to use `assign`,
# so I use `mask` to pass a slice of a series
# that targets the correct rows.
df1.assign(key1=pd.Series(v[matches.argmax(1)], i)[mask])

     Name  Stage                                Description      key                    key1
0     Sri      1  Sri is one of the good singer in this two      one    one of three friends
1     NaN      2                         Thanks for reading  two has  Rahul has two brothers
2     Ram      1      Ram is two of the good cricket player    three    one of three friends
3  ganesh      1                                 one driver     four          member of four
4     NaN      2                               good buddies      NaN                     NaN
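
For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the sample frames from the question and applies the same subset idea with a plain Python loop instead of NumPy broadcasting. The helper name `first_superset` and its `sep` parameter are assumptions made for this example, not part of the answer above; like the answer, it keeps only the first matching value.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Name": ["Sri", np.nan, "Ram", "ganesh", np.nan],
    "Stage": [1, 2, 1, 1, 2],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is two of the good cricket player",
        "one driver",
        "good buddies",
    ],
    "key": ["one", "two has", "three", "four", np.nan],
})
df2 = pd.DataFrame({"values": [
    "member of four",
    "one of three friends",
    "sri is a cricketer",
    "Rahul has two brothers",
]})


def first_superset(key, candidates, sep=None):
    """Hypothetical helper: first candidate containing every word of `key`."""
    if pd.isna(key):
        return np.nan
    wanted = set(key.split(sep))
    if not wanted:  # an empty key would otherwise match everything
        return np.nan
    for text in candidates:
        if wanted <= set(text.split(sep)):
            return text
    return np.nan


candidates = df2["values"].dropna().astype(str).tolist()
result = df1.assign(
    key1=df1["key"].map(lambda k: first_superset(k, candidates))
)
print(result[["key", "key1"]])
```

The loop is easier to read and to adapt (for example, by passing `sep=","` for comma-separated keys), at the cost of being slower than broadcasting on large frames.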