Pandas: map a column in one df to another df where all words are present

pandas python-2.7 numpy

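I am trying to map a column from one dataframe onto another dataframe: a row of df1 should pick up a row of df2 when all the words in df2's ColB are present in df1's ColA.

Multiple matches are fine, since I can filter them out afterwards. Thanks in advance.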

df1
ColA
this is a sentence
with some words
in a column
and another
for fun

df2
ColB        ColC
this a      123
in column   456
fun times   789

Some attempts

dfResult = df1.apply(lambda x: np.all([word in x.df1['ColA'].split(' ') for word in x.df2['ColB'].split(' ')]),axis = 1)

dfResult = df1.ColA.apply(lambda sentence: all(word in sentence for word in df2.ColB))

Expected output


dfResult
ColA                 ColC
this is a sentence   123
with some words      NaN
in a column          456
and another          NaN
for fun              NaN

Convert to `set` and use NumPy broadcasting to find subsets. Disclaimer: no guarantee that this will be fast.

import numpy as np
import pandas as pd

A = df1.ColA.str.split().apply(set).to_numpy()  # If pandas version is < 0.24 use `.values`
B = df2.ColB.str.split().apply(set).to_numpy()  # instead of `.to_numpy()`
C = df2.ColC.to_numpy()

# When `dtype` is `object` Numpy falls back on performing
# the operation on each pair of values.  Since these are `set` objects
# `<=` tests for subset.
i, j = np.where(B <= A[:, None])
out = pd.array([np.nan] * len(A), pd.Int64Dtype())  # Empty nullable integers
# Use `out = np.empty(len(A), dtype=object)` if pandas version is < 0.24
out[i] = C[j]

df1.assign(ColC=out)

                 ColA  ColC
0  this is a sentence   123
1     with some words   NaN
2         in a column   456
3         and another   NaN
4             for fun   NaN
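
To make the broadcasting step easier to inspect, here is a minimal self-contained sketch that rebuilds the sample frames from the question and prints only the subset mask. The name `mask` is introduced here for illustration and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"ColA": ["this is a sentence", "with some words",
                             "in a column", "and another", "for fun"]})
df2 = pd.DataFrame({"ColB": ["this a", "in column", "fun times"],
                    "ColC": [123, 456, 789]})

# Object arrays whose elements are Python sets of words
A = df1.ColA.str.split().apply(set).to_numpy()
B = df2.ColB.str.split().apply(set).to_numpy()

# A[:, None] has shape (5, 1) and B has shape (3,), so the comparison
# broadcasts to (5, 3). mask[i, j] is True when every word of
# df2.ColB[j] occurs in df1.ColA[i].
mask = (B <= A[:, None]).astype(bool)
print(mask)
```

With this sample data only positions (0, 0) and (2, 1) are True, which is what `np.where` then turns into the index pairs `i, j`.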

Using a loop and `set.issubset`

pd.DataFrame([[y if set(z.split()).issubset(set(x.split())) else np.nan for z,y in zip(df2.ColB,df2.ColC)] for x in df1.ColA ]).max(1)
Out[34]: 
0    123.0
1      NaN
2    456.0
3      NaN
4      NaN
dtype: float64
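
The one-liner above can also be unrolled into a plain loop, which may be easier to read. The sketch below is one possible way to write it, not the answerer's code: `match_subset` is a hypothetical helper name, and instead of reducing with `.max(1)` it keeps every matching `ColC` value per row, since the question says multiple matches are acceptable.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"ColA": ["this is a sentence", "with some words",
                             "in a column", "and another", "for fun"]})
df2 = pd.DataFrame({"ColB": ["this a", "in column", "fun times"],
                    "ColC": [123, 456, 789]})


def match_subset(sentences, keys, values):
    """For each sentence, list the values whose key words all occur in it."""
    key_sets = [set(k.split()) for k in keys]  # split each key only once
    results = []
    for sentence in sentences:
        words = set(sentence.split())
        results.append([v for ks, v in zip(key_sets, values)
                        if ks.issubset(words)])
    return results


matches = match_subset(df1.ColA, df2.ColB, df2.ColC)
print(matches)
```

Rows without a match get an empty list here rather than NaN; `df1.assign(ColC=matches)` would attach the lists as a column if that shape is convenient for later filtering.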