Python: find the words of one DataFrame row in a row of another DataFrame


python pandas dataframe


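I want to check whether the words in a row of DataFrame B are present in a row of another DataFrame A, and retrieve the line number from DataFrame A.

Example of DataFrame A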

      LineNumber               Description
2539  5401845  Either the well was very deep, or she fell very slowly,
4546  5409117  for she had plenty of time as she went down to look about her, 
4368  5408517  and to wonder what was going to happen next

Example of DataFrame B

                 Words
50062   well deep fell
44263   plenty time above
4731    plenty time down look

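For each row of DataFrame B, I want to know whether all of its words occur together in some row of DataFrame A. If they do, I want to retrieve the line number from DataFrame A and assign it to that row of DataFrame B.

The output should look like this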

                     Words             LineNumber
50062   well deep fell                 5401845
44263   plenty time above
4731    plenty time down look          5409117

I tried something like this, but it does not work

a = 'for she had plenty of time as she went down to look about her,'
str = 'plenty time down look'
if all(x in str for x in a):
    print(True)
else:
    print(False)

Thanks

You are close to what you want. Try it like this:

a = 'for she had plenty of time as she went down to look about her,'
string = 'plenty time down look'
a = a.split(' ')
string = string.split(' ')
if all(x in a for x in string):
    print(True)
else:
    print(False)

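There were two problems with the way you originally used x in string for x in a.

The first is that string and a are both plain strings, so each element you iterate over is a single character. To compare words you need a list of words, which is why I included the split.

The second is that the logic x in string for x in a returns True if every element of a is in string, whereas what you need is x in a for x in string, which returns True if every element of string is in a.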
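One detail the split-based check leaves open is punctuation: splitting on spaces keeps the trailing comma in 'her,', so a query containing her would not match. The following is a minimal sketch of the same word-level idea using sets. The helper names words_of and contains_all_words are hypothetical, and lowercasing and stripping punctuation are assumptions that go beyond the original answer.

```python
import string


def words_of(text):
    """Return the set of lowercase words in text, with surrounding punctuation stripped."""
    return {w.strip(string.punctuation).lower() for w in text.split()} - {""}


def contains_all_words(line, query):
    """Return True if every word of query also occurs as a word in line."""
    return words_of(query) <= words_of(line)


line = 'for she had plenty of time as she went down to look about her,'

# Iterating over a plain string yields characters, not words.
print(list('look'))  # ['l', 'o', 'o', 'k']

print(contains_all_words(line, 'plenty time down look'))  # True
print(contains_all_words(line, 'plenty time above'))      # False
print(contains_all_words(line, 'about her'))              # True
```

The subset test on sets expresses the same condition as all(x in a for x in string), but each membership check is a hash lookup instead of a list scan.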

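Build the DataFrames.
Match the description in DataFrame y against DataFrame x, and get the index of the matching row in DataFrame x.
See this post: thanks to it, iterating over the x DataFrame is more efficient.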
import pandas as pd

x = pd.DataFrame({"Description": ["for she had plenty of time as she went down to look about her",
                                  "for she had of time as she went down to look about her"]})

>>> x
    Description
0   for she had plenty of time as she went down to look about her
1   for she had of time as she went down to look about her

y = pd.DataFrame({"Description": ["plenty time down look"]})
>>> y
    Description
0   plenty time down look

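# Split the single query row into individual words.
with_words = y["Description"].iloc[0].split()
# Build one lookahead per word, so every word must appear somewhere in the text.
with_regex = "".join(['(?=.*{})'.format(word) for word in with_words])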

>>> with_regex
'(?=.*plenty)(?=.*time)(?=.*down)(?=.*look)'

>>> x.loc[(x.Description.str.contains(with_regex))].index.item()
0
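
The snippet above handles a single query row and assumes exactly one match. A sketch of how the same lookahead idea might be extended to every row of DataFrame B from the question follows. The names df_a, df_b and first_matching_line are hypothetical; the \b word boundaries, the re.escape call, taking the first match, and the nullable Int64 column are assumptions added here, not part of the original answer.

```python
import re

import pandas as pd

# Sample data copied from the question.
df_a = pd.DataFrame(
    {
        "LineNumber": [5401845, 5409117, 5408517],
        "Description": [
            "Either the well was very deep, or she fell very slowly,",
            "for she had plenty of time as she went down to look about her,",
            "and to wonder what was going to happen next",
        ],
    },
    index=[2539, 4546, 4368],
)
df_b = pd.DataFrame(
    {"Words": ["well deep fell", "plenty time above", "plenty time down look"]},
    index=[50062, 44263, 4731],
)


def first_matching_line(words, frame):
    """Return the LineNumber of the first row whose Description contains every word, else None."""
    # One lookahead per word; \b restricts each match to whole words.
    pattern = "".join(r"(?=.*\b{}\b)".format(re.escape(w)) for w in words.split())
    mask = frame["Description"].str.contains(pattern, regex=True)
    hits = frame.loc[mask, "LineNumber"]
    return int(hits.iloc[0]) if not hits.empty else None


# Nullable integer dtype keeps the line numbers as integers even when a row has no match.
df_b["LineNumber"] = pd.array(
    [first_matching_line(w, df_a) for w in df_b["Words"]], dtype="Int64"
)
print(df_b)
```

With the sample data, rows 50062 and 4731 receive 5401845 and 5409117, while row 44263 stays missing because above does not occur in any description. Each row of B triggers a scan over all of A, so this is a readability-first sketch rather than something tuned for large frames.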