How to check whether elements of a given list are in a text using Python?


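I have to check whether an element of a given list is in a text. If it is a single word I can check it, but if it contains multiple words, as shown below, I can't get it.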

text="what is the price of wheat and White Pepper?"

words=['wheat','White Pepper','rice','pepper']

Expected output=['wheat','White Pepper']

I tried the following but did not get the expected result. Can anyone help me?

>>> output=[word for word in words if word in text]

>>> print(output)

['wheat', 'White Pepper', 'rice']

Here 'rice' is picked up from inside the word 'price'.

If I use nltk or anything similar, it splits 'White Pepper' into 'White' and 'Pepper'.

You can use a regular expression with word boundaries:

import re

text="what is the price of wheat and White Pepper?"

words=['wheat','White Pepper','rice','pepper']

output=[word for word in words if re.search(r"\b{}\b".format(word),text)]

print(output)

Result:

['wheat', 'White Pepper']
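If the words might contain characters that are special in a regex (a dot, for instance), the list comprehension above can be wrapped with re.escape. This is only a sketch: the helper name find_whole_words is made up for illustration, and it assumes every word starts and ends with a letter or digit, since that is where a word boundary can match.

```python
import re

def find_whole_words(words, text):
    """Return the entries of words that occur in text as whole words.

    Hypothetical helper: re.escape keeps characters such as '.' literal,
    and the word-boundary anchors must match at both ends of the entry.
    """
    return [
        word for word in words
        if re.search(r"\b{}\b".format(re.escape(word)), text)
    ]

text = "what is the price of wheat and White Pepper?"
words = ['wheat', 'White Pepper', 'rice', 'pepper']
print(find_whole_words(words, text))  # ['wheat', 'White Pepper']
```

The match is case-sensitive, which is why 'pepper' is not reported; passing flags=re.IGNORECASE to re.search would change that.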

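You can optimize the search by pre-building the regular expression (courtesy of a commenter):

output = re.findall(r'\b|\b'.join(sorted(words, key=len, reverse=True)), text)

The sort is needed so that the longest strings are tried first. Regex escaping is probably unnecessary here, because the words contain only letters and spaces.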
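One detail worth noting when pre-building the pattern: joining the words with r'\b|\b' leaves the first alternative without a leading boundary and the last one without a trailing boundary. A possible way around that, sketched below, is to put the whole alternation in a non-capturing group between two boundaries. The name build_pattern is hypothetical, and re.escape is added as a precaution rather than because this word list needs it.

```python
import re

def build_pattern(words):
    """Compile one alternation that matches any entry as a whole word.

    Hypothetical helper. Longer entries come first so that 'White Pepper'
    is tried before a shorter entry that could match at the same position.
    """
    ordered = sorted(words, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    # The non-capturing group makes both boundaries apply to every entry.
    return re.compile(r"\b(?:{})\b".format(alternation))

text = "what is the price of wheat and White Pepper?"
words = ['wheat', 'White Pepper', 'rice', 'pepper']
pattern = build_pattern(words)
print(pattern.findall(text))  # ['wheat', 'White Pepper']
```

Unlike the list comprehension, findall reports matches in the order they occur in the text and repeats a word if it occurs more than once.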

So I would do it like this:

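def findWord(candidates, text):
    """Return the candidates found in text at the start of a word."""
    words = []
    for candidate in candidates:
        index = text.find(candidate)  # -1 when the candidate is absent
        if index != -1:
            # Skip a hit that begins mid-word, e.g. 'rice' inside 'price'.
            if index != 0 and text[index - 1] != " ":
                continue
            words.append(candidate)
    return words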

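The string's find method returns -1 if the substring is not present. 'White Pepper' returns 31, because that is the index where it starts.

For the test case you provided, this returns ['wheat', 'White Pepper'].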
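Two limits of the find-based approach are worth keeping in mind: it looks only at the first occurrence (so 'rice' in "the price of rice" is missed) and it does not check the character after the match. A rough variant that handles both could look like this; find_words is a hypothetical name, and treating every non-alphanumeric character as a delimiter is an assumption of the sketch.

```python
def find_words(candidates, text):
    """Return candidates that appear in text delimited on both sides.

    Hypothetical variant of findWord: it scans every occurrence and
    treats any non-alphanumeric character as a delimiter (an assumption).
    """
    found = []
    for candidate in candidates:
        start = text.find(candidate)
        while start != -1:
            end = start + len(candidate)
            before_ok = start == 0 or not text[start - 1].isalnum()
            after_ok = end == len(text) or not text[end].isalnum()
            if before_ok and after_ok:
                found.append(candidate)
                break
            # Keep looking after this false hit, e.g. 'rice' inside 'price'.
            start = text.find(candidate, start + 1)
    return found

text = "what is the price of wheat and White Pepper?"
words = ['wheat', 'White Pepper', 'rice', 'pepper']
print(find_words(words, text))  # ['wheat', 'White Pepper']
```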

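Why search the string multiple times? Something like:

re.findall(r'\b|\b'.join(sorted(words, key=len, reverse=True)), text)

will give you what you need... (note that the sort is needed so that the longest strings match first; you may also want to re.escape each item so it is treated literally, but I'll leave that as an exercise for the reader) :p

That does look good. I don't want to steal your answer; you should post it yourself...

You know I don't care about that sort of thing... Anyway, it's a natural extension of your answer... feel free to use it.

Hmm... thinking about it, r'\b{}\b'.format(sorted(…) would be better... otherwise the longest string won't need to start at a word boundary...

Thanks to both of you for your answers... @Jean François Fabre and @Jon Clements

Thank you @Austin Stehling... for your solution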

