Python: check whether a string contains any of the elements in a list

python regex

I need to check whether a string contains any of the elements in a list. I am currently using this method:

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

print("the english sentence is: " + engSentence)

engWords2 = []
isEnglish = 0

for w in engWords:
    if w in engSentence:
        isEnglish = 1
        engWords2.append(w)

if isEnglish == 1:
    print("The sentence is english and contains the words: ")
    print(engWords2)

The problem is that it gives the following output:

the english sentence is: the dogs fur is black and white
The sentence is english and contains the words: 
['the', 'a', 'and', 'it']
>>>

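As you can see, "a" and "it" should not be there. How can I search so that only whole words are listed, and not parts of words? I'm open to any ideas using plain Python code or regex (though I'm very new to both Python and regex, so please nothing too complicated). Thanks.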

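Those two words are found because they are substrings of "black" and "white" respectively. When you apply "in" to a string, it only looks for a substring of characters, not for whole words.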

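Try building a list of the sentence's words first, engSentenceWords (for example via engSentence.split()), and later test against that list instead of the raw string: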

if w in engSentenceWords:

This splits the original sentence into a list of individual words and then checks each search word against whole-word values.
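Putting the pieces of this answer together, a minimal sketch might look like the following. The name engSentenceWords comes from the answer; the call to str.split() is assumed from the description above, and found_words is a hypothetical name used only for illustration.

```python
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

# Assumed step: split on whitespace so we compare against whole words.
engSentenceWords = engSentence.split()

# found_words is an illustrative name, not part of the original post.
# "in" on a list tests for an equal element, not for a substring.
found_words = [w for w in engWords if w in engSentenceWords]

if found_words:
    print("The sentence is english and contains the words: ")
    print(found_words)
```

Because the membership test now runs against a list of words rather than a string, "a" no longer matches inside "black" and "it" no longer matches inside "white".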

words = set(engSentence.split()).intersection(set(engWords))
if words:
    print("The sentence is english and contains the words: ")
    print(words)

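Split the English sentence into a list of tokens, convert it to a set, convert the English words to a set as well, and then find the intersection (the common overlap). Then check whether the result is non-empty and, if so, print the words that were found.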
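One thing to keep in mind with the set approach is that sets are unordered, so the printed words may not appear in the order of engWords. If only a yes/no answer is needed, or if the order matters, a variation along these lines may help. This is a sketch, not part of the original answer; sentence_words, contains_any and ordered_matches are hypothetical names.

```python
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

# Hypothetical name: the set of whole words in the sentence.
sentence_words = set(engSentence.split())

# Yes/no answer: does the sentence contain any word from the list?
contains_any = any(w in sentence_words for w in engWords)

# Matches in the order they appear in engWords (a plain set would not keep order).
ordered_matches = [w for w in engWords if w in sentence_words]

print(contains_any)
print(ordered_matches)
```

any() stops at the first hit, so it avoids building the full list when the list of matches itself is not needed.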

Or, even simpler, add spaces around the sentence and around each search word:

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

print("the english sentence is: " + engSentence)

engWords2 = []
isEnglish = 0
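# Pad both ends so every word in the sentence is surrounded by spaces.
# Padding only the end would let "a " match inside a word such as "sofa ".
engSentence = " " + engSentence + " "

for w in engWords:
    if " %s " % w in engSentence: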
        isEnglish = 1
        engWords2.append(w)

if isEnglish == 1:
    print("The sentence is english and contains the words: ")
    print(engWords2)

The output is:

the english sentence is: the dogs fur is black and white
The sentence is english and contains the words: 
['the', 'and']

You may want to use regular expression matching. Try something like the following:

import re

match_list = ['foo', 'bar', 'eggs', 'lamp', 'owls']
match_str = 'owls are not what they seem'
# '{0}' refers to the first (and only) argument of format(); '{1}' would raise IndexError.
match_regex = re.compile('^.*({0}).*$'.format('|'.join(match_list)))

if match_regex.match(match_str):
    print('We have a match.')

For details, see the documentation for the re module.

There is no regex involved in the question's code; it is plain string manipulation. Regular expressions are a very specific way of matching patterns against strings, and if you were using them you would be using the re module.

By the way, it is worth noting that all of these solutions (including mine) only work when there is no punctuation. Any punctuation mark will look like part of the word next to it and make the comparison fail. If you start dealing with punctuation, you need some strategy to remove or ignore it. One strategy is to run a regex over the full sentence string, with a "\b" on either side of each word you search for.
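The "\b" strategy mentioned above could be sketched roughly as follows. This is one possible reading of the suggestion, not code from the thread: the sample sentence with punctuation, the names pattern and matches, and the choice of re.IGNORECASE are all assumptions made for illustration.

```python
import re

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
# Assumed sample sentence, with punctuation and a capital letter added.
engSentence = "The dog's fur is black, and white."

# Join the words into one alternation and wrap it in \b word boundaries,
# so a word only matches when it is not part of a longer word.
# re.escape guards against search words that contain regex metacharacters.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, engWords)) + r")\b",
    re.IGNORECASE,
)

# findall returns every whole-word match, in sentence order.
matches = pattern.findall(engSentence)

if matches:
    print("The sentence is english and contains the words: ")
    print(matches)
```

Since "\b" matches at the transition between word and non-word characters, commas and full stops next to a word do not get in the way, which the split-based approaches cannot handle without stripping punctuation first.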