Python - Check if a string contains any element in a list


I need to check whether a string contains any element from a list. I am currently using this method:

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

print("the english sentence is: " + engSentence)

engWords2 = []
isEnglish = 0

for w in engWords:
    if w in engSentence:
        isEnglish = 1
        engWords2.append(w)

if isEnglish == 1:
    print("The sentence is english and contains the words: ")
    print(engWords2)
The problem with this is that it gives the following output:

the english sentence is: the dogs fur is black and white
The sentence is english and contains the words: 
['the', 'a', 'and', 'it']
>>> 

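As you can see, "a" and "it" should not be there. How can I search so that it only lists whole words, not parts of words? I am open to any ideas using plain Python code or regex (although I am very new to both Python and regex, so please nothing too complicated). Thanks.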

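Those two words are found because they are substrings of "black" and "white" respectively. When you apply "in" to a string, it only looks for a substring of characters.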

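Try:

engSentenceWords = engSentence.split()

And later,

if w in engSentenceWords:

This splits the original sentence into a list of individual words, and then checks against whole word values.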
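
To make the difference concrete, here is a minimal sketch of the split-based check as a complete script. The helper name find_whole_words is made up for illustration; the word list and sentence are the ones from the question.

```python
def find_whole_words(sentence, candidates):
    """Return the candidates that occur as whole whitespace-separated words."""
    # split() with no arguments breaks on any run of whitespace
    sentence_words = sentence.split()
    # "in" on a list compares whole elements, not substrings
    return [w for w in candidates if w in sentence_words]


engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

# "in" on a string is a substring test, so "a" and "it" slip through
substring_hits = [w for w in engWords if w in engSentence]
whole_word_hits = find_whole_words(engSentence, engWords)

print(substring_hits)   # ['the', 'a', 'and', 'it']
print(whole_word_hits)  # ['the', 'and']
```

Returning a list means the caller can test it directly with "if whole_word_hits:", so the separate isEnglish flag is no longer needed.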

words = set(engSentence.split()).intersection(set(engWords))
if words:
    print("The sentence is english and contains the words: ")
    print(words)
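This splits the English sentence into a list of tokens, converts it to a set, converts the English words to a set, and finds the intersection (the common overlap). It then checks whether the result is non-empty and, if so, prints the words found.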
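
One thing to keep in mind is that a set has no defined order, so the printed words may come out in any order. If the order matters, a small variation (a sketch, not part of the original answer; the names sentence_words and found are made up here) keeps the set for fast lookups but walks engWords in its original order:

```python
engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

# Build the set once so each membership test is O(1) on average
sentence_words = set(engSentence.split())

# Iterate over the list, not the set, to preserve the order of engWords
found = [w for w in engWords if w in sentence_words]

if found:
    print("The sentence is english and contains the words: ")
    print(found)
```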



Or, even simpler, add spaces around the sentence and the search words:

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "the dogs fur is black and white"

print("the english sentence is: " + engSentence)

engWords2 = []
isEnglish = 0
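# Pad the sentence so the first and last words are also surrounded by spaces
engSentence = " " + engSentence + " "

for w in engWords:
    # Require a space on both sides so that only whole words match
    if " %s " % w in engSentence: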
        isEnglish = 1
        engWords2.append(w)

if isEnglish == 1:
    print("The sentence is english and contains the words: ")
    print(engWords2)
The output is:

the english sentence is: the dogs fur is black and white
The sentence is english and contains the words: 
['the', 'and']


You may want to use regular expression matching. Try something like the following:

import re

match_list = ['foo', 'bar', 'eggs', 'lamp', 'owls']
match_str = 'owls are not what they seem'
# Build an alternation such as 'foo|bar|...'; note this matches substrings, not whole words
match_regex = re.compile('^.*({0}).*$'.format('|'.join(match_list)))

if match_regex.match(match_str):
    print('We have a match.')

For more information, see the documentation for the re module.


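There is no regular expression involved here; this is just string manipulation. Regular expressions are a very specific way of supplying match patterns against strings, and if you were using them you would be using the re module.

By the way, it is worth noting that all of these solutions (including mine) only work when there is no punctuation. Any punctuation will look like part of the word next to it and make your comparison fail. If you start handling punctuation, you need some strategy to remove or ignore it. One strategy is to use a regular expression against the full sentence string, with a "\b" on either side of each word you search for.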
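
The "\b" strategy from the last comment could look roughly like the sketch below. The punctuated sentence and the names pattern and found are made up for illustration; re.escape, re.compile and findall are standard functions of the re module.

```python
import re

engWords = ["the", "a", "and", "of", "be", "that", "have", "it", "for", "not"]
engSentence = "The dog's fur is black, and white."

# \b matches at a word boundary, so "a" no longer matches inside "black".
# re.escape guards against words that contain regex metacharacters.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in engWords) + r")\b",
    re.IGNORECASE,
)

# findall returns every whole-word match in order of appearance;
# lower() normalises "The" to "the" for comparison with the word list
found = [m.lower() for m in pattern.findall(engSentence)]
print(found)  # ['the', 'and']
```

Because the boundary check ignores the comma and the full stop, no separate step for stripping punctuation is needed in this case.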