Check multiple strings (from a file) against HTML text using Python

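I need to check a scraped HTML document in Python against multiple strings read from a text file. In other words, the spider should find out whether the HTML text contains any of the given strings: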

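    import urllib2  # Python 2; the Python 3 equivalent is urllib.request

    url = 'http://forum.unisoftdev.com'
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    html = response.read()

    with open('keywords.txt') as f:
        key_words = f.readlines()  # each line keeps its trailing '\n'

    # here's the nut: key_words is a list, so this raises TypeError
    # ('in <string>' requires a string as the left operand)
    if key_words in html:
        pass  # do something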

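I don't want any "elif" or "else" branches: the keywords have to live in a text file, so I need to check the document against multiple strings at once, but I don't know how to do that in Python. In PHP this is really easy…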
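The check in the question fails for two reasons: `in` on a string needs a string on its left, and `readlines()` keeps each line's trailing newline, so even a per-keyword loop would look for `"alpha\n"`. A minimal regex-free sketch for Python 3, assuming one keyword per line in the file (the helpers `load_keywords` and `contains_any` are hypothetical names, not part of any library):

```python
import os
import tempfile


def load_keywords(path):
    """Read one keyword per line, dropping newlines and blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def contains_any(text, keywords):
    """Return True if any keyword occurs as a substring of text."""
    return any(word in text for word in keywords)


if __name__ == "__main__":
    # Write a throwaway keywords file so the example is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("alpha\nomega\n\ndelta\n")
        path = tmp.name
    try:
        keywords = load_keywords(path)
        print(keywords)  # ['alpha', 'omega', 'delta']
        print(contains_any("I have lots of deltas but no omegas", keywords))  # True
    finally:
        os.remove(path)
```

Skipping blank lines also matters for the regex approach: an empty keyword becomes an empty alternative, which matches at every position.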

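You can use a regular expression with alternation to check whether any of the keywords occurs in the input text. Just join the keywords together with `|`: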

    pattern = "|".join(re.escape(word) for word in key_words)  # re.escape matches each keyword literally
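Escaping matters as soon as a keyword contains a regex metacharacter. A small sketch with two hypothetical keywords, `[draft]` and `node.js`, comparing the unescaped and escaped patterns:

```python
import re

# Hypothetical keywords that contain regex metacharacters.
key_words = ["[draft]", "node.js"]
text = "a final release of nodexjs"

unescaped = re.compile("|".join(key_words))
escaped = re.compile("|".join(re.escape(word) for word in key_words))

# Unescaped, "[draft]" is read as a character class and "." as "any character",
# so the pattern matches text that contains neither keyword.
print(unescaped.search(text).group())  # 'a'
print(escaped.search(text))            # None

# Escaped, the keywords are matched literally.
print(escaped.search("see node.js [draft]").group())  # 'node.js'
```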

If you don't want substring matches, e.g. `omega` matching inside `omegas`, wrap the keywords in word boundaries (`\b`).
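A sketch comparing the plain alternation with a word-boundary version on the sample text used in this answer. The non-capturing group `(?:...)` is one way to apply `\b` to every alternative at once; writing `\bword\b` around each keyword is equivalent:

```python
import re

html = 'I have lots of deltas but no omegas'
key_words = ['alpha', 'omega', 'delta']
alternation = "|".join(re.escape(w) for w in key_words)

# Plain alternation: "delta" also matches inside "deltas".
substring_rx = re.compile(alternation)

# \b(?:...)\b only matches where a keyword is a whole word.
word_rx = re.compile(r"\b(?:{})\b".format(alternation))

print(substring_rx.findall(html))                # ['delta', 'omega']
print(word_rx.findall(html))                     # []
print(word_rx.findall('an alpha and an omega'))  # ['alpha', 'omega']
```

`\b` marks a transition between a word character and a non-word character, so it does not help for keywords that begin or end with punctuation (for example `c++`).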

Sample code:

    import re

    html = 'I have lots of deltas but no omegas'
    key_words = ['alpha', 'omega', 'delta']
    # escape each keyword so regex metacharacters are matched literally
    pattern = "|".join(re.escape(word) for word in key_words)
    rx = re.compile(pattern)
    if rx.search(html):
        # do something
        print("found")
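If the spider needs to know *which* keywords occurred rather than just whether any did, `findall` on the same pattern returns the matched text. A sketch (the helper name `matched_keywords` is hypothetical); it searches the raw HTML, so keywords inside tags or attributes count too, and it returns early on an empty list because an empty pattern would match everything:

```python
import re


def matched_keywords(html, key_words):
    """Return the set of keywords found in html (substring matches)."""
    if not key_words:
        return set()  # "|".join([]) == "" would match at every position
    rx = re.compile("|".join(re.escape(word) for word in key_words))
    return set(rx.findall(html))


html = '<p>I have lots of deltas but no omegas</p>'
print(sorted(matched_keywords(html, ['alpha', 'omega', 'delta'])))  # ['delta', 'omega']
```

Matches do not overlap and alternatives are tried left to right, so when one keyword is a prefix of another (`omega` and `omegas`), only one of them is reported at a given position.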