Checking an HTML document against multiple strings (from a file) with Python

Tags: python, arrays, string, find, scrapy-spider

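I need to check a scraped HTML document against multiple strings from a text file in Python. In other words, the spider should find out whether the HTML text contains any of the given strings.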

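    import urllib2  # Python 2 standard library

    # Download the page to check
    url = 'http://forum.unisoftdev.com'
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    html = response.read()

    # Load the keywords, one per line
    with open('keywords.txt') as f:
        key_words = f.readlines()

    # here's the nut: key_words is a list, so this test does not work
    if key_words in html:
        pass  # do something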

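I don't want any "elif" and "else" branches, because the keywords have to come from the text file. So I have to check the document against multiple strings, but I don't know how to do that in Python. In PHP this is really easy…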

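You can use a regular expression with alternation to check whether any of the keywords is present in the input text. Just join the keywords together with |: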

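    pattern = "|".join(r'{}'.format(word) for word in key_words)

If you don't want substring matches, e.g. omega matching inside omegaforce, you need to add word boundaries (\b) around each keyword.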
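A minimal sketch of the word-boundary variant follows. The two sample sentences and the `re.escape` call are additions for illustration and are not part of the original answer; `re.escape` is only there so that keywords containing regex metacharacters (a dot, for instance) are matched literally.

```python
import re

key_words = ['alpha', 'omega', 'delta']

# Wrap every keyword in \b...\b so it only matches as a whole word.
# re.escape (an added precaution, see above) makes each keyword literal.
pattern = "|".join(r'\b{}\b'.format(re.escape(word)) for word in key_words)
rx = re.compile(pattern)

print(bool(rx.search('the omegaforce arrived')))  # False: substring only
print(bool(rx.search('the omega arrived')))       # True: whole word
```

Note that with word boundaries a plural such as "deltas" no longer counts as a hit for "delta", which may or may not be what you want.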

Sample code:

    import re

    html = 'I have lots of deltas but no omegas'
    key_words = ['alpha', 'omega', 'delta']

    # Join the keywords into one alternation: alpha|omega|delta
    pattern = "|".join(r'{}'.format(word) for word in key_words)
    rx = re.compile(pattern)
    if rx.search(html):
        # do something
        print("found")
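To connect the sample back to the question, here is a hedged sketch that starts from lines as `readlines()` would return them. The helper names `load_keywords` and `contains_any` and the sample data are hypothetical, chosen only for this illustration. It assumes one keyword per line, and it strips the trailing newlines because otherwise a keyword would only match when followed by a line break in the HTML.

```python
import re


def load_keywords(lines):
    """Return the non-empty keywords from an iterable of lines."""
    # readlines() keeps the trailing newline on every line, so strip it.
    # Blank lines are skipped: an empty alternative would match any text.
    return [line.strip() for line in lines if line.strip()]


def contains_any(html, keywords):
    """Return True if at least one keyword occurs in html."""
    if not keywords:
        return False
    # re.escape makes each keyword literal before joining with "|".
    pattern = "|".join(re.escape(word) for word in keywords)
    return re.search(pattern, html) is not None


# Stand-in for the contents of keywords.txt (hypothetical sample data).
sample_lines = ['alpha\n', 'omega\n', '\n', 'delta\n']
html = 'I have lots of deltas but no omegas'

key_words = load_keywords(sample_lines)
if contains_any(html, key_words):
    # do something
    print("found")
```

With a real file, the open file object can be passed directly, e.g. `load_keywords(f)` inside the `with open('keywords.txt') as f:` block. If regular expressions are not needed, `any(word in html for word in key_words)` performs the same plain substring check.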