Django: Is it possible to use multiple filters with one BeautifulSoup document?


django python-3.x performance parsing


I'm using Django and Python 3.7, and I want to speed up my HTML parsing. Currently I search the document for three kinds of elements, like this:

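import re
import urllib.request

from bs4 import BeautifulSoup
from django.conf import settings

# fullurl is defined elsewhere; settings.HDR is a project-specific headers dict
req = urllib.request.Request(fullurl, headers=settings.HDR)  # urllib2 is Python 2 only
html = urllib.request.urlopen(req).read()
comments_soup = BeautifulSoup(html, features="html.parser")

score_elts = comments_soup.find_all("div", {"class": "score"})
comments_elts = comments_soup.find_all("a", attrs={"class": "comments"})
bad_elts = comments_soup.find_all("span", string=re.compile("low score"))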

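I've read that SoupStrainer is one way to improve performance. However, all the examples only show parsing an HTML document with a single filter, and in my case I have three. How can I pass all three filters into the parse? Or, if I keep parsing the way I do now, will performance actually be worse?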

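I don't think you can pass multiple filters to the BeautifulSoup constructor. Instead, you can wrap all of your conditions into a single filter and pass that to the BeautifulSoup constructor through the parse_only argument.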

For simple cases, such as filtering on tag names only, you can pass a list to SoupStrainer:

html="""
<a>yes</a>
<p>yes</p>
<span>no</span>
"""
from bs4 import BeautifulSoup
from bs4 import SoupStrainer
custom_strainer = SoupStrainer(["a","p"])
soup=BeautifulSoup(html, "lxml", parse_only=custom_strainer)
print(soup)

<a>yes</a><p>yes</p>

For more complex conditions, such as checking attributes, you can pass a function to SoupStrainer instead; see the relevant section of the BeautifulSoup documentation.

from bs4 import BeautifulSoup
from bs4 import SoupStrainer

html = """
<html class="test">
<a class="wanted">yes</a>
<a class="not-wanted">no</a>
<p>yes</p>
<span>no</span>
</html>
"""

def my_function(elem, attrs):
    # Receives the tag name and its attribute dict while the document is parsed
    if elem == 'a' and attrs.get('class') == "wanted":
        # attrs.get() avoids a KeyError on <a> tags that have no class
        return True
    elif elem == 'p':
        return True
    return False

custom_strainer = SoupStrainer(my_function)
soup = BeautifulSoup(html, "lxml", parse_only=custom_strainer)
print(soup)

<a class="wanted">yes</a><p>yes</p>
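Applied to the question, a minimal sketch (assuming bs4 is installed) is to build one strainer that keeps only the three tag names being searched for, using the list form shown above, and then run the original three searches on the much smaller tree. The sample HTML, the find_targets helper and the variable names are hypothetical, made up for illustration, and the built-in "html.parser" is used so lxml is not required. The string=re.compile("low score") condition is still applied after parsing, because the strainer decides whether to keep a tag when its start tag is parsed, before the tag's text has been seen.

```python
import re

from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample page; in the question this comes from urlopen(...).read().
html = """
<html><body>
  <div class="score">42</div>
  <div class="other">ignored</div>
  <p><a class="comments" href="#c">12 comments</a></p>
  <a class="nav" href="/">home</a>
  <span>low score</span>
  <span>fine</span>
  <table><tr><td>markup we never need</td></tr></table>
</body></html>
"""


def find_targets(soup):
    """Hypothetical helper: run the question's three searches on a parsed soup."""
    score_elts = soup.find_all("div", {"class": "score"})
    comments_elts = soup.find_all("a", attrs={"class": "comments"})
    bad_elts = soup.find_all("span", string=re.compile("low score"))
    return score_elts, comments_elts, bad_elts


# One strainer covering all three searches: keep only <div>, <a> and <span>
# elements (with their contents); other tags and top-level text are dropped
# while parsing, but tags nested inside dropped tags are still checked.
only_targets = SoupStrainer(["div", "a", "span"])
strained_soup = BeautifulSoup(html, "html.parser", parse_only=only_targets)

# Full parse, kept only to compare results against the strained tree.
full_soup = BeautifulSoup(html, "html.parser")

score_elts, comments_elts, bad_elts = find_targets(strained_soup)
print(len(score_elts), len(comments_elts), len(bad_elts))
```

Both soups yield the same matches; the strained one simply holds fewer nodes. The BeautifulSoup documentation notes that parsing only part of a document won't save much parsing time but can save a lot of memory and make searching the document much faster, so it is worth timing both versions on real pages (for example with timeit) before counting on a speedup. Note also that parse_only has no effect with the html5lib parser, which always parses the whole document.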