Python: parsing HTML and finding multiple classes and tags, the most elegant way?


python web-scraping beautifulsoup data-extraction


Currently I have the following code:

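# soup is an existing BeautifulSoup object for the article page.
# First look for any tag whose class is "author" or "author-name".
author_name = soup.find(True, {"class": ["author", "author-name"]})
if author_name is not None:
    print(author_name.text)
else:
    # Fall back to a tag carrying rel="author".
    author_name = soup.find(rel="author")
    if author_name is not None:
        print(author_name.text)
    else:
        print("No Author Found")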

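I'm looking for the author of an article. So I look for entries with classes such as class="author", class="author-name", and so on, or with rel=author, etc. If I keep doing it my way, it will end up as a long chain of if and else statements. That doesn't seem very elegant to me, although I only started coding recently. Could you help me make this more elegant?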

I would do it this way:

results = []
results += soup.select('.author')
results += soup.select('.author-name') 
results += soup.select('[rel=author]')

You can use CSS selectors; with them you can specify multiple selection criteria in a single string:

soup.select('.author, .author-name, [rel="author"]')
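As a minimal, self-contained sketch of the combined selector in action: the sample HTML and the variable names below are invented for illustration and do not come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this illustration.
SAMPLE_HTML = """
<article>
  <a rel="author" href="/users/jane">Jane Doe</a>
  <span class="author-name">J. Doe</span>
  <p class="summary">Not an author element</p>
</article>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One call, three alternative criteria separated by commas.
matches = soup.select('.author, .author-name, [rel="author"]')

# Collect the stripped text of every matching tag.
matched_texts = [tag.get_text(strip=True) for tag in matches]
print(matched_texts)
```

The paragraph with class "summary" is skipped because it satisfies none of the three criteria.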

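This produces a list. Looping over it lets you pick the match you like best, or you can use the next() function (on an iterator over the list) to take the first one: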

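for candidate in soup.select('.author, .author-name, [rel="author"]'):
    # Take the first match that actually contains text.
    if candidate.text:
        author = candidate.text
        break
else:
    # The loop ended without a break: nothing usable was found.
    print("No author found")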
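The next() variant could look roughly like the sketch below. The helper name first_author and the sample HTML are hypothetical; the generator expression is what lets next() return a default instead of raising StopIteration when nothing matches.

```python
from bs4 import BeautifulSoup

AUTHOR_SELECTOR = '.author, .author-name, [rel="author"]'


def first_author(soup):
    """Return the text of the first non-empty match, or None."""
    candidates = soup.select(AUTHOR_SELECTOR)
    # next() pulls the first item from the generator; None is the default.
    return next((tag.text for tag in candidates if tag.text), None)


# Hypothetical sample: the first match is empty, so the second one is used.
html = '<div><span class="author"></span><a rel="author">Jane Doe</a></div>'
soup = BeautifulSoup(html, "html.parser")

author = first_author(soup)
print(author if author is not None else "No author found")
```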

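The soup.select() call returns every matching element in document order, so the loop above finds the first qualifying element no matter which of the criteria it matched; it will not prefer rel="author" if an .author-name tag appears earlier in the document.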
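If a fixed preference between the criteria does matter, one possible approach (not part of the answer above) is to try each selector separately in priority order. The priority list, the helper name find_author_by_priority, and the sample HTML below are all assumptions made for the sake of the sketch.

```python
from bs4 import BeautifulSoup

# Assumed order of preference; adjust to taste.
SELECTORS = ['[rel="author"]', '.author', '.author-name']


def find_author_by_priority(soup, selectors=SELECTORS):
    """Return the text for the highest-priority selector that matches."""
    for selector in selectors:
        # select_one() returns the first match for this selector, or None.
        tag = soup.select_one(selector)
        if tag is not None and tag.get_text(strip=True):
            return tag.get_text(strip=True)
    return None


# Hypothetical sample: .author-name comes first in the document,
# but rel="author" has the higher priority.
html = '<p><span class="author-name">J. Doe</span> <a rel="author">Jane Doe</a></p>'
soup = BeautifulSoup(html, "html.parser")

preferred = find_author_by_priority(soup)
print(preferred)
```

Note that this sketch only inspects the first match per selector; if that element is empty it moves on to the next selector rather than to later matches of the same one.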