Parsing data with BeautifulSoup in Python

python html parsing

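I am trying to use BeautifulSoup to parse a DOM tree and extract the authors' names. Below is an HTML snippet showing the structure of the markup I want to extract them from.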

<html>
<body>
<div class="list-authors">
<span class="descriptor">Authors:</span> 
<a href="/find/astro-ph/1/au:+Lin_D/0/1/0/all/0/1">Dacheng Lin</a>, 
<a href="/find/astro-ph/1/au:+Remillard_R/0/1/0/all/0/1">Ronald A. Remillard</a>, 
<a href="/find/astro-ph/1/au:+Homan_J/0/1/0/all/0/1">Jeroen Homan</a> 
</div>
<div class="list-authors">
<span class="descriptor">Authors:</span> 
<a href="/find/astro-ph/1/au:+Kosovichev_A/0/1/0/all/0/1">A.G. Kosovichev</a>
</div>

<!--There are many other div tags with this structure-->
</body>
</html>

Since link is already taken from an iterable, you don't need to index into link itself; just use link.contents[0]:

print(link.contents[0])
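
A minimal sketch of this point, assuming the bs4 package (BeautifulSoup 4) is installed and using a shortened copy of the HTML from the question. The names html, author_div, and names are illustrative and do not come from the original post. Each item yielded by find_all is already a single Tag, so its first child can be read directly.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (one div, two authors).
html = """
<div class="list-authors">
<span class="descriptor">Authors:</span>
<a href="/find/astro-ph/1/au:+Lin_D/0/1/0/all/0/1">Dacheng Lin</a>,
<a href="/find/astro-ph/1/au:+Remillard_R/0/1/0/all/0/1">Ronald A. Remillard</a>
</div>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")
author_div = soup.find("div", attrs={"class": "list-authors"})

names = []
for link in author_div.find_all("a"):
    # link is one Tag, so read its first child; do not index link itself.
    names.append(str(link.contents[0]))

print(names)
```

Here link.contents[0] is the text node inside each a tag; link.get_text() would be an alternative if a link could contain nested markup.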

With your new example, which has two separate <div class="list-authors"> blocks, this yields:

Dacheng Lin
Ronald A. Remillard
Jeroen Homan
A.G. Kosovichev

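So I'm not sure I understand the comment about searching the other divs. If they are different classes, you will need either a separate soup.find and soup.findAll for them, or just a change to your first soup.find.

Just use findAll for the divs as well as for the links, starting the outer loop with:

for authordiv in soup.findAll('div', attrs={'class': 'list-authors'}):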
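
The nested iteration can be sketched as follows, assuming BeautifulSoup 4, where find_all is the current spelling and findAll is kept as a legacy alias. The variable authors_per_div is a hypothetical name chosen for illustration. The outer loop visits every matching div and the inner comprehension collects that div's links, so the authors stay grouped by the div they came from.

```python
from bs4 import BeautifulSoup

# The two-div structure from the question.
html = """
<html><body>
<div class="list-authors">
<span class="descriptor">Authors:</span>
<a href="/find/astro-ph/1/au:+Lin_D/0/1/0/all/0/1">Dacheng Lin</a>,
<a href="/find/astro-ph/1/au:+Remillard_R/0/1/0/all/0/1">Ronald A. Remillard</a>,
<a href="/find/astro-ph/1/au:+Homan_J/0/1/0/all/0/1">Jeroen Homan</a>
</div>
<div class="list-authors">
<span class="descriptor">Authors:</span>
<a href="/find/astro-ph/1/au:+Kosovichev_A/0/1/0/all/0/1">A.G. Kosovichev</a>
</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

authors_per_div = []  # one inner list per div (illustrative name)
for authordiv in soup.find_all("div", attrs={"class": "list-authors"}):
    # Collect the stripped text of every link inside this div.
    authors = [link.get_text(strip=True) for link in authordiv.find_all("a")]
    authors_per_div.append(authors)

for authors in authors_per_div:
    print(", ".join(authors))
```

Keeping one list per div is a design choice: it preserves which authors belong to the same entry, which a single flat loop over all links would lose.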

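If there are more div tags, how do I iterate over them?

If you search by CSS class, you get a list of elements that you can iterate over with a for loop. Do something like:

authordivs = soup.find_all('div', class_='list-authors')

Looping over the links inside each div then prints:

Dacheng Lin
Ronald A. Remillard
Jeroen Homan
A.G. Kosovichev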
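
As a hedged illustration of searching by CSS class, the sketch below compares the class_ keyword with a CSS selector passed to select. It assumes BeautifulSoup 4 with its usual soupsieve dependency for select. The extra div with class "list-title" is hypothetical and was added only to show that other divs are filtered out; the names by_keyword and by_selector are likewise illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="list-title"><a href="#">Not an author (hypothetical extra div)</a></div>
<div class="list-authors">
<a href="/find/astro-ph/1/au:+Lin_D/0/1/0/all/0/1">Dacheng Lin</a>,
<a href="/find/astro-ph/1/au:+Homan_J/0/1/0/all/0/1">Jeroen Homan</a>
</div>
<div class="list-authors">
<a href="/find/astro-ph/1/au:+Kosovichev_A/0/1/0/all/0/1">A.G. Kosovichev</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class is a reserved word in Python, so the keyword argument is spelled class_.
by_keyword = [
    a.get_text(strip=True)
    for div in soup.find_all("div", class_="list-authors")
    for a in div.find_all("a")
]

# The same query as one CSS selector: every a inside a div.list-authors.
by_selector = [a.get_text(strip=True) for a in soup.select("div.list-authors a")]

print(by_keyword)
print(by_selector)
```

Both queries return the links in document order, so the two lists match; the selector form is shorter when the per-div grouping is not needed.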