Python: Help parsing <pre> tags using BeautifulSoup

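I'm trying to parse information from a website using BeautifulSoup and Python. The HTML is shown below. I'd like the parsed data to look like:

ID Definition
Lysine biosynthesis - Burkholderia pseudomallei 17
... the rest of the data sits in similar positions (inside the "pre" tag and outside the "a" tags)

How can I do this?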

<pre>ID                   Definition
----------------------------------------------------------------------------------------------------
<a href="/kegg-bin/show_pathway?bpm00300">bpm00300</a>             Lysine biosynthesis - Burkholderia pseudomallei 17 
<a href="/kegg-bin/show_pathway?bpm00330">bpm00330</a>             Arginine and proline metabolism - Burkholderia pse 
<a href="/kegg-bin/show_pathway?bpm01100">bpm01100</a>             Metabolic pathways - Burkholderia pseudomallei 171 
<a href="/kegg-bin/show_pathway?bpm01110">bpm01110</a>             Biosynthesis of secondary metabolites - Burkholder 
</pre>

Thanks for your help!

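soup.find('pre') returns a BeautifulSoup object with its own search methods, not just a string. Iterating over findChildren() on the node it finds gives you what you need (and skips the header lines):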

y = soup.find('pre')  # returns the data between the <pre> tags; specific to KEGG
for a in y:
    z = a.string

 ID                   Definition
----------------------------------------------------------------------------------------------------

for a in soup.find('pre').findChildren():
    z = a.string
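
Note that in the findChildren() loop above, a.string is the text of each <a> tag (the ID). The definition is the bare text node that follows each link inside the <pre> block. Below is a minimal sketch of one way to collect both, assuming BeautifulSoup 4 (the bs4 package) is installed. The helper name parse_pathways and the inlined html sample are illustrative assumptions, not part of the original answer; a real script would download the page first.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup copied from the question (shortened to two rows).
html = """<pre>ID                   Definition
----------------------------------------------------------------------------------------------------
<a href="/kegg-bin/show_pathway?bpm00300">bpm00300</a>             Lysine biosynthesis - Burkholderia pseudomallei 17 
<a href="/kegg-bin/show_pathway?bpm00330">bpm00330</a>             Arginine and proline metabolism - Burkholderia pse 
</pre>"""


def parse_pathways(markup):
    """Return (id, definition) pairs from the first <pre> block in markup.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(markup, "html.parser")
    pre = soup.find("pre")
    rows = []
    for link in pre.find_all("a"):
        # The definition is the text node immediately after the <a> tag.
        trailing = link.next_sibling
        if isinstance(trailing, NavigableString):
            definition = trailing.strip()
        else:
            definition = ""
        rows.append((link.get_text(strip=True), definition))
    return rows


for pathway_id, definition in parse_pathways(html):
    print(pathway_id, definition, sep="\t")
```

Because only the <a> tags are visited, the header and separator lines are skipped automatically, just as with findChildren().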