Python: Using BeautifulSoup to split HTML into groups when the groups are all in the same element


python html parsing


Here is an example:

<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>

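If each animal were in its own separate element, I could just iterate over those elements. That would be great. But the website I'm trying to parse has all of the information in a single element.

What is the best way to split the soup into separate animals, or otherwise extract the attributes along with the animal each one belongs to?

Feel free to suggest a better title.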

This should work.
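For reference, one order-preserving way to do the grouping might look like the sketch below. It assumes the `bs4` package (BeautifulSoup 4) on Python 3 rather than the older BeautifulSoup 3 import, and the names `HTML` and `group_by_animal` are made up for illustration; it is not necessarily what the answer above had in mind.

```python
from bs4 import BeautifulSoup

# Sample markup from the question (assumed to be representative of the real page).
HTML = """
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
"""


def group_by_animal(html):
    """Return a list of (animal, [attributes]) pairs in document order."""
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    for p in soup.find_all("p"):
        # In bs4, the class attribute is returned as a list of class names.
        classes = p.get("class", [])
        if "animal" in classes:
            # A new animal starts a new group.
            groups.append((p.get_text(strip=True), []))
        elif "attribute" in classes and groups:
            # Attach the attribute to the most recently seen animal.
            groups[-1][1].append(p.get_text(strip=True))
    return groups


print(group_by_animal(HTML))
```

Using a list of pairs instead of a dict keeps the animals in page order and also tolerates the same animal name appearing twice.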

If you don't need to keep the animals' names in order, you can simplify Jamie's answer like this:

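# BeautifulSoup 4 on Python 3; the original BeautifulSoup 3 import was
# "from BeautifulSoup import BeautifulSoup".
from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
""", "html.parser")

# Maps each animal name to the list of attributes that follow it.
attributes = {}

for p in soup.find_all('p'):
    # In bs4 the class attribute is a list of names, so test membership.
    classes = p.get('class', [])
    if 'animal' in classes:
        animal = p.string
        attributes[animal] = []
    elif 'attribute' in classes:
        # Belongs to the most recently seen animal.
        attributes[animal].append(p.string)

print(attributes.keys())
print(attributes)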


Won't this just give me the two animal elements?

You don't need anything extra; you only need soup.findAll('p', {'class': 'animal'}).

In fact, all you need is soup.findAll('p', 'animal').

I think I misunderstood your question. Are you trying to group the attributes by animal?

Exactly. If each animal were in its own element, I could simply iterate over those elements and group everything easily. But when all the information is in the same element, I'm not sure how to keep together all the data that belongs to one animal.

Now that I understand your question and have actually installed BeautifulSoup, check the answer.
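To illustrate the point made in the comments: the class-filtered call returns only the animal elements, so the attributes still have to be collected separately. The rough sketch below assumes `bs4` (BeautifulSoup 4) on Python 3, where `findAll` is spelled `find_all`, and walks forward from each animal with `find_next_siblings` until the next animal starts. `HTML` and `attributes_for` are hypothetical names used only for this example.

```python
from bs4 import BeautifulSoup

# Sample markup from the question.
HTML = """
<p class='animal'>cats</p>
<p class='attribute'>they meow</p>
<p class='attribute'>they have fur</p>
<p class='animal'>turtles</p>
<p class='attribute'>they don't make noises</p>
<p class='attribute'>they have shells</p>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A string passed as the second positional argument filters by CSS class,
# so this returns only the two "animal" paragraphs.
animals = soup.find_all("p", "animal")
animal_names = [p.get_text(strip=True) for p in animals]


def attributes_for(animal_tag):
    """Collect the 'attribute' paragraphs after animal_tag, up to the next animal."""
    found = []
    for sibling in animal_tag.find_next_siblings("p"):
        classes = sibling.get("class", [])
        if "animal" in classes:
            break  # the next animal's group begins here
        if "attribute" in classes:
            found.append(sibling.get_text(strip=True))
    return found


grouped = {p.get_text(strip=True): attributes_for(p) for p in animals}
print(animal_names)
print(grouped)
```

This rescans the siblings once per animal, so for a long page the single-pass loop in the answer is likely the cheaper choice.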