Python: How to get the value from soup.select?

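How do I get the value of the `a` tag (Google)? `soup.select(...)` returns the whole `a` tag, but I only need its value. Also, the page may contain several `h2` elements. How do I filter for the ones with the class "hello-word"?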

Try the following:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>', 'html.parser')
>>> soup.text
'Google'
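`soup.text` returns the text of the entire document. It yields exactly "Google" here only because the snippet contains nothing but the link. On a real page it is safer to select the `a` tag first and read its text. The sketch below uses a hypothetical page with an extra `<p>` to show the difference. `select_one` returns the first match, or `None` if nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the link sits next to other text
html = (
    '<p>Intro text</p>'
    '<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>'
)
soup = BeautifulSoup(html, "html.parser")

# Text of the whole document, including the <p>
print(soup.text)

# First matching <a> (or None), then only that tag's text
link = soup.select_one("h2.hello-word > a")
if link is not None:
    print(link.get_text())
```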

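You can use `.hello-word` on `h2` in the CSS selector to select only the `h2` tags with the class `hello-word`, and then select their child `a`. Also, `soup.select()` returns a list of all matches, so you can easily iterate over it and call `.text` on each element to get its text. Example: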

for i in soup.select("h2.hello-word > a"):
    print(i.text)

Example/demo (I added a few elements of my own, one of which has a slightly different class, to show that the selector works):

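>>> from bs4 import BeautifulSoup
>>> s = """<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>
... <h2 class="hello-word"><a href="http://www.google.com">Google12</a></h2>
... <h2 class="hello-word2"><a href="http://www.google.com">Google13</a></h2>"""
>>> soup = BeautifulSoup(s, 'html.parser')
>>> for i in soup.select("h2.hello-word > a"):
...     print(i.text)
...
Google
Google12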
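The same filter can also be written without CSS selectors, using `find_all` with the `class_` keyword. This is a different technique from the `select` call above. Here is a minimal sketch on the demo markup.

- `h2.find("a")` returns the first `a` descendant rather than strictly a direct child.
- `get_text(strip=True)` trims surrounding whitespace.
- BeautifulSoup treats `class` as multi-valued, so both `h2.hello-word` and `class_="hello-word"` also match an `h2` whose class list contains `hello-word` alongside other classes.

```python
from bs4 import BeautifulSoup

s = """<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>
<h2 class="hello-word"><a href="http://www.google.com">Google12</a></h2>
<h2 class="hello-word2"><a href="http://www.google.com">Google13</a></h2>"""
soup = BeautifulSoup(s, "html.parser")

texts = []
# class_ matches any <h2> whose class list contains "hello-word";
# "hello-word2" is a different token, so the third <h2> is skipped
for h2 in soup.find_all("h2", class_="hello-word"):
    a = h2.find("a")  # first <a> descendant, or None
    if a is not None:
        texts.append(a.get_text(strip=True))

print(texts)  # ['Google', 'Google12']
```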

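That works! Thanks a lot. Why doesn't `for c in soup.select("h2.hello-word > a"): callSign = c.get_text('href')` work?

Because `href` is not text; it is an attribute of the `a` element, so use `c['href']`. `get_text('href')` never reads the attribute: its first argument is the separator placed between text fragments.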
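The sketch below reads both the text and the link. `c['href']` raises `KeyError` when the attribute is missing, whereas `c.get('href')` returns `None`. The second `a` without an `href` is a hypothetical element added only to show that difference.

```python
from bs4 import BeautifulSoup

s = """<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>
<h2 class="hello-word"><a>No link</a></h2>"""
soup = BeautifulSoup(s, "html.parser")

pairs = []
for c in soup.select("h2.hello-word > a"):
    # .get() returns None instead of raising KeyError when href is absent
    pairs.append((c.get_text(), c.get("href")))

print(pairs)  # [('Google', 'http://www.google.com'), ('No link', None)]

first = soup.select_one("h2.hello-word > a")
print(first["href"])  # item access; raises KeyError if the attribute is missing
# 'href' here is only a separator between text fragments, so this prints the text
print(first.get_text("href"))  # Google
```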

Alternatively, with lxml (its `CSSSelector` needs the separate `cssselect` package installed):

    >>> import lxml.html
    >>> from lxml.cssselect import CSSSelector
    >>> txt = '<h2 class="hello-word"><a href="http://www.google.com">Google</a></h2>'
    >>> tree = lxml.html.fromstring(txt)
    >>> sel = CSSSelector('h2.hello-word > a')
    >>> element = sel(tree)[0]
    >>> element.text
    'Google'
