Python 3: getting child elements (lxml)
I am using lxml with html:
from lxml import html
import requests
How can I check whether any child of an element has class="nearby"?
My code (basically):
How can I replace "HasChildWithClass()" so that it actually works?
Here is a sample tree:
...
<p class="result-info">
    <span class="result-meta">
        <span class="nearby">
            ... #this SHOULD print something
        </span>
    </span>
</p>
<p class="result-info">
    <span class="result-meta">
        <span class="FAR-AWAY">
            ... # this should NOT print anything
        </span>
    </span>
</p>
...
Here is an experiment I ran.
In a Python shell, take r = resultList[0] and type:
>>> dir(r)
['__bool__', '__class__', ..., 'find_class', ...
The find_class method in that listing looks suspiciously like what we need. If you check its help text:
>>> help(r.find_class)
you will confirm that guess. Indeed:
>>> r.find_class('nearby')
[<Element span at 0x109788ea8>]
So it is now clear how to tell whether a "nearby" child exists: find_class returns a non-empty list when one does.
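As a minimal sketch of that idea, the check can be wrapped in a small helper. The name has_child_with_class and the sample markup are illustrative assumptions, not the asker's actual code. Note that find_class searches the element itself as well as its descendants, which does not matter here because the p elements carry a different class.

```python
from lxml import html

# Sample markup modeled on the tree in the question (wrapped in a div
# so that there is a single root element).
doc = html.fromstring("""
<div>
  <p class="result-info"><span class="result-meta"><span class="nearby">close</span></span></p>
  <p class="result-info"><span class="result-meta"><span class="FAR-AWAY">far</span></span></p>
</div>
""")


def has_child_with_class(element, class_name):
    """Hypothetical helper: True if find_class finds at least one match.

    find_class returns a list of elements (the element itself included
    if it carries the class), so an empty list means no match.
    """
    return bool(element.find_class(class_name))


result_list = doc.xpath('//p[@class="result-info"]')
flags = [has_child_with_class(r, "nearby") for r in result_list]
print(flags)
```

Only the first paragraph contains a span with class "nearby", so only its flag is true.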
Cheers.
I am trying to understand why you are using lxml to find the elements; BeautifulSoup and re may be better choices.
lxml = """
<p class="result-info">
<span class="result-meta">
<span class="nearby">
... #this SHOULD print something
</span>
</span>
</p>
<p class="result-info">
<span class="result-meta">
<span class="FAR-AWAY">
... # this should NOT print anything
</span>
</span>
</p>
"""
Try using bs4:
from bs4 import BeautifulSoup
soup = BeautifulSoup(lxml,"lxml")
result = soup.find_all("span", class_="nearby")
print(result[0].text)
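The snippet above indexes result[0], which raises IndexError when nothing matches. A possible per-paragraph variant is sketched below; the variable names, the sample text, and the choice of the built-in "html.parser" backend are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the tree in the question.
markup = """
<p class="result-info"><span class="result-meta"><span class="nearby">close by</span></span></p>
<p class="result-info"><span class="result-meta"><span class="FAR-AWAY">far off</span></span></p>
"""

soup = BeautifulSoup(markup, "html.parser")

matches = []
for p in soup.find_all("p", class_="result-info"):
    # find() returns None when the paragraph has no matching descendant.
    span = p.find("span", class_="nearby")
    if span is not None:
        matches.append(span.get_text(strip=True))

print(matches)
```

Checking each paragraph separately keeps the "print something / print nothing" decision tied to the paragraph it belongs to.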
Or, staying with lxml, iterate over each result's descendants and compare the class attribute:
from lxml import html
tree = html.fromstring(lxml)
resultList = tree.xpath('//p[@class="result-info"]')
for result in resultList:
    # iter() yields the element itself and then all of its descendants
    for e in result.iter():
        if e.attrib.get("class") == "nearby":
            print(e.text)
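The nested loop can also be folded into the XPath expression itself. The following is a sketch under the assumption that the class attribute is exactly "nearby"; an element with several classes would need a contains()-style test instead. The markup and variable names are illustrative.

```python
from lxml import html

# Sample markup modeled on the tree in the question.
doc = html.fromstring("""
<div>
  <p class="result-info"><span class="result-meta"><span class="nearby">close</span></span></p>
  <p class="result-info"><span class="result-meta"><span class="FAR-AWAY">far</span></span></p>
</div>
""")

# The second predicate keeps only paragraphs that have a descendant
# span whose class attribute equals "nearby".
nearby_paragraphs = doc.xpath(
    '//p[@class="result-info"][.//span[@class="nearby"]]'
)

texts = [p.text_content().strip() for p in nearby_paragraphs]
print(texts)
```

This returns the matching paragraphs directly, so no per-element check is needed afterwards.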