Python and lxml.html: get_element_by_id output problem
I am currently trying to extract data from an HTML file. The code I'm using appears to work, but not the way I expected: I can get some items, but not all of them, and I wonder whether this has to do with the size of the file I'm trying to read.

The page source I'm trying to parse has about 4500 lines, so it is a decent size. I have been using this page because I want to make sure the code works on large files.

The code I'm using is:
import lxml.html
import lxml
import urllib2
webHTML = urllib2.urlopen('http://hobbyking.com/hobbyking/store/__39036__Turnigy_Multistar_2213_980Kv_14Pole_Multi_Rotor_Outrunner.html').read()
webHTML = lxml.html.fromstring(webHTML)
productDetails = webHTML.get_element_by_id('productDetails')
for element in productDetails:
    print element.text_content()
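This gives the expected output when I use 'mm3' or some other element near the top of the page, but when I use the 'productDetails' element I get no output at all. At least, that is the case with my current setup.

I'm afraid lxml.html cannot parse this particular HTML source properly. It parses the h3 tag with id="productDetails" as an empty element (and that is in its default recover mode). Because the loop in the question iterates over the children of that element, an empty element leaves nothing to print.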
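One way to confirm this diagnosis is to inspect the element that lxml returns instead of only looping over it. The following is a minimal Python 3 sketch, not the answer's original code: the `SAMPLE` markup and the `describe` helper are made up for illustration, and the real page's markup is not reproduced here.

```python
import lxml.html

# Hypothetical markup standing in for the real page: the h3 that carries
# the id is empty, and the text lives in a following sibling.
SAMPLE = """
<html><body>
  <div id="mm3"><span>menu</span></div>
  <h3 id="productDetails"></h3>
  <p>Looking for the ultimate power system?</p>
</body></html>
"""


def describe(doc, element_id):
    """Summarise what lxml parsed for the element with the given id."""
    el = doc.get_element_by_id(element_id)
    return {
        "tag": el.tag,
        "children": len(el),  # iterating an element walks exactly these
        "text": el.text_content().strip(),
        "markup": lxml.html.tostring(el, encoding="unicode", with_tail=False),
    }


doc = lxml.html.fromstring(SAMPLE)
for element_id in ("mm3", "productDetails"):
    print(element_id, describe(doc, element_id))
```

If `children` is 0 for the id you are after, a `for` loop over that element has nothing to iterate, which matches the silent output described in the question.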
Parsing the same page with BeautifulSoup and the html5lib parser (code listed further down) prints:
Looking for the ultimate power system for your next Multi-rotor project? Look no further!The Turnigy Multistar outrunners are designed with one thing in mind - maximising Multi-rotor performance! They feature high-end magnets, high quality bearings and all are precision balanced for smooth running, these motors are engineered specifically for multi-rotor use.These include a prop adapter and have a built in aluminium mount for quick and easy installation on your multi-rotor frame.
outrunner
...
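Thank you so much for your help! I'll go ahead and try the other approach. I didn't realize that the empty element was a result of the default recover mode. I wish I had read a little deeper and known this before spending hours trying to figure it out on my own.

Sure, you're welcome. FYI, I mentioned the recover mode only to point out that lxml.html uses it by default, and there is no easy way to tell it to be more lenient.

I completely understand. I just didn't see that in the documentation. This is a huge help, because I kept seeing that empty element and couldn't make sense of it.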
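To make the default mentioned in these comments visible, the parser can be constructed explicitly. This is a small Python 3 sketch under assumptions: the `BROKEN` fragment is invented, and the only point shown is that passing `recover=True` yourself yields the same tree as the default parser. What a given libxml2 version does with `recover=False` is deliberately left out.

```python
import lxml.html

# Invented, deliberately sloppy markup (unclosed <p> and <b> tags).
BROKEN = '<div id="box"><p>first<b>bold<p>second</div>'

# lxml.html.HTMLParser forwards keyword arguments to lxml.etree.HTMLParser,
# whose recover option defaults to True.
explicit_parser = lxml.html.HTMLParser(recover=True)

default_tree = lxml.html.fromstring(BROKEN)
explicit_tree = lxml.html.fromstring(BROKEN, parser=explicit_parser)

default_markup = lxml.html.tostring(default_tree, encoding="unicode")
explicit_markup = lxml.html.tostring(explicit_tree, encoding="unicode")

print(default_markup)                     # the repaired tree lxml settled on
print(default_markup == explicit_markup)  # same result either way
```

Printing the repaired markup is often the quickest way to see how the parser resolved broken nesting before deciding whether to switch parsers.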
from urllib2 import urlopen
from bs4 import BeautifulSoup
url = 'http://hobbyking.com/hobbyking/store/__39036__Turnigy_Multistar_2213_980Kv_14Pole_Multi_Rotor_Outrunner.html'
soup = BeautifulSoup(urlopen(url), 'html5lib')
for element in soup.find(id='productDetails').find_all():
    print element.text
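For readers on Python 3, where `urllib2` and the `print` statement no longer exist, the BeautifulSoup approach can be sketched roughly as follows. Assumptions: the `SAMPLE` markup and the `make_soup` and `child_texts` helpers are illustrative only, the page is not fetched, and the fallback to Python's built-in `html.parser` when `html5lib` is not installed is an addition that the original answer does not use.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the product page; the real markup is not shown here.
SAMPLE = """
<html><body>
  <div id="productDetails">
    <p>Looking for the ultimate power system?</p>
    <b>outrunner</b>
  </div>
</body></html>
"""


def make_soup(html):
    """Prefer the lenient html5lib parser, falling back to the stdlib one."""
    try:
        return BeautifulSoup(html, "html5lib")
    except FeatureNotFound:
        # Assumption: html5lib may not be installed in every environment.
        return BeautifulSoup(html, "html.parser")


def child_texts(html, element_id):
    """Return the text of every tag nested under the element with this id."""
    container = make_soup(html).find(id=element_id)
    if container is None:
        return []
    return [tag.get_text(strip=True) for tag in container.find_all()]


for text in child_texts(SAMPLE, "productDetails"):
    print(text)
```

To run this against a live page, the HTML string could be read with `urllib.request.urlopen(url).read()` and passed to `child_texts` in place of `SAMPLE`.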