Python and lxml.html: output problem when getting an element by id

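I'm currently trying to grab data from an HTML file. The code I'm using appears to work, but not the way I expected. I can get some items but not all of them, and I wonder whether this has something to do with the size of the file I'm trying to read.

The source I'm currently trying to parse is the page whose URL appears in the code below.

The page is about 4,500 lines long, so it is fairly large. I've been using this page because I wanted to make sure the code works on large files.

The code I'm using is: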

import lxml.html
import lxml
import urllib2

webHTML = urllib2.urlopen('http://hobbyking.com/hobbyking/store/__39036__Turnigy_Multistar_2213_980Kv_14Pole_Multi_Rotor_Outrunner.html').read()
webHTML = lxml.html.fromstring(webHTML)
productDetails = webHTML.get_element_by_id('productDetails')
for element in productDetails:
    print element.text_content()

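This gives the expected output when I use 'mm3' or some other element near the top of the page, but I get no output at all when I use the 'productDetails' element. At least, that is the case with my current setup.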

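I'm afraid lxml.html cannot parse this particular HTML source. It parses the h3 tag with id="productDetails" as an empty element, and this is in its default recover mode. Because the element ends up with no children, the loop in the question has nothing to iterate over and prints nothing.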
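
To see why an empty element produces no output, it can help to inspect what lxml actually built for a given id. The sketch below is only an illustration: SAMPLE is a hypothetical, well-formed stand-in for the parsed tree (not the real page), describe is a helper name made up here, and the code is written for Python 3. It relies on the fact that iterating over an lxml element yields its direct children.

```python
import lxml.html

# Hypothetical stand-in for the tree lxml ends up with: the h3 exists,
# but it has no children and no text of its own.
SAMPLE = (
    '<div>'
    '<div id="mm3"><span>menu item</span></div>'
    '<h3 id="productDetails"></h3>'
    '<p>Product description text</p>'
    '</div>'
)


def describe(root, element_id):
    """Summarise the element with the given id (illustrative helper)."""
    element = root.get_element_by_id(element_id)
    return {
        # len() of an lxml element is its number of direct children,
        # which is exactly what `for child in element` iterates over.
        "children": len(element),
        "text": element.text_content(),
        "html": lxml.html.tostring(element, encoding="unicode"),
    }


root = lxml.html.fromstring(SAMPLE)
for element_id in ("mm3", "productDetails"):
    print(element_id, describe(root, element_id))
```

If "children" is 0 for the id you care about, a loop over that element will never run its body, which matches the symptom described in the question. Printing the serialized "html" value is a quick way to check where the parser decided the element ends.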

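With BeautifulSoup and the html5lib parser instead (the code is at the end of this page), iterating over the element prints: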

Looking for the ultimate power system for your next Multi-rotor project? Look no further!The Turnigy Multistar outrunners are designed with one thing in mind - maximising Multi-rotor performance! They feature high-end magnets, high quality bearings and all are precision balanced for smooth running, these motors are engineered specifically for multi-rotor use.These include a prop adapter and have a built in aluminium mount for quick and easy installation on your multi-rotor frame.

outrunner

...

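Thank you so much for the help! I'll go ahead and try the other answer as well. I didn't realize that the empty element was the result of the default recover mode. I wish I had read a bit deeper and known this before spending hours trying to solve it myself.

Sure, thanks. FYI, I mentioned recover mode only to point out that lxml.html uses it by default, and that there is no easy way to tell it to be more lenient.

I completely understand. I just hadn't seen that in the documentation. This is a huge help, because I often see this empty element and could never make sense of it.

Code from the answer, using BeautifulSoup with the html5lib parser:
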
from urllib2 import urlopen
from bs4 import BeautifulSoup

url = 'http://hobbyking.com/hobbyking/store/__39036__Turnigy_Multistar_2213_980Kv_14Pole_Multi_Rotor_Outrunner.html'
soup = BeautifulSoup(urlopen(url), 'html5lib')

for element in soup.find(id='productDetails').find_all():
    print element.text
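
The code above is written for Python 2 (urllib2 and the print statement). A rough Python 3 sketch of the same idea follows. It assumes the beautifulsoup4 and html5lib packages are installed; child_texts_by_id and SAMPLE are names made up for this illustration, and SAMPLE is a small hypothetical document rather than the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only to exercise the helper below.
SAMPLE = '<div id="productDetails"><p>First</p><p>Second</p></div>'


def child_texts_by_id(markup, element_id, parser="html5lib"):
    """Return the text of every tag nested inside the element with this id.

    `parser` is handed straight to BeautifulSoup; "html5lib" is the parser
    used in the answer, chosen there because it copes with this page.
    """
    soup = BeautifulSoup(markup, parser)
    target = soup.find(id=element_id)
    if target is None:
        # No element with that id: return an empty list instead of failing.
        return []
    return [tag.get_text() for tag in target.find_all()]
```

To run it against a live page in Python 3, the markup could be fetched with urllib.request.urlopen(url).read(), which replaces urllib2.urlopen, and then passed as the first argument, for example child_texts_by_id(markup, "productDetails").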