Python: trouble parsing HTML with XPath using lxml

Tags: python, parsing, xpath, lxml, lxml.html

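I'm trying to parse data from an interactive Google site. The page is rendered with JavaScript, so I use Qt to load the site before parsing it. I believe I have loaded and rendered the site correctly, but for some reason, when I run my XPath parsing code, I get an empty list back.

Here is my full code: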

import sys  
from PyQt4.QtGui import *  
from PyQt4.QtCore import *  
from PyQt4.QtWebKit import *  
from lxml import html 

class Render(QWebPage):  
  def __init__(self, url):  
    self.app = QApplication(sys.argv)  
    QWebPage.__init__(self)  
    self.loadFinished.connect(self._loadFinished)  
    self.mainFrame().load(QUrl(url))  
    self.app.exec_()  

  def _loadFinished(self, result):  
    self.frame = self.mainFrame()  
    self.app.quit() 

url = 'https://www.consumerbarometer.com/en/graph-builder/?question=M1&filter=country:singapore,canada,mexico,brazil,argentina,united_states,bulgaria,austria,belgium,croatia,czech_republic,denmark,estonia,finland,france,germany,greece,hungary,italy,ireland,latvia,lithuania,norway,netherlands,poland,portugal,russia,romania,serbia,slovakia,spain,slovenia,sweden,switzerland,ukraine,united_kingdom,australia,china,israel,hong_kong_sar,japan,korea,new_zealand,malaysia,taiwan,turkey,vietnam'  
#This does the magic.Loads everything
r = Render(url)  
#result is a QString.
result = r.frame.toHtml()

#QString should be converted to string before processed by lxml
formatted_result = str(result.toAscii())

#Next build lxml tree from formatted_result
tree = html.fromstring(formatted_result)

archive_links = tree.xpath('//*[@id="main-page-wrapper"]/div/section/div/section[1]/div/div/graph/div/div[4]/div/div/graph-bar-chart/div[2]/svg/g[1]/g[2]/g[1]/text()')
print archive_links

This is the HTML I'm trying to get:
Argentina


Any idea why I'm getting [] returned to me?

You can write a shorter, more reliable XPath expression, and you have to use namespaces:

tree.xpath('//text[@class="bar-text-label"]/text()', namespaces={'n': 'http://www.w3.org/2000/svg'})
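Whether the namespace prefix is needed depends on which lxml parser built the tree. The sketch below is only an illustration on a small hypothetical stand-in for the rendered page (the markup, the `PAGE` name, and the two country labels are assumptions, not the real site). With `lxml.html`, inline SVG elements are typically parsed as plain, un-namespaced tags, so the unprefixed expression matches; with the XML parser, the elements live in the SVG namespace and the expression needs the `n:` prefix.

```python
from lxml import etree, html

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical, well-formed stand-in for the rendered page (not the real markup).
PAGE = (
    '<html><body><svg xmlns="http://www.w3.org/2000/svg">'
    '<g><text class="bar-text-label">Argentina</text></g>'
    '<g><text class="bar-text-label">Brazil</text></g>'
    '</svg></body></html>'
)

# HTML parser: the xmlns declaration is treated as an ordinary attribute,
# so the elements are matched without any prefix.
html_tree = html.fromstring(PAGE)
html_labels = html_tree.xpath('//text[@class="bar-text-label"]/text()')

# XML parser: the elements belong to the SVG namespace.
xml_tree = etree.fromstring(PAGE)
xml_plain = xml_tree.xpath('//text[@class="bar-text-label"]/text()')
xml_prefixed = xml_tree.xpath(
    '//n:text[@class="bar-text-label"]/text()',
    namespaces={"n": SVG_NS},
)

print(html_labels)
print(xml_plain)
print(xml_prefixed)
```

If the unprefixed expression still returns an empty list on the real page, the likelier cause is that the chart had not been rendered yet when the HTML was captured, rather than the XPath itself.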

An alternative solution is to use a browser automation package:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('https://www.consumerbarometer.com/en/graph-builder/?question=M1&filter=country:singapore,canada,mexico,brazil,argentina,united_states,bulgaria,austria,belgium,croatia,czech_republic,denmark,estonia,finland,france,germany,greece,hungary,italy,ireland,latvia,lithuania,norway,netherlands,poland,portugal,russia,romania,serbia,slovakia,spain,slovenia,sweden,switzerland,ukraine,united_kingdom,australia,china,israel,hong_kong_sar,japan,korea,new_zealand,malaysia,taiwan,turkey,vietnam')

# wait for the svg to appear
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.TAG_NAME, 'svg')))

# find_elements(By..., ...) also works on Selenium 4, where find_elements_by_* was removed
for text in driver.find_elements(By.CLASS_NAME, 'bar-text-label'):
    print(text.text)

driver.close()

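Actually, I just tried the shorter XPath expression above, and even with the namespace added it still returns an empty list.

@Meepl hm, I haven't tried it with pyqt4, but I saved the page source to an HTML file, parsed it with lxml.html, and used the XPath provided; it worked for me. Either way, would you accept the alternative selenium-based solution?

Thanks, sure. I have selenium installed, but I'm very new to it.

It works great! Thanks. Now my problem is that I'm trying to get the data value for each country, which is on an element of this type:

Is it possible to get the data-value attribute using selenium? I tried
for data_value in driver.find_elements_by_class_name("bar"): print(data_value.text)
but it didn't work. I also tried this, which didn't work either:
for data in driver.find_elements_by_xpath('/*[contains(@data-value)]/@data-value'): print(data.text)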
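
One possible way to read the attribute, sketched under assumptions: in Selenium, `.text` returns an element's visible text, while attributes are read with `get_attribute()`. The XPath attempt also cannot work as written, because `contains()` takes two arguments and Selenium's element lookups must select elements, not attribute nodes such as `/@data-value`. The helper names `read_attribute` and `scrape_data_values` are hypothetical, and the sketch assumes the elements with class `bar` are the ones carrying `data-value`, as the comment above suggests.

```python
def read_attribute(elements, name="data-value"):
    """Return the named attribute of each element (None where it is absent)."""
    return [element.get_attribute(name) for element in elements]


def scrape_data_values(url, css_class="bar", timeout=10):
    """Open the page in Firefox and collect data-value from matching elements."""
    # Imported here so read_attribute can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until at least one matching element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, css_class))
        )
        return read_attribute(driver.find_elements(By.CLASS_NAME, css_class))
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

Calling `scrape_data_values(url)` with the graph-builder URL from the answer would then return one string per bar, or `None` for any element that lacks the attribute.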