Parsing with lxml and Python requests
Tags: python, html, xml, xpath, lxml

Recently I tried to parse an HTML table from a web page using lxml and requests.

The Python code I ran was as follows:

>>> from lxml to html
>>> import requests
>>> page = requests.get('http://www.bigpaisa.com/candlestick-stock-screener-result/nse/bearish-evening-star-candlestick-pattern')
>>> tree = html.fromstring(page.text)
Then I want to use the lxml.xpath() function to parse the following repeating block of data into a list:

<TR>
    <TD style="font-size: 11px;"><!-- <a href="/company-technical-details/<%=sr.getExchange()%>/<%=sr.getSymbol()%>/<%=sr.getName()%>" ><%= sr.getSymbol() %></a>  -->
                    AMTEKINDIA           </TD>
    <TD style="font-size: 11px; max-width: 135px;">AMTEK INDIA LIMITED</TD>
    <TD>                nse         </TD>
    <TD style="min-width: 60px; max-width: 60px;">02-01-2015</TD>
    <TD>78</TD>
    <TD>78.3</TD>
    <TD>72.25</TD>
    <TD>73.9</TD>
One attempt gives an XPath evaluation error, and

>>> prices=tree.xpath('//TD/text()')

returns a list with no values.

The rows you are interested in are in the table with the ID "sortable":

from lxml import html

url = 'http://www.bigpaisa.com/candlestick-stock-screener-result/nse/bearish-%20evening-star-candlestick-pattern'
doc = html.parse(url)

# you can use XPath to select elements...
rows = doc.xpath("//table[@id = 'sortable']/tbody/tr")

# or, if you prefer, use CSS selectors instead
# (needs the cssselect package; call it on the root element)...
rows = doc.getroot().cssselect("table#sortable tbody tr")

for tr in rows:
    # do something with each tr, for example
    tds = tr.cssselect("td")
    print(tds[4].text)

Note that you don't need the requests module at all.
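One likely reason the question's '//TD/text()' query came back empty is that lxml's HTML parser lowercases element names, so upper-case names in an XPath expression match nothing. The sketch below illustrates this on an inline sample instead of the live page. The SAMPLE markup and the row_cells helper are assumptions made up for illustration, modelled on the block quoted in the question, and the real page may be structured differently.

```python
from lxml import html

# Hypothetical sample modelled on the question's rows; not the real page.
SAMPLE = """
<html><body>
<TABLE id="sortable">
  <TBODY>
  <TR>
    <TD style="font-size: 11px;"><!-- hidden link -->
        AMTEKINDIA </TD>
    <TD>AMTEK INDIA LIMITED</TD>
    <TD> nse </TD>
    <TD>02-01-2015</TD>
    <TD>78</TD>
    <TD>78.3</TD>
    <TD>72.25</TD>
    <TD>73.9</TD>
  </TR>
  </TBODY>
</TABLE>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# The HTML parser stores tag names in lower case, so this finds nothing.
upper = tree.xpath('//TD/text()')

# The same query with lower-case names finds the cell text.
lower = tree.xpath('//td/text()')


def row_cells(tr):
    """Return the stripped text of every cell in one table row (illustrative helper)."""
    return [td.text_content().strip() for td in tr.xpath('./td')]


# '//tr' avoids depending on whether the markup contains a <tbody>.
rows = [row_cells(tr) for tr in tree.xpath("//table[@id='sortable']//tr")]

print(upper)
print(rows[0])
```

Using text_content() rather than .text is a deliberate choice here: the first cell starts with a comment, so its .text is None and the symbol only appears after the comment node.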

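"from lxml to html" is invalid Python. Did you mean "from lxml import html"?

The page seems to declare a default XHTML namespace, and if I'm not mistaken, you need to account for this namespace in your path expressions.

That doesn't seem to be the case with the lxml.html parser. I have tested the code, and it works.

I see. The HTML parser simply ignores the namespace.

I think using an HTML parser is safe in this case. 95% of the web pages that claim to be XHTML are not.

That is a good reason. Thanks for sorting it out, and +1. Perhaps adding a short caveat to your answer would help?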
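The namespace point raised in these comments can be checked with a small experiment. The sketch below parses the same made-up XHTML string (an assumption for illustration, not the real page) with both lxml.html and the XML parser in lxml.etree, and compares what an unprefixed XPath expression finds in each case.

```python
from lxml import etree, html

# Hypothetical XHTML fragment with a default namespace; not taken from the real page.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<table id="sortable"><tr><td>78</td></tr></table>'
    '</body></html>'
)

# HTML parser: the xmlns declaration is treated as an ordinary attribute,
# so plain element names match.
html_tree = html.fromstring(XHTML)
html_cells = html_tree.xpath('//td/text()')

# XML parser: every element is in the XHTML namespace,
# so the unprefixed name 'td' matches nothing.
xml_tree = etree.fromstring(XHTML)
xml_plain = xml_tree.xpath('//td/text()')

# Binding a prefix to the namespace makes the XML query work.
ns = {'x': 'http://www.w3.org/1999/xhtml'}
xml_prefixed = xml_tree.xpath('//x:td/text()', namespaces=ns)

print(html_cells, xml_plain, xml_prefixed)
```

The prefix x is arbitrary. Only the namespace URI it is bound to has to match the one declared in the document.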