使用lxml和python浏览谷歌新闻

使用lxml和python浏览谷歌新闻,python,dom,web-crawler,lxml,scraper,Python,Dom,Web Crawler,Lxml,Scraper,我正在尝试使用python和lxml抓取Google新闻。一切都进行得很顺利,但当我试图使用for循环打印每个div数据时,一切都搞砸了。 这是我的代码: # -*- coding: utf-8 -*- from stem import Signal from stem.control import Controller from lxml import html from lxml import cssselect from lxml import etree import requests

我正在尝试使用python和lxml抓取Google新闻。一切都进行得很顺利,但当我试图使用for循环打印每个div数据时,一切都搞砸了。 这是我的代码:

# -*- coding: utf-8 -*-

from stem import Signal
from stem.control import Controller
from lxml import html
from lxml import cssselect
from lxml import etree
import requests

proxies = {
    'http' : 'http://127.0.0.1:8123'
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

url = "https://www.google.it/search?hl=en&tbm=nws&as_occt=any&tbs=cdr:1,cd_min:9/1/2014,cd_max:9/1/2014,sbd:1&as_nsrc=Daily%20Mail&start=0"

page = requests.get(url,proxies=proxies,headers=headers)
tree = html.fromstring(page.content)
results = tree.xpath('//div[@class="_cnc"]')

for div in results:
    print(div)
我得到这个输出:

<Element div at 0x7f4154df9470>
<Element div at 0x7f4154df94c8>
<Element div at 0x7f4154df9520>
<Element div at 0x7f4154df9578>
<Element div at 0x7f4154df95d0>
<Element div at 0x7f4154df9628>
<Element div at 0x7f4154df9680>
<Element div at 0x7f4154df96d8>
<Element div at 0x7f4154df9730>
<Element div at 0x7f4154df9788>
当我尝试打印时,我得到相同的多重输出:

['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']
['Pro-Russian rebels lower demands in peace talks', "'If I want to I can take Kiev in a fortnight': Putin's threat to Europe ", 'Showing a yen for business: PM Modi and Japanese premier Abe ', 'Modi visit draws pledges of support from Japan', 'The spectre of the Army rises again over Pakistan', 'Protesters briefly storm Pakistan state TV station', 'Anti-government protesters storm Pakistan state television station ', 'Austerity debate flares as Europe recovery fades', 'Inquiries begin into nude celebrity photo leaks', 'He tried to sell intimate pictures of Jennifer Lawrence in return for ']

有人知道我的代码出了什么问题吗?

你几乎做到了-只需在内部XPath表达式前面加上点,使它们特定于当前节点的上下文:

for div in results:
    title = div.xpath('.//a[@class="l _HId"]/text()')
    href = div.xpath('.//a[@class="l _HId"]/@href')
    snippet = div.xpath('.//div[@class="st"]/text()')
    #for example
    print(title)
for div in results:
    title = div.xpath('.//a[@class="l _HId"]/text()')
    href = div.xpath('.//a[@class="l _HId"]/@href')
    snippet = div.xpath('.//div[@class="st"]/text()')
    #for example
    print(title)