Python lxml xpath returns no output

python xpath web-scraping

I am trying to scrape a specific element from a website using lxml in Python. My code is below, but it produces no output.

    import requests
    from lxml import html

    webpage = 'http://www.funda.nl/koop/heel-nederland/'
    page = requests.get(webpage)
    tree = html.fromstring(page.content)

    content = '//*[@id="content"]/form/div[2]/div[5]/div/a[8]/text()'
    content = str(tree.xpath(content))
    print(content)

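It looks like the website you are trying to scrape does not like being scraped. It uses various techniques to detect whether a request comes from a legitimate user or from a bot, and it blocks access when it decides the request comes from a bot. That is why your XPath finds nothing, and it is also a reason to reconsider what you are doing.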
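One way to check this diagnosis before touching the XPath is to look at what the server actually sent back. The sketch below is a rough heuristic, not a reliable bot-wall detector: the helper name `looks_blocked` and the choice of `id="content"` as a marker (taken from the XPath in the question) are assumptions made for illustration.

```python
def looks_blocked(status_code, body, marker='id="content"'):
    """Return True if the response is probably not the real page.

    Heuristic only: a non-200 status, or a body that lacks the marker the
    XPath depends on, suggests a block or challenge page was served.
    """
    if status_code != 200:
        return True
    return marker not in body


# Typical use with requests (needs network access, so left as a comment):
#   page = requests.get(webpage)
#   if looks_blocked(page.status_code, page.text):
#       print(page.status_code, page.text[:500])  # inspect what came back
```

If the check fires, printing the start of `page.text` usually shows whether the server returned an error page or a challenge page instead of the listing.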

If you decide to continue, the simplest way to fool this website is to add cookies to your request.

First, get the cookie string using a real browser:

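1. Open a new tab.

2. Open the developer tools.

3. Go to the "Network" tab in the developer tools.

4. If the "Network" tab is empty, refresh the page.

5. Find the request to heel-nederland/ and click on it.

6. In the request headers you will find the cookie string. It is quite long and contains many seemingly random characters. Copy it.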

Then modify your program to use these cookies:

    import requests
    from lxml import html

    webpage = 'http://www.funda.nl/koop/heel-nederland/'
    # Send the cookie string copied from the browser with the request
    headers = {
        'Cookie': '<string copied from browser>'
    }
    page = requests.get(webpage, headers=headers)
    tree = html.fromstring(page.content)

    selector = '//*[@id="content"]/form/div[2]/div[5]/div/a[8]/text()'
    content = str(tree.xpath(selector))
    print(content)
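As a variation on the same idea, `requests` also accepts cookies as a dictionary through its `cookies=` argument, which avoids writing the `Cookie` header by hand. This is a minimal sketch, assuming the copied string has the usual `name=value; name=value` form. `parse_cookie_string` and `fetch_tree` are hypothetical helper names, not part of either library.

```python
def parse_cookie_string(cookie_string):
    """Split a browser 'Cookie' header value into a {name: value} dict."""
    cookies = {}
    for pair in cookie_string.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.split('=', 1)  # values may themselves contain '='
        cookies[name.strip()] = value.strip()
    return cookies


def fetch_tree(url, cookie_string):
    """Fetch url with the given cookies and return the parsed lxml tree."""
    # Imported here so parse_cookie_string works without these installed.
    import requests
    from lxml import html

    page = requests.get(url, cookies=parse_cookie_string(cookie_string))
    return html.fromstring(page.content)
```

Cookies copied from a browser session usually expire, so the string may need to be copied again if the XPath starts returning an empty list later.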
