
Python: empty list with Scrapy and XPath


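I've started using Scrapy and XPath to scrape some pages, and I'm just trying simple things in IPython. On some pages (such as IMDB) I get results, but on others (such as www.bbb.org) I always get an empty list. This is what I'm doing: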

scrapy shell 'http://www.bbb.org/central-western-massachusetts/business-reviews/auto-repair-and-service/toms-automotive-in-fitchburg-ma-211787'
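The part of the page I'm after reads:

BBB Accreditation

BBB Accredited Business since December 2, 2010

BBB has determined that Tom's Automotive meets BBB accreditation standards, which include a commitment…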

The XPath of this paragraph is:

'//*[@id="business-accreditation-content"]/p[2]'
So I use:

data = response.xpath('//*[@id="business-accreditation-content"]/p[2]').extract()

But data is an empty list. I got the XPath from Chrome, and the same approach works on other pages, but here I get nothing no matter which part of the page I try.

The website actually checks the User-Agent header.

Look at what comes back when it is not specified:

$ scrapy shell 'http://www.bbb.org/central-western-massachusetts/business-reviews/auto-repair-and-service/toms-automotive-in-fitchburg-ma-211787'
In [1]: print(response.body)
Out[1]: 123

In [2]: response.xpath('//*[@id="business-accreditation-content"]/p[2]').extract()
Out[2]: []
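Yes, that's right: if the request carries an unexpected User-Agent, the response contains only 123, so there is no element for the XPath to match.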
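Chrome's "Copy XPath" is computed from the page Chrome itself received and rendered, so before debugging an XPath it helps to confirm that the target element exists in the raw body Scrapy downloaded. A minimal sketch using only the standard library's html.parser; IdFinder and has_element_with_id are hypothetical helper names, not Scrapy API. In the shell you could call has_element_with_id(response.body, "business-accreditation-content").

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if ("id", self.wanted_id) in attrs:
            self.found = True


def has_element_with_id(body, wanted_id):
    """Return True if the raw HTML body contains an element with id=wanted_id."""
    if isinstance(body, bytes):
        # response.body is bytes; decode leniently since this is only a quick check
        body = body.decode("utf-8", errors="replace")
    finder = IdFinder(wanted_id)
    finder.feed(body)
    finder.close()
    return finder.found
```

On the blocked response (b"123") this returns False, which explains the empty list before any XPath is evaluated.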

Now with the header (note the -s command-line argument):

$ scrapy shell 'http://www.bbb.org/central-western-massachusetts/business-reviews/auto-repair-and-service/toms-automotive-in-fitchburg-ma-211787' -s USER_AGENT='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36'
In [1]: response.xpath('//*[@id="business-accreditation-content"]/p[2]').extract()
Out[1]: [u'<p itemprop="description">BBB has determined that Tom\'s Automotive meets <a href="http://www.bbb.org/central-western-massachusetts/for-businesses/about-bbb-accreditation/bbb-code-of-business-practices-bbb-accreditation-standards/" lang="LS30TPCERNY5b60c87311af50cf82720b237d8ef866">BBB accreditation standards</a>, which include a commitment to make a good faith effort to resolve any consumer complaints. BBB Accredited Businesses pay a fee for accreditation review/monitoring and for support of BBB services to the public.</p>']

This is an example in the shell. In a real Scrapy project, you would set the USER_AGENT setting instead. Alternatively, you can rotate user agents with the help of a downloader middleware.
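If one fixed browser string is enough, a single USER_AGENT = '...' line in the project's settings.py does the same job as the -s flag. For rotation, a Scrapy downloader middleware only needs a process_request(request, spider) method that sets the header. A minimal sketch, assuming a project package called myproject (hypothetical) and an illustrative list of agent strings; the class name RandomUserAgentMiddleware is also hypothetical:

```python
import random


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a randomly chosen User-Agent per request.

    Hypothetical sketch: the class name and the USER_AGENTS list are
    illustrative assumptions, not part of Scrapy itself.
    """

    # Example browser strings; replace them with the agents you want to send
    USER_AGENTS = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) "
        "Gecko/20100101 Firefox/115.0",
    ]

    def process_request(self, request, spider):
        # Overwrite any User-Agent already on the request with a random choice
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        # Returning None tells Scrapy to keep processing the request normally
        return None


# settings.py: enable the middleware ("myproject.middlewares" is a hypothetical
# module path) and remove Scrapy's built-in UserAgentMiddleware from the chain.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RandomUserAgentMiddleware": 400,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```

Order 400 places the sketch before the built-in middleware's default position of 500; mapping the built-in one to None disables it.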