Scrapy - Getting data from a filterbox (Python)
I have a problem with Scrapy. I need to get all the city names from the part circled in red in the image linked below, but my code returns nothing. I have tried many selectors without success. How can I fix this and get the city names? The image link and the source code are below.
import scrapy
from scrapy.spiders import CrawlSpider
#from city_crawl.items import CityCrawlItem

class details(CrawlSpider):
    name = "city_crawling"
    start_urls = ['https://www.booking.com/searchresults.tr.html?label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaOQBiAEBmAEowgEKd2luZG93cyAxMMgBDNgBAegBAfgBC5ICAXmoAgM&sid=cfc09bd0db4d07c7b55902c6d0ae81a5&track_lsso=1&sb=1&src=index&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.tr.html%3Flabel%3Dgen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaOQBiAEBmAEowgEKd2luZG93cyAxMMgBDNgBAegBAfgBC5ICAXmoAgM%3Bsid%3Dcfc09bd0db4d07c7b55902c6d0ae81a5%3Bsb_price_type%3Dtotal%26%3B&ss=isve%C3%A7&checkin_monthday=&checkin_month=&checkin_year=&checkout_monthday=&checkout_month=&checkout_year=&room1=A%2CA&no_rooms=1&group_adults=2&group_children=0']

    def parse(self, response):
        for content in response.xpath('//a[contains(@data-name, "uf")]'):
            yield {
                'text': content.css('span.filter_label::text').extract()
            }
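Your for loop selects a elements whose class contains "uf", which returns nothing. If you select a elements whose data-name contains "uf" instead, you can change the code as follows: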
for content in response.xpath('//a[contains(@data-name, "uf")]'):
    yield {
        'text': content.css('span.filter_label::text').extract()
    }
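To see why the attribute matters, the difference can be reproduced without touching the live site. The sketch below is an illustration only: the HTML fragment is hypothetical (modeled on the selectors in this answer, not copied from booking.com), and it uses the standard library's ElementTree rather than Scrapy. ElementTree's XPath subset has no contains(), so exact attribute matches stand in for it here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on the selectors above; the real page may differ.
SAMPLE = """
<div>
  <a class="filterelement" data-name="uf"><span class="filter_label">
Stockholm
</span></a>
  <a class="filterelement" data-name="uf"><span class="filter_label">
Visby
</span></a>
  <a class="filterelement" data-name="class"><span class="filter_label">
5 stars
</span></a>
</div>
""".strip()

root = ET.fromstring(SAMPLE)


def labels(anchor_xpath):
    """Return the stripped filter_label texts under anchors matching anchor_xpath."""
    found = []
    for anchor in root.findall(anchor_xpath):
        for span in anchor.findall("span[@class='filter_label']"):
            found.append(span.text.strip())
    return found


# Matching on the class attribute finds nothing, because "uf" is not a class here.
by_class = labels(".//a[@class='uf']")
# Matching on data-name finds the city filters.
by_data_name = labels(".//a[@data-name='uf']")

print(by_class)      # []
print(by_data_name)  # ['Stockholm', 'Visby']
```

In the spider itself the equivalent selection is the XPath shown above, run against the response.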
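Update:
I have tested your URL and you are right, it returns nothing. The root cause is that Scrapy is redirected three times and ends up on a different page, "https://www.booking.com/country/se.tr.html", which is not the page shown in your image. The log is as follows: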
2017-04-30 15:18:47 [scrapy] DEBUG: Redirecting (301) to <GET https://www.bookin
g.com/searchresults.tr.html?ss=isve%25C3%25A7> from <GET https://www.booking.com
/searchresults.tr.html?label=gen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaOQBiAEBmAEowgEK
d2luZG93cyAxMMgBDNgBAegBAfgBC5ICAXmoAgM&sid=cfc09bd0db4d07c7b55902c6d0ae81a5&tra
ck_lsso=1&sb=1&src=index&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Fi
ndex.tr.html%3Flabel%3Dgen173nr-1FCAEoggJCAlhYSDNiBW5vcmVmaOQBiAEBmAEowgEKd2luZG
93cyAxMMgBDNgBAegBAfgBC5ICAXmoAgM%3Bsid%3Dcfc09bd0db4d07c7b55902c6d0ae81a5%3Bsb_
price_type%3Dtotal%26%3B&ss=isve%C3%A7&checkin_monthday=&checkin_month=&checkin_
year=&checkout_monthday=&checkout_month=&checkout_year=&room1=A%2CA&no_rooms=1&g
roup_adults=2&group_children=0>
2017-04-30 15:18:48 [scrapy] DEBUG: Redirecting (301) to <GET https://www.bookin
g.com/searchresults.tr.html?ss=isve%C3%A7> from <GET https://www.booking.com/sea
rchresults.tr.html?ss=isve%25C3%25A7>
2017-04-30 15:18:48 [scrapy] DEBUG: Redirecting (302) to <GET https://www.bookin
g.com/country/se.tr.html> from <GET https://www.booking.com/searchresults.tr.htm
l?ss=isve%C3%A7>
2017-04-30 15:18:49 [scrapy] DEBUG: Crawled (200) <GET https://www.booking.com/c
ountry/se.tr.html> (referer: None)
2017-04-30 15:18:49 [scrapy] INFO: Closing spider (finished)
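The redirect chain in the log is easier to read once the ss parameter is decoded. The following sketch uses only urllib.parse from the standard library. It explains the log and is not a fix, and minimal_url is an illustrative name, not something from the original spider.

```python
from urllib.parse import quote, unquote, urlencode

# The ss value as it appears in the start URL and in the second redirect target.
search_term = unquote("isve%C3%A7")
print(search_term)  # isveç

# Encoding the already-encoded value once more gives the form seen in the
# first redirect target (ss=isve%25C3%25A7).
double_encoded = quote("isve%C3%A7", safe="")
print(double_encoded)  # isve%25C3%25A7

# A URL carrying only the search term. This is the same URL as the second
# redirect target in the log, which was then redirected (302) to
# /country/se.tr.html, so trimming the query string alone does not avoid
# the country page.
minimal_url = "https://www.booking.com/searchresults.tr.html?" + urlencode({"ss": search_term})
print(minimal_url)
```

The decoded term, "isveç", is Turkish for Sweden, which matches the /country/se.tr.html page the crawl ends on.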
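Running the crawl command in your Scrapy project, scrapy crawl city_crawling, against a locally saved copy of the page returns the information you need. Check the following log and output: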
2017-04-30 15:33:31 [scrapy] DEBUG: Crawled (200) <GET file:///F:/algorithm%20st
udy/python/StackOverFlow/Booking.html> (referer: None)
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nStockholm\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nG\xf6teborg\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nVisby\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nFalkenberg\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nMalm\xf6\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nLysekil\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nSimrishamn\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nLund\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nK\xf6pingsvik\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nBorgholm\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nJ\xf6nk\xf6ping\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nUppsala\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nF\xe4rjestaden\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nHelsingborg\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nRonneby\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nYstad\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nHalmstad\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nKivik\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nBorrby\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nFj\xe4llbacka\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nKarlskrona\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nGr\xe4nna\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nL\xf6ttorp\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\nNorrk\xf6ping\n']}
2017-04-30 15:33:31 [scrapy] DEBUG: Scraped from <200 file:///F:/algorithm%20stu
dy/python/StackOverFlow/Booking.html>
{'text': [u'\n\xd6rebro\n']}
2017-04-30 15:33:31 [scrapy] INFO: Closing spider (finished)
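Each scraped item above holds a one-element list with newlines around the city name. A minimal sketch of tidying that up, assuming plain city names are wanted; clean_labels is a hypothetical helper and the sample items are copied from the output above, written as Python 3 strings:

```python
def clean_labels(raw_texts):
    """Strip surrounding whitespace and drop strings that are empty afterwards."""
    return [text.strip() for text in raw_texts if text.strip()]


# A few items as they appear in the log output.
items = [
    {"text": ["\nStockholm\n"]},
    {"text": ["\nG\xf6teborg\n"]},
    {"text": ["\n\xd6rebro\n"]},
]

# Flatten the per-item lists into one list of clean city names.
cities = [name for item in items for name in clean_labels(item["text"])]
print(cities)  # ['Stockholm', 'Göteborg', 'Örebro']
```

The same stripping could be applied inside parse before yielding, so the items are clean when they leave the spider.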
You need to keep the comma at the end:
'text':content.css('span.filter_label::text').extract(),
Check it now.
Unfortunately, it still returns nothing.
def parse(self, response):
    for content in response.xpath('//a[contains(@class, "uf")]'):
        yield {
            'text': content.css('span.filter_label::text').extract(),
        }
def parse(self, response):
    # CSS attribute selector: a elements whose data-name equals "uf"
    for content in response.css('a[data-name="uf"]'):
        yield {
            'text': content.css('span.filter_label::text').extract(),
        }