
Python: Using Scrapy to scrape a website and create a .csv file


I am trying to run a spider that grabs certain information from the site and creates a .csv file with that data. I want the spider to go through each date and scrape the key information for each listed date:

This is what I have so far:

import scrapy

class hltvspider(scrapy.Spider):
    name = "hltvspider"
    allowed_domains = ["hltv.org"]
    start_urls = ["http://www.hltv.org/?pageid=188&eventid=0&gameid=2"]

    def parse(self, response):
        # For every <li> on the page, grab the anchor text, the href, and
        # any bare text nodes (Python 2 print statement, per the python-2.7 tag).
        for sel in response.xpath('//ul/li'):
            title = sel.xpath('a/text()').extract()
            link = sel.xpath('a/@href').extract()
            desc = sel.xpath('text()').extract()
            print title, link, desc
Here is the output I get:

C:\Users\Michael\PycharmProjects\HLTV\HLTV\spider\HLTV.py:5: ScrapyDeprecationWarning: HLTV.spider.HLTV.MySpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
2015-01-21 16:20:22-0600 [scrapy] INFO: Scrapy 0.24.4 started (bot: HLTV)
2015-01-21 16:20:22-0600 [scrapy] INFO: Optional features available: ssl, http11
2015-01-21 16:20:22-0600 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'HLTV.spider', 'SPIDER_MODULES': ['HLTV.spider'], 'BOT_NAME': 'HLTV'}
2015-01-21 16:20:22-0600 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-01-21 16:20:22-0600 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-01-21 16:20:22-0600 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-01-21 16:20:22-0600 [scrapy] INFO: Enabled item pipelines:
2015-01-21 16:20:22-0600 [hltvspider] INFO: Spider opened
2015-01-21 16:20:22-0600 [hltvspider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-01-21 16:20:22-0600 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-01-21 16:20:22-0600 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-01-21 16:20:23-0600 [hltvspider] DEBUG: Crawled (200) <GET http://www.hltv.org/?pageid=188&eventid=0&gameid=2> (referer: None)
[] [] [u'\n\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n', u'\n', u'\n']
[] [] [u'\n\t\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
[] [] [u'\n\t\t\t\t\t', u'\n\t\t\t\t', u'\n\t\t\t']
2015-01-21 16:20:23-0600 [hltvspider] INFO: Closing spider (finished)
2015-01-21 16:20:23-0600 [hltvspider] INFO: Dumping Scrapy stats:
	{'downloader/request_bytes': 241,
	 'downloader/request_count': 1,
	 'downloader/request_method_count/GET': 1,
	 'downloader/response_bytes': 13544,
	 'downloader/response_count': 1,
	 'downloader/response_status_count/200': 1,
	 'finish_reason': 'finished',
	 'finish_time': datetime.datetime(2015, 1, 21, 22, 20, 23, 432000),
	 'log_count/DEBUG': 3,
	 'log_count/INFO': 7,
	 'response_received_count': 1,
	 'scheduler/dequeued': 1,
	 'scheduler/dequeued/memory': 1,
	 'scheduler/enqueued': 1,
	 'scheduler/enqueued/memory': 1,
	 'start_time': datetime.datetime(2015, 1, 21, 22, 20, 22, 775000)}
2015-01-21 16:20:23-0600 [hltvspider] INFO: Spider closed (finished)
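Two things stand out in this run: every printed title and link is an empty list, because `//ul/li` appears to match only layout lists whose text nodes are pure whitespace (the answer below targets the match-row divs instead), and nothing is written to disk, because the spider prints instead of returning or yielding items. Scrapy's built-in feed exporter can produce the desired .csv once items are returned; here is a minimal sketch of the Scrapy 0.24-era settings, equivalent to running `scrapy crawl hltvspider -o matches.csv -t csv` (the output filename is an arbitrary choice):

# settings.py -- hypothetical feed-export configuration (the setting names
# are real Scrapy settings; the filename is an assumption). With these set,
# every item the spider returns or yields is serialized to CSV automatically.
FEED_URI = 'matches.csv'
FEED_FORMAT = 'csv'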

Check if this works for you:

import scrapy

from megacritics.items import MegacriticsItem

class testspider(scrapy.Spider):
    name = "pupu"
    allowed_domains = ["hltv.org"]
    start_urls = ["http://www.hltv.org/?pageid=188&eventid=0&gameid=2"]

    def parse(self, response):
        # Each match row lives in a fixed-width white <div>; select those
        # instead of every <ul>/<li> on the page. (response.xpath replaces
        # the deprecated Selector(response).select of older Scrapy code.)
        sites = response.xpath('//div[@style="width:606px;height:22px;background-color:white"]')
        items = []
        for site in sites:
            item = MegacriticsItem()
            item['date'] = site.xpath('.//div[@style="padding-left:5px;padding-top:5px;"]/a/div/text()').extract()
            # item['team1'] = site.xpath('.//div[@class="covSmallHeadline"]/text()').extract()
            # item['team2'] = site.xpath('.//div[@class="covSmallHeadline"]/text()').extract()
            # item['map'] = site.xpath('.//div[@class="covSmallHeadline"]/text()').extract()
            # item['event'] = site.xpath('.//div[@class="covSmallHeadline"]/text()').extract()
            items.append(item)
        return items
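Note that this snippet imports MegacriticsItem from another project (megacritics); for it to run, that items module has to define the class. A minimal sketch, assuming fields named after the keys the spider uses or has commented out:

# items.py -- hypothetical item definition; the class and field names
# simply mirror what the spider above references.
import scrapy

class MegacriticsItem(scrapy.Item):
    date = scrapy.Field()
    team1 = scrapy.Field()
    team2 = scrapy.Field()
    map = scrapy.Field()
    event = scrapy.Field()

With that in place and the feed settings shown earlier (or -o matches.csv on the command line), the returned items land in the .csv file the question asks for.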
Possible duplicate.