
Python Scrapy spider not saving to csv


I have a spider that reads a list of URLs from a text file and saves the title and body of each page. The crawl works, but the data is not being saved to csv. I set up a pipeline that saves to csv because the normal -o option didn't work for me. I did change settings.py for the pipeline. Any help would be appreciated. The code is below:

Items.py

from scrapy.item import Item, Field

class PrivacyItem(Item):
    # define the fields for your item here like:
    # name = Field()
    title = Field()
    desc = Field()
PrivacySpider.py

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import HtmlXPathSelector
from privacy.items import PrivacyItem

class PrivacySpider(CrawlSpider):
    name = "privacy"
    f = open("urls.txt")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    items =[]
    for url in start_urls:
        item = PrivacyItem()
        item['desc'] = hxs.select('//body//p/text()').extract()
        item['title'] = hxs.select('//title/text()').extract()      
        items.append(item)

    return items
Pipelines.py

import csv

class CSVWriterPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('CONTENT.csv', 'wb'))

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['title'][0], item['desc'][0]])
        return item
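The asker says settings.py was already changed for the pipeline; for reference, a minimal sketch of what registering it looks like. The dotted path here is an assumption based on the project layout shown above (module `privacy`, class in `pipelines.py`); adjust it to the actual layout.

```python
# settings.py (sketch): register the pipeline so Scrapy actually calls it.
# The number is the pipeline's run order (lower runs earlier).
ITEM_PIPELINES = {
    'privacy.pipelines.CSVWriterPipeline': 300,
}
```

In Scrapy versions contemporary with this thread, `ITEM_PIPELINES` was a plain list of dotted paths rather than a dict; either way, a pipeline that is not listed here is silently never invoked, which produces exactly the "crawl works but file stays empty" symptom.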

You don't have to loop over start_urls; Scrapy is already doing the following:

for url in spider.start_urls:
    request url and call spider.parse() with its response
So your parse function should look like this:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    item = PrivacyItem()
    item['desc'] = hxs.select('//body//p/text()').extract()
    item['title'] = hxs.select('//title/text()').extract()      
    return item

Also, to avoid returning a list as an item field, do:

hxs.select("..").extract()[0]
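The reason for the `[0]`: `extract()` always returns a list of every matching node, even when only one match is expected. A Scrapy-free sketch of the same pattern using only the standard library (the sample document is invented for illustration):

```python
# extract() returns a list of matches; index [0] to get the first one.
# Stdlib-only illustration of the same select-then-index pattern.
import xml.etree.ElementTree as ET

doc = (
    "<html><head><title>Privacy Policy</title></head>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>"
)
root = ET.fromstring(doc)

# Like hxs.select('//title/text()').extract(): a list, even for one match.
titles = [el.text for el in root.iter("title")]
paras = [el.text for el in root.iter("p")]

# Like .extract()[0]: take the first match as a plain string.
title = titles[0]
```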

Hi, thanks for the reply. That was helpful. It still isn't saving to csv or json; the files are still empty. The suggestion helped, though, and I appreciate it! I had misunderstood the loop over start_urls. That worked! I had made a silly indentation mistake. I really appreciate your answer.
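One more portability note on the pipeline above: `open('CONTENT.csv', 'wb')` with `csv.writer` is Python 2 style; on Python 3, `csv.writer` needs the file opened in text mode with `newline=''`, otherwise `writerow` raises a `TypeError`. A minimal sketch of the pipeline's row-writing under Python 3 (the file path and row values are invented for illustration):

```python
# Python 3 version of the pipeline's CSV writing: text mode plus newline=''.
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "CONTENT.csv")

# Equivalent of the pipeline's __init__ plus one process_item call.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Example Title", "Example body text"])  # one row per item

# Read the file back to confirm the row was written.
with open(path, newline="") as f:
    rows = list(csv.reader(f))
```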