Python 2.7 Scrapy: Using Item Loader to return each item in a new CSV row

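I am trying to produce CSV output of selected items contained in a particular class (title, link, price), with each item parsed into its own column and each instance into its own row, using Item Loaders and the items module.

I can produce this output with a self-contained spider that does not use the items module. However, I am trying to learn the proper way to describe items in the items module so that I can eventually scale projects up with the proper structure. I detail this code below as "Working Row Output Spider Code".

I have also tried to incorporate solutions identified or discussed in related posts, in particular the suggestion, noted at the bottom of one post's comments section, to use a for loop. Scrapy accepts the for loop, but it makes no difference: the items are still grouped in single fields rather than being output into independent rows.

Below are the details of the code in the two project attempts, "Working Row Output Spider Code" (without the items module and Item Loader) and "Non-Working Row Output Spider Code", along with the corresponding output of each.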

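Working Row Output Spider Code: btobasics.py

Command run to generate the CSV: $ scrapy crawl basic -o output.csv

Non-Working Row Output Spider Code: btobasictwo.py

Non-Working Row Output Items Code: btobasictwo.items.py

Command run to generate the CSV: $ scrapy crawl basic -o output.csv

As you can see, when I try to combine the items module, Item Loaders, and a for loop to structure the data, the instances are not separated into rows. Instead, all instances of a given item (title, link, price) are placed in three fields.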

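I would greatly appreciate any help with this, and I apologize for the lengthy post. I simply wanted to document as much as possible so that anyone willing to help could run the code themselves and/or fully understand the problem from my documentation. (If you feel a post of this length is inappropriate, please leave a comment saying so.)

Thank you very much.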

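You need to tell your ItemLoader to use a different selector, the per-product node from the loop rather than the whole response: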

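I tried modifying line 5 as you suggested, but it did not change the output. I also tried some variations, such as "l = ItemLoader(item=BtobasictwoItem(), selector=links)" and "l = ItemLoader(item=BtobasictwoItem(), selector=link, response=response)". Am I missing something? Do I need to modify the code in some other way? Thank you very much.

I was able to produce the expected output by replacing the code as instructed, with the amendment of using "yield" instead of "return" in the last line. I tried to edit your contribution, but introduced an error of my own by writing "response" instead of "return". Thanks again, much appreciated.

Sorry, I missed that part. Don't forget to accept my answer :-)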
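The "return" versus "yield" point raised in the comments can be reproduced without Scrapy. The following is a minimal sketch in plain Python; the sample rows and the function names parse_with_return and parse_with_yield are made up for illustration and are not part of the original spiders. A return inside a for loop ends the function on the first iteration, so only one item ever comes back, whereas yield turns the function into a generator that hands back one item per iteration.

```python
# Hypothetical sample data standing in for scraped (title, link, price) rows.
ROWS = [
    ("Book A", "book-a.html", "10.00"),
    ("Book B", "book-b.html", "12.50"),
]


def parse_with_return(rows):
    """Mimics the non-working spider: return exits on the first iteration."""
    for title, link, price in rows:
        return {"title": title, "link": link, "price": price}


def parse_with_yield(rows):
    """Mimics the corrected spider: yield produces one item per iteration."""
    for title, link, price in rows:
        yield {"title": title, "link": link, "price": price}


first_only = parse_with_return(ROWS)      # a single dict, for "Book A" only
all_items = list(parse_with_yield(ROWS))  # one dict per row

print(first_only["title"])  # Book A
print(len(all_items))       # 2
```

In a Scrapy callback the same rule applies: each yielded item reaches the feed exporter separately, which is what gives one CSV row per book.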

# btobasics.py: working spider (no items module, no Item Loader)
import scrapy


class BasicSpider(scrapy.Spider):
    name = 'basic'
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Extract three parallel lists, one entry per book on the page
        titles = response.xpath('//*[@class="product_pod"]/h3//text()').extract()
        links = response.xpath('//*[@class="product_pod"]/h3/a/@href').extract()
        prices = response.xpath('//*[@class="product_pod"]/div[2]/p[1]/text()').extract()

        # zip() pairs the i-th title, link, and price into one row
        for title, link, price in zip(titles, links, prices):
            # Yield one dict per book; each dict is exported as its own CSV row
            yield {
                'title': title,
                'link': link,
                'price': price,
            }

# btobasictwo.py: non-working spider (uses the items module and Item Loader)
import datetime
import urlparse
import scrapy

from btobasictwo.items import BtobasictwoItem

from scrapy.loader.processors import MapCompose
from scrapy.loader import ItemLoader


class BasicSpider(scrapy.Spider):
    name = 'basic'
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Create the loader using the response
        links = response.xpath('//*[@class="product_pod"]')
        for link in links:
            l = ItemLoader(item=BtobasictwoItem(), response=response)

            # Load fields using XPath expressions
            l.add_xpath('title', '//*[@class="product_pod"]/h3//text()',
                        MapCompose(unicode.strip))
            l.add_xpath('link', '//*[@class="product_pod"]/h3/a/@href',
                        MapCompose(lambda i: urlparse.urljoin(response.url, i)))
            l.add_xpath('price', '//*[@class="product_pod"]/div[2]/p[1]/text()',
                        MapCompose(unicode.strip))
            # Log fields
            l.add_value('url', response.url)
            l.add_value('date', datetime.datetime.now())

            return l.load_item()

# items.py (btobasictwo project): item definition imported by btobasictwo.py
from scrapy.item import Item, Field


class BtobasictwoItem(Item):
    # Primary fields
    title = Field()
    link = Field()
    price = Field()
    # Log fields
    url = Field()
    date = Field()

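# Answer: corrected parse() method for BasicSpider in btobasictwo.py
def parse(self, response):
    # Select one node per product, then build one loader per node
    links = response.xpath('//*[@class="product_pod"]')
    for link in links:
        l = ItemLoader(item=BtobasictwoItem(), selector=link)

        # Load fields using XPath expressions relative to this product node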
        l.add_xpath('title', './/h3//text()',
                    MapCompose(unicode.strip))
        l.add_xpath('link', './/h3/a/@href',
                    MapCompose(lambda i: urlparse.urljoin(response.url, i)))
        l.add_xpath('price', './/div[2]/p[1]/text()',
                    MapCompose(unicode.strip))
        # Log fields
        l.add_value('url', response.url)
        l.add_value('date', datetime.datetime.now())

        # yield (not return) so the loop continues and every product is emitted
        yield l.load_item()