Getting specific content with Scrapy (Python)


python csv scrapy


I want to scrape the following from a website: the name, description, image, price, stock and reference of every product.

Here is my code:

import scrapy
from scrapy.http import FormRequest, Request
from scrapy.selector import HtmlXPathSelector  # legacy Scrapy API, kept as posted
from urllib.parse import urlsplit  # on Python 2: from urlparse import urlsplit


class StackItem(scrapy.Item):
    # Assumed definition: the post does not show StackItem, only the fields it sets.
    name = scrapy.Field()
    url = scrapy.Field()
    stock = scrapy.Field()
    price = scrapy.Field()
    reference = scrapy.Field()
    description = scrapy.Field()
    imagen = scrapy.Field()


class AmericanSpider(scrapy.Spider):
    name = 'American'
    start_urls = ['http://www.americandent.es/inicio/log-0']

    def parse(self, response):
        # submit the login form found on the start page
        return [FormRequest.from_response(response,
                    formdata={'username': 'name', 'password': 'private'},
                    callback=self.after_login)]

    # continue scraping with authenticated session...
    def after_login(self, response):
        # check login succeed before going on (response.body is bytes)
        if b"authentication failed" in response.body:
            self.logger.error("Login failed")
            return
        # We've successfully authenticated, let's have some fun!
        else:
            print('LOGG!')
            return Request(url='http://www.americandent.es/productos/#!producto',
                           callback=self.parse_tastypage)

    def parse_tastypage(self, response):
        # follow every link that stays on the same host
        hxs = HtmlXPathSelector(response)
        urls = hxs.select('//a/@href').extract()
        for u in urls:
            if urlsplit(u).netloc == urlsplit(response.url).netloc:
                yield Request(url=u, callback=self.parse_tastypage2)

    def parse_tastypage2(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath('//*[@id="list-prd"]/div[3]')
        items = []
        for titles in titles:
            item = StackItem()
            # .select() is the legacy alias of .xpath() in older Scrapy releases
            # codigo producto = reference
            reference = titles.select('//*[@id="codigo_producto"]').extract()
            name = titles.select('//*[@id="list-prd"]/div[2]').extract()
            url = titles.select('//*[@id="contProductos"]').extract()
            # tarifa = price
            price = titles.select('//*[@id="lb-tarifas"]/div/div[2]/p/strong').extract()
            stock = titles.select('//*[@id="lb-descripcion"]/p[3]/strong').extract()
            description = titles.select('//*[@id="lb-descripcion"]/p[5]').extract()
            imagen = titles.select('//*[@id="stock"]/img').extract()
            item['name'] = name
            item['url'] = url
            item['stock'] = stock
            item['price'] = price
            item['reference'] = reference
            item['description'] = description
            item['imagen'] = imagen
            items.append(item)
        return items

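But the result is not what I expected:

I need the output formatted as CSV columns (| is just an example of a column separator):

reference | price | name | stock | image
000cab | 100€ | name1 | 2u | img1.png
2323ac | 200€ | name2 | 3u | img2.png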
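As a rough illustration of the layout requested above, here is a minimal sketch that writes pipe-separated rows with Python's standard csv module. The rows are copied from the sample in the question, and the helper name rows_to_csv is hypothetical; it assumes every field has already been reduced to a single string rather than the list that extract() returns.

```python
import csv
import io

# Sample rows copied from the question; real values would come from the spider.
rows = [
    {"reference": "000cab", "price": "100€", "name": "name1", "stock": "2u", "image": "img1.png"},
    {"reference": "2323ac", "price": "200€", "name": "name2", "stock": "3u", "image": "img2.png"},
]
fieldnames = ["reference", "price", "name", "stock", "image"]


def rows_to_csv(rows, fieldnames, delimiter="|"):
    """Serialize dict rows to CSV text, one column per field name (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, fieldnames)
print(csv_text)
```

Within Scrapy itself, running the spider with -o products.csv exports the yielded items as comma-separated CSV, and the FEED_EXPORT_FIELDS setting controls the column order.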

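You are not extracting text; you are extracting HTML.

For example:

titles.select('//*[@id="codigo_producto"]').extract()

This XPath returns the HTML element that has id="codigo_producto", markup included.

If you want the text directly inside the element with id="codigo_producto", use the XPath:

//*[@id="codigo_producto"]/text()

Or, if you want all the text inside //*[@id="codigo_producto"], including the text of nested elements, use:

//*[@id="codigo_producto"]//text()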

Try reading some good XPath tutorials.
This doesn't look right: for titles in titles
Fixed, but the same thing still happens.