
Getting specific content with Scrapy (Python)


I want to scrape the following from the website: the name, description, image, price, stock and reference of every product.

Here is my code:

import scrapy
from scrapy.http import FormRequest, Request
from scrapy.selector import HtmlXPathSelector
from urllib.parse import urlsplit

from myproject.items import StackItem  # project-specific item class


class AmericanSpider(scrapy.Spider):
    name = 'American'
    start_urls = ['http://www.americandent.es/inicio/log-0']

    def parse(self, response):
        return [FormRequest.from_response(response,
                    formdata={'username': 'name', 'password': 'private'},
                    callback=self.after_login)]

    # continue scraping with the authenticated session...
    def after_login(self, response):
        # check that login succeeded before going on (response.body is bytes)
        if b"authentication failed" in response.body:
            self.logger.error("Login failed")
            return
        # We've successfully authenticated, let's have some fun!
        else:
            print('LOGG!')
            return Request(url='http://www.americandent.es/productos/#!producto',
                           callback=self.parse_tastypage)

    def parse_tastypage(self, response):
        hxs = HtmlXPathSelector(response)
        urls = hxs.select('//a/@href').extract()
        for u in urls:
            if urlsplit(u).netloc == urlsplit(response.url).netloc:
                yield Request(url=u, callback=self.parse_tastypage2)

    def parse_tastypage2(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath('//*[@id="list-prd"]/div[3]')
        items = []
        for titles in titles:
            item = StackItem()
            # codigo producto = reference
            reference = titles.select('//*[@id="codigo_producto"]').extract()
            name = titles.select('//*[@id="list-prd"]/div[2]').extract()
            url = titles.select('//*[@id="contProductos"]').extract()
            # tarifa = price
            price = titles.select('//*[@id="lb-tarifas"]/div/div[2]/p/strong').extract()
            stock = titles.select('//*[@id="lb-descripcion"]/p[3]/strong').extract()
            description = titles.select('//*[@id="lb-descripcion"]/p[5]').extract()
            imagen = titles.select('//*[@id="stock"]/img').extract()
            item['name'] = name
            item['url'] = url
            item['stock'] = stock
            item['price'] = price
            item['reference'] = reference
            item['description'] = description
            item['imagen'] = imagen
            items.append(item)
        return items
But the result is not as expected:

I need the columns formatted as CSV (| is an example column separator):

reference | price | name | stock | image

000cab | 100€ | name1 | 2u | img1.png

2323ac | 200€ | name2 | 3u | img2.png
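As an aside, pipe-delimited rows like the ones above can be produced with Python's built-in csv module by setting delimiter='|'. A minimal sketch with hard-coded example rows standing in for the scraped items:

```python
import csv
import io

# example rows standing in for the scraped items
rows = [
    {"reference": "000cab", "price": "100€", "name": "name1", "stock": "2u", "image": "img1.png"},
    {"reference": "2323ac", "price": "200€", "name": "name2", "stock": "3u", "image": "img2.png"},
]

buf = io.StringIO()  # a real pipeline would write to an open file instead
writer = csv.DictWriter(
    buf,
    fieldnames=["reference", "price", "name", "stock", "image"],
    delimiter="|",
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Scrapy can also export items directly with `scrapy crawl American -o items.csv`, which produces a standard comma-separated file.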

You are not extracting the text, you are extracting the HTML.

e.g.:

titles.select('//*[@id="codigo_producto"]').extract()

This XPath returns the full HTML of the element with id="codigo_producto".

If you want the text of the element with id="codigo_producto", you should use the XPath:

//*[@id="codigo_producto"]/text()

Or, if you want the text of all descendants of //*[@id="codigo_producto"], you should use:

//*[@id="codigo_producto"]//text()

Try reading a good XPath tutorial.
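The /text() versus //text() difference is easy to see outside Scrapy too. A minimal sketch using lxml, with made-up markup for illustration:

```python
from lxml import html

# toy markup: the element has both direct text and a child element
doc = html.fromstring('<div id="codigo_producto">REF-<b>000cab</b></div>')

# /text() returns only the direct text children of the element
direct = doc.xpath('//*[@id="codigo_producto"]/text()')

# //text() returns the text of the element and all of its descendants
all_text = doc.xpath('//*[@id="codigo_producto"]//text()')

print(direct)    # ['REF-']
print(all_text)  # ['REF-', '000cab']
```

The bare element query ('//*[@id="codigo_producto"]' with no text() step) returns the node itself, which Scrapy's .extract() then serialises back to HTML, hence the markup in the output.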

This looks incorrect:
for titles in titles
Fixed that, but the same thing still happens.
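Beyond the loop-variable rename, note that absolute XPaths like '//*[@id="codigo_producto"]' inside the loop search the whole document on every iteration, so every item collects the same values. Scoping the query to the current node with a leading '.' fixes that. A minimal sketch with lxml and made-up markup (class attributes instead of the page's real ids, which are assumptions here):

```python
from lxml import html

# made-up product listing standing in for the real page
doc = html.fromstring(
    '<div id="list-prd">'
    '<div class="prd"><span class="ref">000cab</span></div>'
    '<div class="prd"><span class="ref">2323ac</span></div>'
    '</div>'
)

references = []
for prd in doc.xpath('//div[@class="prd"]'):
    # the leading "." scopes the query to this product node; an
    # absolute '//span[@class="ref"]' would match every product
    # in the whole document on each pass through the loop
    ref = prd.xpath('.//span[@class="ref"]/text()')
    references.append(ref[0])

print(references)  # ['000cab', '2323ac']
```

The same idea applies in Scrapy: inside `for title in titles:` use `title.xpath('.//...')` so each product's fields come from that product's subtree only.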