Python raise NotSupported("Response content isn't text") - scrapy.exceptions.NotSupported: Response content isn't text

python, xpath, scrapy

I have been getting the same error for several days now and I can't solve it!! I really don't understand where my code is incorrect. I previously fixed a similar error message by changing the "link" part, but now that no longer works. Can someone help me?

# -*- coding: utf-8 -*-
import scrapy
import re
import numbers
from amazon_test.items import AmazonTestItem
from urllib.parse import urlparse
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class AmazonSellersSpider(CrawlSpider): #scrapy.Spider
    name = 'AmazonFR'
    allowed_domains = ['amazon.fr']
    start_urls = ['https://www.amazon.fr']

    rules = (
        Rule(LinkExtractor(allow=()), callback='parse'),
    )

    def parse(self, response):
        item = AmazonTestItem()
        link = (response.xpath('//div[@class="a-column a-span6"]/h3[@id="-component-heading"]/text()'))
        if link:
            wait = response.xpath('//div[@class="a-column a-span6"]/h3[@id="-component-heading"]/text()').extract()
            if (len(wait) != 0):
                name = response.xpath('//div[@class="a-row a-spacing-medium"]/div[@class="a-column a-span6"]/ul[@class="a-unordered-list a-nostyle a-vertical"]/li//span[@class="a-list-item"]/span[contains(.,"Nom")]/following-sibling::text()').extract()
                phone = response.xpath('//div[@class="a-column a-span6"]/ul[@class="a-unordered-list a-nostyle a-vertical"]/li//span[@class="a-list-item"]/span[contains(.,"Téléphone")]/following-sibling::text()').extract()
                registre = response.xpath('//div[@class="a-column a-span6"]/ul[@class="a-unordered-list a-nostyle a-vertical"]/li//span[@class="a-list-item"]/span[contains(.,"registre de commerce")]/following-sibling::text()').extract()
                TVA = response.xpath('//div[@class="a-column a-span6"]/ul[@class="a-unordered-list a-nostyle a-vertical"]/li//span[@class="a-list-item"]/span[contains(.,"TVA")]/following-sibling::text()').extract()
                address = response.xpath('//div[@class="a-column a-span6"]/ul[@class="a-unordered-list a-nostyle a-vertical"]/li//span[span[contains(.,"Adresse")]]/ul//li//text()').extract()
                item['Business_name'] = ''.join(name).strip()
                item['Phone_number'] = ''.join(phone).strip()
                item['VAT_number'] = ''.join(TVA).strip()
                item['Address'] = '\n'.join(address).strip()
                item['Registre_commerce'] = ''.join(registre).strip()
                yield item
        else:
            for sel in response.xpath('//html/body'):
                item = AmazonTestItem()
                list_urls = sel.xpath('//a/@href').extract()
                for url in list_urls:
                    yield scrapy.Request(response.urljoin(url), callback=self.parse, meta={'item': item})
The error message is:

Traceback (most recent call last):
  File "C:\Users\paulpo\AppData\Local\Continuum\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\Users\paulpo\AppData\Local\Continuum\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Users\paulpo\AppData\Local\Continuum\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\paulpo\AppData\Local\Continuum\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\paulpo\AppData\Local\Continuum\Anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\paulpo\Documents\amazon_test\amazon_test\spiders\AmazonFR.py", line 21, in parse
    link = (response.xpath('//div[@class="a-column a-span6"]/h3[@id="-component-heading"]/text()')).extract
  File "C:\Users\paulpo\AppData\Local\Continuum\Anaconda3\lib\site-packages\scrapy\http\response\__init__.py", line 105, in xpath
    raise NotSupported("Response content isn't text")
scrapy.exceptions.NotSupported: Response content isn't text

Are you missing the `.extract()` on `link`? — I just wanted to check whether text with this structure even exists. Do you think I need to add `.extract()`? Because if there is no text, it won't be able to extract anything.

This error means the HTTP response cannot be decoded as HTML or XML, so you cannot call `.xpath()` on it. It would be interesting to print the first few bytes of the raw body, and maybe some headers, e.g. `self.logger.debug((response.headers, response.body[:256]))`.
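
In case it helps, here is a minimal sketch of that idea: guard the callback so it skips non-text responses before selecting. This is illustrative code, not the asker's spider; checking with isinstance against TextResponse is one common way to do it, and the spider name and URL below are just placeholders.

import scrapy
from scrapy.http import TextResponse

class SafeExampleSpider(scrapy.Spider):  # hypothetical example spider
    name = 'safe_example'
    start_urls = ['https://www.amazon.fr']

    def parse(self, response):
        # Binary responses (images, PDFs, archives, ...) arrive as plain
        # Response objects, not TextResponse, and raise NotSupported as
        # soon as you call .xpath() or .css() on them.
        if not isinstance(response, TextResponse):
            self.logger.debug('Skipping non-text response %s (%s)',
                              response.url,
                              response.headers.get('Content-Type'))
            return
        for href in response.xpath('//a/@href').extract():
            yield scrapy.Request(response.urljoin(href), callback=self.parse)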
Mmh, now I get the message: `INFO: Ignoring response: HTTP status code is not handled or not allowed`. I put your `self.logger(...)` call in my parse function... Any ideas?
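
That INFO line comes from Scrapy's HttpError spider middleware, which drops non-2xx responses before they reach your callback. If you want to inspect those responses yourself, here is a hedged sketch using the standard `handle_httpstatus_list` spider attribute; the 503 status is only an example, not taken from the question.

import scrapy

class StatusExampleSpider(scrapy.Spider):  # hypothetical example spider
    name = 'status_example'
    start_urls = ['https://www.amazon.fr']
    # Statuses listed here are passed through to the callback instead of
    # being filtered out by HttpErrorMiddleware.
    handle_httpstatus_list = [503]

    def parse(self, response):
        if response.status != 200:
            self.logger.info('Got HTTP %d for %s', response.status, response.url)
            return
        # normal parsing for 200 responses goes here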
I can only suggest that you read Amazon's terms and conditions. If they allow scraping at all (I don't know), make sure you respect robots.txt and...