Python: error when running Scrapy from a script

I am trying to run a Scrapy spider from a script instead of running it from the command terminal like this:

scrapy crawl spidername

I found an example of running a spider from a script in the Scrapy documentation.

My code now looks like this:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader
from properties.items import PropertiesItem


class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        l = ItemLoader(item=PropertiesItem(), response = response)
        l.add_xpath('title', '//h1[1]/text()')

        return l.load_item()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished

When I run this script, I get the following error:

  File "/Library/Python/2.7/site-packages/Twisted-16.7.0rc1-py2.7-macosx-10.11-intel.egg/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

So my questions are:

1) What kind of error is this? I could not find any examples of it online.

2) What can I change to make Scrapy run from this script?

Update:

Here are the packages installed for the project:

attrs==16.3.0 
Automat==0.3.0 
cffi==1.9.1 
characteristic==14.3.0 
constantly==15.1.0 
cryptography==1.7.1 
cssselect==1.0.0 
enum34==1.1.6 
idna==2.2 
incremental==16.10.1 
ipaddress==1.0.17 
lxml==3.7.1 
parsel==1.1.0 
pyasn1==0.1.9
pyasn1-modules==0.0.8
pycparser==2.17 
PyDispatcher==2.0.5 
pyOpenSSL==0.15.1 
queuelib==1.4.2 
Scrapy==1.3.0
service-identity==16.0.0
six==1.10.0 
tree==0.1.0 
Twisted==16.6.0 
virtualenv==15.1.0 
w3lib==1.16.0
zope.interface==4.3.3

1) I am not sure.

2) But your indentation needs checking:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader
from properties.items import PropertiesItem


class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        l = ItemLoader(item=PropertiesItem(), response = response)
        l.add_xpath('title', '//h1[1]/text()')

        return l.load_item()

    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })

    process.crawl(MySpider)
    process.start() # the script will block here until the crawling is finished

I am also assuming the rest of your project is set up as in the other examples, i.e. that to run this spider you would enter

scrapy crawl basic

and that you have a folder named "properties" containing a file "items", and so on.

I found a solution:

I created a new virtual environment based on Python 3.6 instead of Python 2.7. I ran exactly the same code (although I had to replace urlparse with urllib.parse), and it worked.
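The urlparse change mentioned above can be sketched as a fallback import, so the same file loads on both interpreters. This is only an illustration: the URL below is a made-up example, and the original code that used urlparse is not shown in the thread.

```python
# Python 3 moved the urlparse module into urllib.parse.
# Try the Python 3 location first, then fall back to the Python 2 one.
try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

# Hypothetical URL, used only to show that the call is the same on both versions.
parts = urlparse("http://www.example.com/some/page?x=1")
print(parts.netloc)
print(parts.path)
```

If the project only needs to run on Python 3, the plain `from urllib.parse import urlparse` line is enough.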

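Did you look at the link I provided? The process is not part of the spider; it is what starts it.

This looks like a problem with Twisted. Could you paste the version numbers of all the packages in your project? Ideally, paste the output of

pip freeze

It could also be a problem with the version of OpenSSL you are using. Could you paste which OpenSSL version you have? If you can, you could update it. Actually, I think you should upgrade pyOpenSSL with

pip install --upgrade pyOpenSSL

That does not seem to work. I get the same error as before.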
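To collect the details asked for in the comments, a small diagnostic along these lines might help. It is a sketch, not a fix: the function name `pyopenssl_report` is made up for this example. It prints which interpreter is running, which pyOpenSSL version that interpreter imports, and whether `OpenSSL.SSL` has the `OP_NO_TLSv1_1` attribute named in the traceback. Note that the traceback path points at /Library/Python/2.7/site-packages, so it is worth running this with the same interpreter that runs the spider script.

```python
import sys


def pyopenssl_report():
    """Collect details about the running interpreter and its pyOpenSSL."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "pyopenssl": None,
        "has_OP_NO_TLSv1_1": False,
    }
    try:
        import OpenSSL
        from OpenSSL import SSL
    except Exception:
        # pyOpenSSL is missing or fails to import; a broken install can
        # raise errors other than ImportError, hence the broad catch.
        return report
    report["pyopenssl"] = OpenSSL.__version__
    # This is the attribute the AttributeError in the traceback refers to.
    report["has_OP_NO_TLSv1_1"] = hasattr(SSL, "OP_NO_TLSv1_1")
    return report


report = pyopenssl_report()
for key in sorted(report):
    print("%s: %s" % (key, report[key]))
```

If `has_OP_NO_TLSv1_1` comes out as False, or the executable is not the one inside the virtual environment, that would suggest the script is picking up a different pyOpenSSL from the one listed in the pip freeze output.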