Python tries to run one spider, but it runs another

I've been working on two spiders. They share the same filesystem, and the first spider was working before I started the heavy work on the second. Now that I've finished the second, I want to give each spider a test run before I try to stitch them together. When I try to run the first, it tries to execute the second instead, which fails because it depends on a file generated by the first. It's worth noting that I've been passing this project around via Google Drive so I can work on it from multiple machines.

EDIT:

I got it working, but maybe someone can help me understand why. Here is my first spider:

stockHighs.py:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import CsvItemExporter

from stockscrape.items import StockscrapeItem

class highScrape(BaseSpider):
        name = "stockhighs"
        allowed_domains = ["barchart.com"]
        start_urls = ["http://www.barchart.com/stocks/high.php?_dtp1=0"]

        def parse(self, response):
                # test.txt is the file the second spider reads later on
                f = open("test.txt","w")
                sel = HtmlXPathSelector(response)
                sites = sel.select("//tbody/tr")
                for site in sites:
                        item = StockscrapeItem()
                        item['symbol']  = site.select("td[contains(@class, 'ds_symbol')]/a/text()").extract()
                        strItem = str(item)
                        newItem = strItem.decode('string_escape').replace("{'symbol': [u'","").replace("']}","")
                        f.write("%s\n" % newItem)
                f.close()
Here is my second spider:

epsRating.py:

# coding: utf-8
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import CsvItemExporter
import re
import csv
import urlparse
from stockscrape.items import EPSItem
from itertools import izip

class epsScrape(BaseSpider):
        name = "eps"
        allowed_domains = ["investors.com"]
        ifile = open('test.txt', "r")
        reader = csv.reader(ifile)
        start_urls = []
        for row in ifile:
                url = row.replace("\n","")
                if url == "symbol":
                        continue
                else:
                        start_urls.append("http://research.investors.com/quotes/nyse-" + url + ".htm")
        ifile.close()

        def parse(self, response):
                tempSymbol = ""
                tempEps = 10
                f = open("eps.txt", "a+")
                sel = HtmlXPathSelector(response)
                sites = sel.select("//div")
                for site in sites:
                        item = EPSItem()
                        item['symbol'] = site.select("h2/span[contains(@id, 'qteSymb')]/text()").extract()
                        item['eps']  = site.select("table/tbody/tr/td[contains(@class, 'rating')]/span/text()").extract()
                        strSymb = str(item['symbol'])
                        newSymb = strSymb.replace("[]","").replace("[u'","").replace("']","")
                        strEps = str(item['eps'])
                        newEps = strEps.replace("[]","").replace(" ","").replace("[u'\\r\\n","").replace("']","")
                        if not newSymb == "":
                                tempSymbol = newSymb
                        if not newEps == "":
                                tempEps = int(newEps)
                if not  tempEps < 85:
                        f.write("%s\t%s\n" % (tempSymbol, str(tempEps)))
                f.close()
Here is the error I get:

$ scrapy crawl stockhighs
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/pymodules/python2.7/scrapy/commands/crawl.py", line 47, in run
    crawler = self.crawler_process.create_crawler()
  File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 88, in create_crawler
    self.crawlers[name] = Crawler(self.settings)
  File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 26, in __init__
    self.spiders = spman_cls.from_crawler(self)
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 35, in from_crawler
    sm = cls.from_settings(crawler.settings)
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 31, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 22, in __init__
    for module in walk_modules(name):
  File "/usr/lib/pymodules/python2.7/scrapy/utils/misc.py", line 66, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/home/bwisdom/scrapy/stockscrape/spiders/epsRating.py", line 11, in <module>
    class epsScrape(BaseSpider):
  File "/home/bwisdom/scrapy/stockscrape/spiders/epsRating.py", line 14, in epsScrape
    ifile = open('test.txt', "r")
IOError: [Errno 2] No such file or directory: 'test.txt'
The way I fixed it was to create a blank test.txt file. Now, I know I used "w" in the first spider, which is supposed to open (create) a new file, but even when I used "w+" or "a+" it wouldn't work until I created the empty test.txt myself. Once I created it, the first spider ran, and then the second one ran the way it's supposed to.


I guess what I'm confused about is why it calls the second spider when it's trying to run the first one.
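
A minimal illustration of what is happening (a hypothetical standalone module, not part of the project): statements in a class body execute as soon as the module is imported, long before any method like parse() is ever called.

demo.py:

class Demo(object):
        # This runs at import time: "import demo" raises IOError
        # immediately if test.txt does not exist, even though no
        # Demo instance is ever created.
        ifile = open('test.txt', 'r')

        def parse(self):
                # This open(), by contrast, runs only when parse()
                # is actually called.
                f = open('test.txt', 'w')
                f.close()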

You need to tell us what the spiders are doing and what the dependencies are: give us a minimal working code example.

Running the first scraper is not invoking the second one. When you try to run the first spider, Scrapy loads all of the spider modules and only then picks the one to run. Because the second module raises an error while it is being loaded, the whole process errors out. Once you fixed that error, everything looked fine.

OK, so how does the module loading work? If I deleted the second spider, would it say there is no epsRating.py instead of the error above?
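
For what it's worth, one way to avoid the problem entirely (a sketch against the same old Scrapy API used above, not tested): read test.txt lazily inside start_requests() instead of in the class body, so that merely importing epsRating.py never touches the filesystem.

from scrapy.spider import BaseSpider
from scrapy.http import Request

class epsScrape(BaseSpider):
        name = "eps"
        allowed_domains = ["investors.com"]

        def start_requests(self):
                # Runs only when this spider is actually crawled, not when
                # Scrapy imports the module while discovering spiders.
                with open('test.txt') as ifile:
                        for row in ifile:
                                url = row.strip()
                                if url and url != "symbol":
                                        yield Request("http://research.investors.com/quotes/nyse-" + url + ".htm")

        def parse(self, response):
                # ... same parsing logic as in epsRating.py above ...
                pass

With this change, scrapy crawl stockhighs loads both spider modules cleanly even when test.txt does not exist yet.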