Python tries to run one spider, but it runs another

I've been working on two spiders. They share the same filesystem, and the first spider was working before I started the heavy work on the second. Now that I've finished the second, I want to give each spider a test run before I try to stitch them together. When I try to run the first, it tries to execute the second instead, which fails because it depends on a file generated by the first. It's worth noting that I've been passing this project around via Google Drive so I can work on it from multiple machines.

EDIT:

I got it working, but maybe someone can help me understand why. Here is my first spider:

stockHighs.py:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import CsvItemExporter

from stockscrape.items import StockscrapeItem

class highScrape(BaseSpider):
        name = "stockhighs"
        allowed_domains = ["barchart.com"]
        start_urls = ["http://www.barchart.com/stocks/high.php?_dtp1=0"]

        def parse(self, response):
                # test.txt is the file the second spider reads later on
                f = open("test.txt","w")
                sel = HtmlXPathSelector(response)
                sites = sel.select("//tbody/tr")
                for site in sites:
                        item = StockscrapeItem()
                        item['symbol']  = site.select("td[contains(@class, 'ds_symbol')]/a/text()").extract()
                        strItem = str(item)
                        newItem = strItem.decode('string_escape').replace("{'symbol': [u'","").replace("']}","")
                        f.write("%s\n" % newItem)
                f.close()
Here is my second spider:

epsRating.py:

# coding: utf-8
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.exporter import CsvItemExporter
import re
import csv
import urlparse
from stockscrape.items import EPSItem
from itertools import izip

class epsScrape(BaseSpider):
        name = "eps"
        allowed_domains = ["investors.com"]
        ifile = open('test.txt', "r")
        reader = csv.reader(ifile)
        start_urls = []
        for row in ifile:
                url = row.replace("\n","")
                if url == "symbol":
                        continue
                else:
                        start_urls.append("http://research.investors.com/quotes/nyse-" + url + ".htm")
        ifile.close()

        def parse(self, response):
                tempSymbol = ""
                tempEps = 10
                f = open("eps.txt", "a+")
                sel = HtmlXPathSelector(response)
                sites = sel.select("//div")
                for site in sites:
                        item = EPSItem()
                        item['symbol'] = site.select("h2/span[contains(@id, 'qteSymb')]/text()").extract()
                        item['eps']  = site.select("table/tbody/tr/td[contains(@class, 'rating')]/span/text()").extract()
                        strSymb = str(item['symbol'])
                        newSymb = strSymb.replace("[]","").replace("[u'","").replace("']","")
                        strEps = str(item['eps'])
                        newEps = strEps.replace("[]","").replace(" ","").replace("[u'\\r\\n","").replace("']","")
                        if not newSymb == "":
                                tempSymbol = newSymb
                        if not newEps == "":
                                tempEps = int(newEps)
                if not  tempEps < 85:
                        f.write("%s\t%s\n" % (tempSymbol, str(tempEps)))
                f.close()
Here is the error I get:

$ scrapy crawl stockhighs
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/pymodules/python2.7/scrapy/commands/crawl.py", line 47, in run
    crawler = self.crawler_process.create_crawler()
  File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 88, in create_crawler
    self.crawlers[name] = Crawler(self.settings)
  File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 26, in __init__
    self.spiders = spman_cls.from_crawler(self)
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 35, in from_crawler
    sm = cls.from_settings(crawler.settings)
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 31, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 22, in __init__
    for module in walk_modules(name):
  File "/usr/lib/pymodules/python2.7/scrapy/utils/misc.py", line 66, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/home/bwisdom/scrapy/stockscrape/spiders/epsRating.py", line 11, in <module>
    class epsScrape(BaseSpider):
  File "/home/bwisdom/scrapy/stockscrape/spiders/epsRating.py", line 14, in epsScrape
    ifile = open('test.txt', "r")
IOError: [Errno 2] No such file or directory: 'test.txt'
The way I fixed it was to create a blank test.txt file. Now, I know I used "w" in the first spider, which is supposed to open (create) a new file, but even when I used "w+" or "a+" it wouldn't work until I created the empty test.txt myself. Once I created it, the first spider ran, and then the second one ran the way it's supposed to.


I guess what I'm confused about is why it calls the second spider when it's trying to run the first one.
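
A minimal illustration of what is happening (a hypothetical standalone module, not part of the project): statements in a class body execute as soon as the module is imported, long before any method like parse() is ever called.

demo.py:

class Demo(object):
        # This runs at import time: "import demo" raises IOError
        # immediately if test.txt does not exist, even though no
        # Demo instance is ever created.
        ifile = open('test.txt', 'r')

        def parse(self):
                # This open(), by contrast, runs only when parse()
                # is actually called.
                f = open('test.txt', 'w')
                f.close()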

You need to tell us what the spiders are doing and what the dependencies are: give us a minimal working code example.

Running the first scraper is not invoking the second one. When you try to run the first spider, Scrapy loads all of the spider modules and only then picks the one to run. Because the second module raises an error while it is being loaded, the whole process errors out. Once you fixed that error, everything looked fine.

OK, so how does the module loading work? If I deleted the second spider, would it say there is no epsRating.py instead of the error above?
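
For what it's worth, one way to avoid the problem entirely (a sketch against the same old Scrapy API used above, not tested): read test.txt lazily inside start_requests() instead of in the class body, so that merely importing epsRating.py never touches the filesystem.

from scrapy.spider import BaseSpider
from scrapy.http import Request

class epsScrape(BaseSpider):
        name = "eps"
        allowed_domains = ["investors.com"]

        def start_requests(self):
                # Runs only when this spider is actually crawled, not when
                # Scrapy imports the module while discovering spiders.
                with open('test.txt') as ifile:
                        for row in ifile:
                                url = row.strip()
                                if url and url != "symbol":
                                        yield Request("http://research.investors.com/quotes/nyse-" + url + ".htm")

        def parse(self, response):
                # ... same parsing logic as in epsRating.py above ...
                pass

With this change, scrapy crawl stockhighs loads both spider modules cleanly even when test.txt does not exist yet.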