
Python scraper ends after parsing one link


I've been writing this web scraper and I can't figure out why it just ends. Here's the code:

import scrapy, MySQLdb, urllib
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy import Request


class MyItems(scrapy.Item):
    topLinks = scrapy.Field()
    artists = scrapy.Field()

class mp3Spider(CrawlSpider):
    name = 'mp3_scraper'
    allowed_domains = [
        'example.com'
    ]
    start_urls = [
        'http://www.example.com'
    ]

    def __init__(self, *a, **kw):
        super(mp3Spider, self).__init__(*a, **kw)

        self.item = MyItems()

    def parse(self, response):
        f = open('topLinks', 'w')
        self.item['topLinks'] = response.xpath("//div[contains(@class, 'en')]/a[contains(@class, 'hash')]/@href").extract()

        for x in range(len(self.item['topLinks'])):
            self.item['topLinks'][x] = 'http://www.example.com' + self.item['topLinks'][x]

        for x in range(len(self.item['topLinks'])):
            f.write(format(self.item['topLinks'][x]).encode('utf-8')+ '\n')
            yield Request(url=self.item['topLinks'][x], callback=self.parse_artists)

    def parse_artists(self, response):
        f = open('artists', 'w')
        self.item['artists'] = response.xpath("//ul[contains(@class, 'artist_list')]/li/a/text()").extract()

        for x in range(len(self.item['artists'])):
            f.write(format(self.item['artists'][x]).encode('utf-8') + '\n')
So both parse functions get the information I need, but parse_artists only parses one link. The parse function grabs all the links I need, and I can see that it does because I print them to a file. So say it grabs the links example.com/artists/a, example.com/artists/b, and so on. parse_artists will only scrape example.com/artists/a and then stop. Any help would be appreciated, thanks - Sam

Edit: output log -

C:\Python27\python.exe C:/Users/sam/PycharmProjects/mp3_scraper/mp3_scraper/mp3_scraper/main.py
2014-09-13 12:28:24-0400 [scrapy] INFO: Scrapy 0.24.2 started (bot: mp3_scraper)
2014-09-13 12:28:24-0400 [scrapy] INFO: Optional features available: ssl, http11
2014-09-13 12:28:24-0400 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'mp3_scraper.spiders', 'SPIDER_MODULES': ['mp3_scraper.spiders'], 'BOT_NAME': 'mp3_scraper'}
2014-09-13 12:28:24-0400 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-09-13 12:28:25-0400 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-09-13 12:28:25-0400 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-09-13 12:28:25-0400 [scrapy] INFO: Enabled item pipelines: 
2014-09-13 12:28:25-0400 [mp3_scraper] INFO: Spider opened
2014-09-13 12:28:25-0400 [mp3_scraper] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-09-13 12:28:25-0400 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-09-13 12:28:25-0400 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/> (referer: None)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/z/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/0..9/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/w/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/x/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/u/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/q/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/v/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/y/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:26-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/t/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/o/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/p/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/r/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/n/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/s/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/l/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/h/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/k/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/i/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/g/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/m/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:27-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/j/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:28-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/f/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:28-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/e/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:28-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/c/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:28-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/d/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:28-0400 [mp3_scraper] DEBUG: Crawled (200) <GET http://www.myfreemp3.cc/artists/b/> (referer: http://www.myfreemp3.cc/artists/)
2014-09-13 12:28:28-0400 [mp3_scraper] INFO: Closing spider (finished)
2014-09-13 12:28:28-0400 [mp3_scraper] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 10106,
     'downloader/request_count': 27,
     'downloader/request_method_count/GET': 27,
     'downloader/response_bytes': 887850,
     'downloader/response_count': 27,
     'downloader/response_status_count/200': 27,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2014, 9, 13, 16, 28, 28, 908000),
     'log_count/DEBUG': 29,
     'log_count/INFO': 7,
     'request_depth_max': 1,
     'response_received_count': 27,
     'scheduler/dequeued': 27,
     'scheduler/dequeued/memory': 27,
     'scheduler/enqueued': 27,
     'scheduler/enqueued/memory': 27,
     'start_time': datetime.datetime(2014, 9, 13, 16, 28, 25, 315000)}
2014-09-13 12:28:28-0400 [mp3_scraper] INFO: Spider closed (finished)

Process finished with exit code 0
Opening the artists file in 'w' mode truncates it if it already exists. So after the spider finishes, only the last scraped batch is left in the file.
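A minimal standalone demonstration of the difference, mimicking what each callback does to the file. The paths and words here are placeholders, not taken from the spider:

```python
import io
import os
import tempfile

tmpdir = tempfile.mkdtemp()
w_path = os.path.join(tmpdir, 'w_demo.txt')
a_path = os.path.join(tmpdir, 'a_demo.txt')

# 'w' truncates on every open(): only the last write survives.
for word in (u'first', u'second', u'third'):
    with io.open(w_path, 'w', encoding='utf-8') as f:
        f.write(word + u'\n')

# 'a' appends: every write is kept.
for word in (u'first', u'second', u'third'):
    with io.open(a_path, 'a', encoding='utf-8') as f:
        f.write(word + u'\n')

with io.open(w_path, encoding='utf-8') as f:
    print(f.read().splitlines())   # ['third']
with io.open(a_path, encoding='utf-8') as f:
    print(f.read().splitlines())   # ['first', 'second', 'third']
```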

You should open the file in append mode 'a' to fix this:

def parse_artists(self, response):
    f = open('artists', 'a')
    ...
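A sketch of the callback's file-writing logic with that fix applied, pulled out into a helper so it can be tested in isolation. write_artists is a hypothetical name, and io.open with an explicit encoding is used so the UTF-8 handling behaves the same on Python 2.7 (which the log shows) and Python 3:

```python
import io

def write_artists(path, artists):
    # Append mode ('a') preserves what earlier callbacks already wrote;
    # the with-block also guarantees the handle gets closed, which the
    # original code never does explicitly.
    with io.open(path, 'a', encoding='utf-8') as f:
        for name in artists:
            f.write(name + u'\n')
```

In a real project, a Scrapy item pipeline would be the more idiomatic place for this kind of output, since it keeps file handling out of the spider entirely.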

Could you add the log output Scrapy produces when you run the spider to the question?