Python MySQL pipeline is loading but not working in Scrapy
I am trying to load a second pipeline that writes items to a MySQL database. In the log I can see that it is loaded, but after that nothing happens at all, not even a log entry. Here is my pipeline:
# Mysql
import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="***", passwd="***", db="***", charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        CurrentDateTime = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        Md5Hash = hashlib.md5(item['link']).hexdigest()
        try:
            self.cursor.execute("""INSERT INTO apple (article_add_date, article_date, article_title, article_link, article_link_md5, article_summary, article_image_url, article_source) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""", (CurrentDateTime, item['date'], item['title'], item['link'], Md5Hash, item['summary'], item['image'], item['sourcesite']))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item

class CleanDateField(object):
    def process_item(self, item, spider):
        from dateutil import parser
        rawdate = item['date']
        # text replace per spider so parser can recognize the datetime better
        if spider.name == "macnn_com":
            rawdate = rawdate.replace("updated", "").strip()
        dt = parser.parse(rawdate)
        articledate = dt.strftime("%Y-%m-%d %H:%M:%S")
        item['date'] = articledate
        return item
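For reference, pipelines only run if they are registered in the project settings. Below is a hypothetical settings.py sketch; the HungryFeed module path is assumed from the bot name in the log and may not match the actual project layout. Note also that the log lists MySQLStorePipeline before CleanDateField, so in that order the raw, uncleaned date string would reach MySQL first:

```python
# Hypothetical settings.py sketch (module path assumed, not confirmed by the post).
# In Scrapy 0.16.x, ITEM_PIPELINES is a list and items pass through the
# pipelines in list order, so the date cleaner should come before the writer:
ITEM_PIPELINES = [
    'HungryFeed.pipelines.CleanDateField',
    'HungryFeed.pipelines.MySQLStorePipeline',
]

# In modern Scrapy the same setting is a dict whose values set the order
# (lower runs first):
# ITEM_PIPELINES = {
#     'HungryFeed.pipelines.CleanDateField': 100,
#     'HungryFeed.pipelines.MySQLStorePipeline': 200,
# }
```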
Here is my log:
scrapy crawl macnn_com
2013-06-20 08:15:53+0200 [scrapy] INFO: Scrapy 0.16.4 started (bot: HungryFeed)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled item pipelines: MySQLStorePipeline, CleanDateField
2013-06-20 08:15:54+0200 [macnn_com] INFO: Spider opened
2013-06-20 08:15:54+0200 [macnn_com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com> (referer: None)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/> (referer: http://www.macnn.com)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Scraped from <200 http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/>
*** lot of scraping data ***
*** lot of scraping data ***
*** lot of scraping data ***
2013-06-20 08:15:56+0200 [macnn_com] INFO: Closing spider (finished)
2013-06-20 08:15:56+0200 [macnn_com] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 5711,
'downloader/request_count': 17,
'downloader/request_method_count/GET': 17,
'downloader/response_bytes': 281140,
'downloader/response_count': 17,
'downloader/response_status_count/200': 17,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2013, 6, 20, 6, 15, 56, 685286),
'item_scraped_count': 16,
'log_count/DEBUG': 39,
'log_count/INFO': 4,
'request_depth_max': 1,
'response_received_count': 17,
'scheduler/dequeued': 17,
'scheduler/dequeued/memory': 17,
'scheduler/enqueued': 17,
'scheduler/enqueued/memory': 17,
'start_time': datetime.datetime(2013, 6, 20, 6, 15, 54, 755766)}
2013-06-20 08:15:56+0200 [macnn_com] INFO: Spider closed (finished)
Am I missing something?
What does your CleanDateField pipeline look like? Does it work a) on its own, and b) when you include MySQLStorePipeline alongside it?

Yes, that pipeline works like a charm. I have edited my post to include my first pipeline.

There is no datetime import in the pipeline code. Also, can you check (by adding a print or a log statement) whether process_item is actually called while scraping?

Sounds silly, but the indentation on the def was missing. My editor didn't show this until I checked with vi. @alecxe that missing import was the first error I ran into after correcting the indentation; I fixed it, and now it works.

I almost always assume that indentation errors on SO are caused by paste errors rather than being actual errors in the code itself.