
Python MySQL pipeline loads but does not work in Scrapy


I am trying to load a second pipeline that writes the scraped items to a MySQL database. In the log I can see that it gets loaded, but after that nothing happens, not even a log entry. This is my pipeline:

# Mysql
import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):
  def __init__(self):
    self.conn = MySQLdb.connect(host="localhost", user="***", passwd="***", db="***", charset="utf8", use_unicode=True)
    self.cursor = self.conn.cursor()

def process_item(self, item, spider):

        CurrentDateTime = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        Md5Hash = hashlib.md5(item['link']).hexdigest()
        try:
                self.cursor.execute("""INSERT INTO apple (article_add_date, article_date, article_title, article_link, article_link_md5, article_summary, article_image_url, article_source) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""", (CurrentDateTime, item['date'], item['title'], item['link'], Md5Hash, item['summary'], item['image'], item['sourcesite']))
                self.conn.commit()

        except MySQLdb.Error, e:
                print "Error %d: %s" % (e.args[0], e.args[1])

        return item
class CleanDateField(object):

        def process_item(self,item,spider):
                from dateutil import parser
                rawdate = item['date']

                #text replace per spider so parser can recognize better the datetime
                if spider.name == "macnn_com":
                        rawdate = rawdate.replace("updated","").strip()

                dt = parser.parse(rawdate)
                articledate = dt.strftime("%Y-%m-%d %H:%M:%S")
                item['date'] = articledate

                return item
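
For reference, this is roughly how both pipelines would be enabled in the project's settings.py. It is a minimal sketch that assumes the two classes live in HungryFeed/pipelines.py (the standard Scrapy layout); the actual module path may differ, but the result should match the "Enabled item pipelines" line in the log below:

# settings.py (sketch) - Scrapy 0.16, the version in the log, takes a list;
# the list order is the order in which each item passes through process_item()
ITEM_PIPELINES = [
    'HungryFeed.pipelines.MySQLStorePipeline',
    'HungryFeed.pipelines.CleanDateField',
]

# newer Scrapy versions expect a dict with an integer order instead, e.g.
# ITEM_PIPELINES = {'HungryFeed.pipelines.MySQLStorePipeline': 300,
#                   'HungryFeed.pipelines.CleanDateField': 400}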
This is my log:

 scrapy crawl macnn_com
2013-06-20 08:15:53+0200 [scrapy] INFO: Scrapy 0.16.4 started (bot: HungryFeed)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled item pipelines: MySQLStorePipeline, CleanDateField
2013-06-20 08:15:54+0200 [macnn_com] INFO: Spider opened
2013-06-20 08:15:54+0200 [macnn_com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com> (referer: None)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/> (referer: http://www.macnn.com)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Scraped from <200 http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/>
*** lot of scraping data ***
*** lot of scraping data ***
*** lot of scraping data ***
2013-06-20 08:15:56+0200 [macnn_com] INFO: Closing spider (finished)
2013-06-20 08:15:56+0200 [macnn_com] INFO: Dumping Scrapy stats:
        {'downloader/request_bytes': 5711,
         'downloader/request_count': 17,
         'downloader/request_method_count/GET': 17,
         'downloader/response_bytes': 281140,
         'downloader/response_count': 17,
         'downloader/response_status_count/200': 17,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2013, 6, 20, 6, 15, 56, 685286),
         'item_scraped_count': 16,
         'log_count/DEBUG': 39,
         'log_count/INFO': 4,
         'request_depth_max': 1,
         'response_received_count': 17,
         'scheduler/dequeued': 17,
         'scheduler/dequeued/memory': 17,
         'scheduler/enqueued': 17,
         'scheduler/enqueued/memory': 17,
         'start_time': datetime.datetime(2013, 6, 20, 6, 15, 54, 755766)}
2013-06-20 08:15:56+0200 [macnn_com] INFO: Spider closed (finished)
Am I missing something?


What does your CleanDateField pipeline look like? Does that pipeline work a) on its own and b) when you include MySQLStorePipeline?

Yes, that pipeline works like a charm. I have edited my post to include my first pipeline.

There is no datetime import anywhere in the pipeline code. Also, can you check whether process_item is called at all while scraping (add a print or some logging)?

Sounds silly, but the indentation on the def was missing; my editor did not show that until I checked with vi. @alecxe that was the first error I hit after correcting the indentation; fixed it, and now it works.

I almost always assume that indentation errors on SO are caused by paste mistakes rather than by actual errors in the code itself.
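
For completeness, a sketch of MySQLStorePipeline as it ends up after applying what the comments found: the missing datetime import is added, and process_item is indented so that it is actually a method of the class and Scrapy can call it (defined at module level, it was never part of the pipeline). Credentials, table and column names are the placeholders from the post, and the print at the top is just the "does it get called at all" check suggested above:

# Mysql
import hashlib
from datetime import datetime  # the import the comments pointed out was missing

import MySQLdb


class MySQLStorePipeline(object):

    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="***", passwd="***",
                                    db="***", charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    # now indented inside the class, so Scrapy actually calls it for every item
    def process_item(self, item, spider):
        print "process_item called for %s" % item['link']  # quick sanity check

        CurrentDateTime = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        Md5Hash = hashlib.md5(item['link']).hexdigest()
        try:
            self.cursor.execute(
                """INSERT INTO apple (article_add_date, article_date, article_title,
                                      article_link, article_link_md5, article_summary,
                                      article_image_url, article_source)
                   VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""",
                (CurrentDateTime, item['date'], item['title'], item['link'], Md5Hash,
                 item['summary'], item['image'], item['sourcesite']))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])

        return item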