Python MySQL pipeline is loading but not working in Scrapy
I am trying to load a second pipeline that writes items to a MySQL database. In the log I can see that it is loaded, but after that nothing happens at all, not even a log entry. Here is my pipeline:
# Mysql
import sys
import MySQLdb
import hashlib
from scrapy.exceptions import DropItem
from scrapy.http import Request

class MySQLStorePipeline(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="***", passwd="***", db="***", charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        CurrentDateTime = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        Md5Hash = hashlib.md5(item['link']).hexdigest()
        try:
            self.cursor.execute("""INSERT INTO apple (article_add_date, article_date, article_title, article_link, article_link_md5, article_summary, article_image_url, article_source) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)""", (CurrentDateTime, item['date'], item['title'], item['link'], Md5Hash, item['summary'], item['image'], item['sourcesite']))
            self.conn.commit()
        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])
        return item

class CleanDateField(object):
    def process_item(self, item, spider):
        from dateutil import parser
        rawdate = item['date']
        # text replace per spider so parser can recognize the datetime better
        if spider.name == "macnn_com":
            rawdate = rawdate.replace("updated", "").strip()
        dt = parser.parse(rawdate)
        articledate = dt.strftime("%Y-%m-%d %H:%M:%S")
        item['date'] = articledate
        return item
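For reference, pipelines only run if they are registered in the project settings. Below is a hypothetical settings.py sketch; the HungryFeed module path is assumed from the bot name in the log and may not match the actual project layout. Note also that the log lists MySQLStorePipeline before CleanDateField, so in that order the raw, uncleaned date string would reach MySQL first:

```python
# Hypothetical settings.py sketch (module path assumed, not confirmed by the post).
# In Scrapy 0.16.x, ITEM_PIPELINES is a list and items pass through the
# pipelines in list order, so the date cleaner should come before the writer:
ITEM_PIPELINES = [
    'HungryFeed.pipelines.CleanDateField',
    'HungryFeed.pipelines.MySQLStorePipeline',
]

# In modern Scrapy the same setting is a dict whose values set the order
# (lower runs first):
# ITEM_PIPELINES = {
#     'HungryFeed.pipelines.CleanDateField': 100,
#     'HungryFeed.pipelines.MySQLStorePipeline': 200,
# }
```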
Here is my log:
scrapy crawl macnn_com
2013-06-20 08:15:53+0200 [scrapy] INFO: Scrapy 0.16.4 started (bot: HungryFeed)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Enabled item pipelines: MySQLStorePipeline, CleanDateField
2013-06-20 08:15:54+0200 [macnn_com] INFO: Spider opened
2013-06-20 08:15:54+0200 [macnn_com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-06-20 08:15:54+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com> (referer: None)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Crawled (200) <GET http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/> (referer: http://www.macnn.com)
2013-06-20 08:15:55+0200 [macnn_com] DEBUG: Scraped from <200 http://www.macnn.com/articles/13/06/19/compatibility.described.as.experimental/>
*** lot of scraping data ***
*** lot of scraping data ***
*** lot of scraping data ***
2013-06-20 08:15:56+0200 [macnn_com] INFO: Closing spider (finished)
2013-06-20 08:15:56+0200 [macnn_com] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 5711,
'downloader/request_count': 17,
'downloader/request_method_count/GET': 17,
'downloader/response_bytes': 281140,
'downloader/response_count': 17,
'downloader/response_status_count/200': 17,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2013, 6, 20, 6, 15, 56, 685286),
'item_scraped_count': 16,
'log_count/DEBUG': 39,
'log_count/INFO': 4,
'request_depth_max': 1,
'response_received_count': 17,
'scheduler/dequeued': 17,
'scheduler/dequeued/memory': 17,
'scheduler/enqueued': 17,
'scheduler/enqueued/memory': 17,
'start_time': datetime.datetime(2013, 6, 20, 6, 15, 54, 755766)}
2013-06-20 08:15:56+0200 [macnn_com] INFO: Spider closed (finished)
Am I missing something?
What does your CleanDateField pipeline look like? Does it work a) on its own, and b) when you include MySQLStorePipeline alongside it?

Yes, that pipeline works like a charm. I have edited my post to include my first pipeline.

There is no datetime import in the pipeline code. Also, can you check (by adding a print or a log statement) whether process_item is actually called while scraping?

Sounds silly, but the indentation on the def was missing. My editor didn't show this until I checked with vi. @alecxe that missing import was the first error I ran into after correcting the indentation; I fixed it, and now it works.

I almost always assume that indentation errors on SO are caused by paste errors rather than being actual errors in the code itself.