Python: start_requests from MySQL with additional parameters
I am trying to crawl a website using the following scheme: I have a MySQL table containing movie names and their release years. The Scrapy spider fetches both values in its start_requests function and then issues the requests. The search_in_filmweb callback parses the response and checks whether a result has the same release year as the one I got from the database.

Suppose my database contains the following values:

movie name: Death in Venice; release year: 1971

The spider sends the search request and then picks the correct result by release year.

The spider I wrote works fine, but only for one specific record from the database, as a BaseSpider. However, when I tried to make bulk requests by fetching all rows from the database, I ran into this error:
2014-03-07 18:01:19+0100 [single] DEBUG: Crawled (200) <GET http://www.filmweb.pl/search?q=Death+in+Venice> (referer: None)
2014-03-07 18:01:19+0100 [single] ERROR: Spider error processing <GET http://www.filmweb.pl/search?q=Death+in+Venice>
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/Library/Python/2.7/site-packages/twisted/internet/task.py", line 638, in _tick
taskObj._oneWorkUnit()
File "/Library/Python/2.7/site-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
result = next(self._iterator)
File "/Library/Python/2.7/site-packages/scrapy/utils/defer.py", line 57, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
--- <exception caught here> ---
File "/Library/Python/2.7/site-packages/scrapy/utils/defer.py", line 96, in iter_errback
yield next(it)
File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/offsite.py", line 23, in process_spider_output
for x in result:
File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
return (r for r in result or () if _filter(r))
File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
return (r for r in result or () if _filter(r))
File "/Users/mikolajroszkowski/Desktop/python/scrapy_projects/filmweb_moviecus/filmweb_moviecus/spiders/single.py", line 37, in search_in_filmweb
yield Request("http://www.filmweb.pl"+item['link_from_search'][0], meta={'item': item}, callback=self.parse)
exceptions.IndexError: list index out of range
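The traceback ends in exceptions.IndexError at item['link_from_search'][0]: when a Scrapy selector's extract() matches nothing on the search page, it returns an empty list, and indexing [0] on it raises exactly this error. A minimal, self-contained sketch of the failure and a guard against it (the field name link_from_search is taken from the traceback; the rest is illustrative, not the original spider code):

```python
# What extract() returns when the XPath matches nothing on the page: [].
# Indexing [0] on that empty list is the IndexError in the traceback.
link_from_search = []  # no search results matched for this movie

try:
    first_link = link_from_search[0]
except IndexError:
    first_link = None  # guard: skip this movie instead of crashing the spider

print(first_link)
```

A callback written this way can log the miss and return early, so one movie with no search results does not abort the whole crawl.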
Silly mistake; it should be:
def start_requests(self):
    # MAMP's local MySQL socket; credentials as configured locally
    conn = MySQLdb.connect(unix_socket='/Applications/MAMP/tmp/mysql/mysql.sock',
                           user='root', passwd='root', db='filmypodobne',
                           host='localhost', charset="utf8", use_unicode=True)
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM filmy_app_movies")
    rows = cursor.fetchall()
    for row in rows:
        item = FilmwebItem()
        # URL-encode the movie name so spaces become '+' in the search query
        item['movie_name'] = urllib.quote_plus(row[1])
        item['id_db'] = row[0]
        item['db_year'] = row[3]
        yield Request("http://www.filmweb.pl/search?q=" + item['movie_name'],
                      meta={'item': item}, callback=self.search_in_filmweb)
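The fix above URL-encodes the movie name before appending it to the search URL. A quick illustration of what quote_plus does to a name with spaces (written so it runs on Python 2 or 3; the spider itself is Python 2, per the traceback paths):

```python
# Sketch: how quote_plus turns the movie name from the question into the
# query string seen in the crawled URL.
try:
    from urllib import quote_plus        # Python 2, as used in the spider
except ImportError:
    from urllib.parse import quote_plus  # Python 3 location of the same helper

movie_name = "Death in Venice"
print(quote_plus(movie_name))  # -> Death+in+Venice
```

This matches the request shown in the log: http://www.filmweb.pl/search?q=Death+in+Venice.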
Is this the solution to your problem, or an edit to the original question?