Python: start_requests from MySQL with additional arguments


I am trying to crawl a website using the following scheme:

I have a MySQL table holding movie names and their release years. The Scrapy spider fetches both values in the start_requests method and issues a request for each. The search_in_filmweb callback parses the response and checks whether a search result's release year matches the one from the database.

Say my database contains this row:

movie name: Death in Venice; release year: 1971

The spider sends the search request and then picks the correct result by its release date.
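For a title such as "Death in Venice", the search URL can be built with quote_plus, which is what produces the `Death+in+Venice` query seen in the log below. A minimal sketch (on Python 3 the function lives in urllib.parse; the question's code targets Python 2, where it is urllib.quote_plus):

```python
from urllib.parse import quote_plus  # Python 2: urllib.quote_plus

movie_name = "Death in Venice"
url = "http://www.filmweb.pl/search?q=" + quote_plus(movie_name)
print(url)  # http://www.filmweb.pl/search?q=Death+in+Venice
```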

The spider I wrote works correctly, but only for one specific, hard-coded record from the database (as a BaseSpider). When I try to issue requests in bulk by fetching all rows from the database, I get this error:

2014-03-07 18:01:19+0100 [single] DEBUG: Crawled (200) <GET http://www.filmweb.pl/search?q=Death+in+Venice> (referer: None)
2014-03-07 18:01:19+0100 [single] ERROR: Spider error processing <GET http://www.filmweb.pl/search?q=Death+in+Venice>
    Traceback (most recent call last):
      File "/Library/Python/2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/Library/Python/2.7/site-packages/twisted/internet/task.py", line 638, in _tick
        taskObj._oneWorkUnit()
      File "/Library/Python/2.7/site-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
        result = next(self._iterator)
      File "/Library/Python/2.7/site-packages/scrapy/utils/defer.py", line 57, in <genexpr>
        work = (callable(elem, *args, **named) for elem in iterable)
    --- <exception caught here> ---
      File "/Library/Python/2.7/site-packages/scrapy/utils/defer.py", line 96, in iter_errback
        yield next(it)
      File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/offsite.py", line 23, in process_spider_output
        for x in result:
      File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
        return (_set_referer(r) for r in result or ())
      File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
        return (r for r in result or () if _filter(r))
      File "/Library/Python/2.7/site-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
        return (r for r in result or () if _filter(r))
      File "/Users/mikolajroszkowski/Desktop/python/scrapy_projects/filmweb_moviecus/filmweb_moviecus/spiders/single.py", line 37, in search_in_filmweb
        yield Request("http://www.filmweb.pl"+item['link_from_search'][0], meta={'item': item}, callback=self.parse)
    exceptions.IndexError: list index out of range
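The IndexError is raised by `item['link_from_search'][0]` when the search page yields no matching link, so the list is empty. A minimal, hypothetical guard for that pattern (the field name is taken from the traceback; the rest of the callback is assumed):

```python
def pick_link(links):
    # links[0] on an empty list raises the IndexError seen above;
    # returning None lets the callback skip the item instead of crashing.
    return links[0] if links else None

print(pick_link([]))                         # None
print(pick_link(["/film/Death.in.Venice"]))  # /film/Death.in.Venice
```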

Silly mistake on my part; it should be:

import urllib

import MySQLdb
from scrapy.http import Request

from filmweb_moviecus.items import FilmwebItem  # adjust to your project's items module

def start_requests(self):
    conn = MySQLdb.connect(unix_socket='/Applications/MAMP/tmp/mysql/mysql.sock',
                           user='root', passwd='root', db='filmypodobne',
                           host='localhost', charset='utf8', use_unicode=True)
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM filmy_app_movies")
    for row in cursor.fetchall():
        # Carry the database id and release year along in the item,
        # so search_in_filmweb can compare them against each result.
        item = FilmwebItem()
        item['movie_name'] = urllib.quote_plus(row[1])
        item['id_db'] = row[0]
        item['db_year'] = row[3]
        yield Request("http://www.filmweb.pl/search?q=" + item['movie_name'],
                      meta={'item': item}, callback=self.search_in_filmweb)
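The answer relies on SELECT * plus positional indices (row[0], row[1], row[3]), which breaks silently if the table's columns are ever reordered. A sketch of selecting columns by name instead, using an in-memory SQLite stand-in for the MySQL table (the column names id, name, year are assumptions, not the real schema):

```python
import sqlite3

# In-memory stand-in for the MySQL table; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filmy_app_movies (id INTEGER, name TEXT, extra TEXT, year INTEGER)")
conn.execute("INSERT INTO filmy_app_movies VALUES (1, 'Death in Venice', NULL, 1971)")

# Naming the columns pins each value to a field regardless of table layout,
# unlike SELECT * with row[0]/row[3].
rows = conn.execute("SELECT id, name, year FROM filmy_app_movies").fetchall()
for movie_id, name, year in rows:
    print(movie_id, name, year)  # 1 Death in Venice 1971
conn.close()
```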

Is this the solution to your problem, or an edit to the original question?