
Python: MySQL database error when using Scrapy


I am trying to save the scraped data into a MySQL database. My script.py is:

# -*- coding: utf-8 -*-
import scrapy
import unidecode
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html


class ElementSpider(scrapy.Spider):
    name = 'books'
    download_delay = 3
    allowed_domains = ["goodreads.com"]
    start_urls = ["https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",]

    rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('//a[@class="next_page"]',)), callback="parse", follow= True),)

    def parse(self, response):
        for href in response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'):       
            full_url = response.urljoin(href.extract())
            print full_url
            yield scrapy.Request(full_url, callback = self.parse_books)
            break;


        next_page = response.xpath('.//a[@class="next_page"]/@href').extract()
        if next_page:
            next_href = next_page[0]
            next_page_url = 'https://www.goodreads.com' + next_href
            print next_page_url
            request = scrapy.Request(next_page_url, self.parse)
            yield request

    def parse_books(self, response):
        yield{
            'url': response.url,
            'title':response.xpath('//div[@id="metacol"]/h1[@class="bookTitle"]/text()').extract(),
            'link':response.xpath('//div[@id="metacol"]/h1[@class="bookTitle"]/a/@href').extract(),
        } 
And my pipeline.py is:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html


import MySQLdb
import hashlib
from scrapy.exceptions import DropItem

from scrapy.http import Request
import sys

class SQLStore(object):
    def __init__(self):
        self.conn = MySQLdb.connect("localhost","root","","books" )
        self.cursor = self.conn.cursor()
        print "connected to DB"

    def process_item(self, item, spider):
        print "hi"

        try:
            self.cursor.execute("""INSERT INTO books_data(next_page_url) VALUES (%s)""", (item['url']))
            self.conn.commit()

        except Exception, e:
            print e

When I run the script there are no errors and the spider runs fine, but I think process_item is never reached. It doesn't even print "hi".

Your method signature is wrong; it should also contain the item and spider arguments:

process_item(self, item, spider)
You also need to register the pipeline in your settings.py file:
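A minimal sketch of that setting, assuming the project is called test1 and the pipeline class lives in test1/pipelines.py (the dotted path is taken from the comment below; adjust it to your own project layout):

# settings.py
ITEM_PIPELINES = {
    'test1.pipelines.SQLStore': 300,  # lower numbers run earlier in the pipeline chain
}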

Your syntax is also incorrect; you need to pass a tuple:

  self.cursor.execute("""INSERT INTO books_data(next_page_url) VALUES (%s)""", 
    (item['url'],) # <- add ,
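Putting both fixes together, a rough sketch of what process_item could look like. It keeps the MySQLdb connection opened in __init__ and the books_data table from the question; the rollback and logging calls are additions for illustration, not part of the original code:

    def process_item(self, item, spider):
        try:
            # execute() expects the query parameters as a tuple/list,
            # hence the trailing comma after item['url']
            self.cursor.execute(
                """INSERT INTO books_data(next_page_url) VALUES (%s)""",
                (item['url'],))
            self.conn.commit()
        except MySQLdb.Error as e:
            self.conn.rollback()
            spider.logger.error("MySQL insert failed: %s", e)
        # returning the item lets any later pipelines process it too
        return item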
Already tried that, but it did not work. I added the pipeline in settings.py like this:

ITEM_PIPELINES = {'test1.pipelines.SQLStore': 300,}

What do you have in the __init__.py file in the pipelines directory? And process_item(self, item, spider)? How else would Scrapy find your SQLStore pipeline? I mean, is your file actually called pipeline or pipelines? Your question says pipeline, the snippet above says pipelines. Also, are you getting any items at all?

Please look at my pipeline.py: when I print something in def __init__(self): it prints, but when I print inside def process_item(self): nothing is printed. That means def process_item(self): is not getting called.
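As the comments point out, the dotted path in ITEM_PIPELINES has to match the actual module and class names on disk. A hypothetical layout that would make 'test1.pipelines.SQLStore' resolve (the file names below are assumptions, not taken from the question):

test1/
    __init__.py
    settings.py        # ITEM_PIPELINES = {'test1.pipelines.SQLStore': 300,}
    pipelines.py       # must be named pipelines.py and define class SQLStore(object)
    spiders/
        __init__.py
        script.py      # the ElementSpider shown above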
  self.cursor.execute("""INSERT INTO books_data(next_page_url) VALUES (%s)""", 
    (item['url'],) # <- add ,