Python 将多个项目从Scrapy存储到Mysql时出现问题_Python_Mysql_Scrapy

Python 将多个项目从Scrapy存储到Mysql时出现问题

python mysql scrapy

Python 将多个项目从Scrapy存储到Mysql时出现问题,python,mysql,scrapy,Python,Mysql,Scrapy,所以，我有一个让我发疯的问题，我试图通过管道将刮下的项目存储到MySQL中，但我无法做到如果我只存储一个项目，我可以这样做，但当我添加第二个项目时，我会出现一个奇怪的错误 Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 1)' at line 2

所以，我有一个让我发疯的问题，我试图通过管道将刮下的项目存储到MySQL中，但我无法做到

如果我只存储一个项目，我可以这样做，但当我添加第二个项目时，我会出现一个奇怪的错误

Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '), 1)' at line 2

因此，我得到了上面的错误，我在pipelines.py中的代码是：

class DropToDb(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Test")
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            self.cursor.execute("""
                          INSERT INTO Main (url, domain_id)
                          VALUES (%s, %s)
                    """, (item['url'], item['domain_id']))

            self.conn.commit()


        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])

        return item

如果我删除一个表和一个项目，它就会工作得很好，如下所示

class DropToDb(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Test")
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            self.cursor.execute("""
                          INSERT INTO Main (url)
                          VALUES (%s)
                    """, (item['url']))

            self.conn.commit()


        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])

        return item

我的碎片文件看起来像：

if datematch:
    item['link_title'] = ogtitle
    item['link_description'] = response.xpath('//meta[@property="og:description"]/@content').extract()
    item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract(),
    yield item

上面还有更多的项目，但我只是想举个例子

有人能帮我摆脱这一切吗

我的蜘蛛文件：

import scrapy
import MySQLdb
from MySQLdb.cursors import SSCursor
from scrapy.http import Request
import re
from Maintoo.items import MaintooSpider2Item
from scrapy.exceptions import DropItem
import datetime
class Maintoospider2Spider(scrapy.Spider):
    name = "MaintooSpider2"

    #start_urls = readdomainsfromdb()

    def start_requests(self):
        for domain_id, url, id_sitemap_links in readdomainsfromdb():
            yield Request(
                url,
                callback=self.parse,
                meta={
                    'domain_id': domain_id,
                    'id_sitemap_links': id_sitemap_links
                },
                errback=self.error
            )

    def error(self):
        pass

    def parse(self, response):
        domainid = response.meta['domain_id']
        id_sitemap_links = response.meta['id_sitemap_links']
        #updateid(id_sitemap_links)
        ogtitle = response.xpath('//meta[@property="og:title"]/@content').extract()
        isporn = response.xpath('//meta[@content="RTA-5042-1996-1400-1577-RTA"]').extract()
        datematch = re.findall(r'(content="2015|2016")', response.body, re.IGNORECASE | re.DOTALL)
        item = MaintooSpider2Item()
        if '/tag/' in response.url:
            raise DropItem
        if isporn:
            updateporn(domainid)
            raise DropItem

        if datematch:
            item['link_title'] = ogtitle
            item['link_description'] = response.xpath('//meta[@property="og:description"]/@content').extract()
            item['link_locale'] = response.xpath('//meta[@property="og:locale"]/@content').extract()
            item['link_type'] = response.xpath('//meta[@property="og:type"]/@content').extract()
            item['link_url'] = response.xpath('//meta[@property="og:url"]/@content').extract()
            item['link_site_name'] = response.xpath('//meta[@property="og:site_name"]/@content').extract()
            item['link_article_tag'] = response.xpath('//meta[@property="article:tag"]/@content').extract()
            item['link_article_section'] = response.xpath('//meta[@property="article:section"]/@content').extract()
            item['link_article_published_time'] = response.xpath('//meta[@property="article:published_time"]/@content').extract()
            item['link_meta_keywords'] = response.xpath('//meta[@name="keywords"]/@content').extract()
            item['link_publisher'] = response.xpath('//meta[@property="article:publisher"]/@content').extract()
            item['link_article_author'] = response.xpath('//meta[@property="article:author"]/@content').extract()
            item['link_twitter_card'] = response.xpath('//meta[@name="twitter:card"]/@content').extract()
            item['link_twitter_description'] = response.xpath('//meta[@name="twitter:description"]/@content').extract()
            item['link_twitter_title'] = response.xpath('//meta[@name="twitter:title"]/@content').extract()
            item['link_twitter_image'] = response.xpath('//meta[@name="twitter:image"]/@content').extract()
            item['link_facebook_app_id'] = response.xpath('//meta[@property="fb:app_id"]/@content').extract()
            item['link_facebook_page_admins'] = response.xpath('//meta[@property="fb:admins"]/@content').extract()
            item['link_rss'] = response.xpath('//meta[@rel="alternate"]/@href').extract()
            item['link_twitter_image_source'] = response.xpath('//meta[@name="twitter:image:src"]/@content').extract()
            item['link_twitter_site'] = response.xpath('//meta[@name="twitter:site"]/@content').extract()
            item['link_twitter_url'] = response.xpath('//meta[@name="twitter:url"]/@content').extract()
            item['link_twitter_creator'] = response.xpath('//meta[@name="twitter:creator"]/@content').extract()
            item['link_apple_app'] = response.xpath('//meta[@name="apple-itunes-app"]/@content').extract()
            item['link_facebook_video'] = response.xpath('//meta[@property="og:video"]/@content').extract()
            item['link_facebook_page_id'] = response.xpath('//meta[@name="fb:page_id"]/@content').extract()
            item['link_id'] = response.xpath('//link[@rel="publisher"]/@href').extract()
            item['link_image'] = response.xpath('//meta[@property="og:image"]/@content').extract()
            item['url'] = response.url
            item['domain_id'] = domainid
            item['crawled_date'] = datetime.datetime.now().isoformat()
            yield item

我的新管道文件：

class dropifdescription(object):

    def process_item(self, item, spider):

        # to test if only "job_id" is empty,
        # change to:
        # if not(item["job_id"]):
        if not(item["link_title"]):
            raise DropItem()
        else:
            return item

class DropToDb(object):
    def __init__(self):
        self.conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="Maintoo",  charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            self.cursor.execute("""
                              INSERT INTO Main (url, domain_id, link_title) VALUES (%s, %s, %s)""", (item['url'], item['domain_id'], item['link_title']))

            self.conn.commit()


        except MySQLdb.Error, e:
            print "Error %d: %s" % (e.args[0], e.args[1])

        return item

我的设置文件：

ITEM_PIPELINES = {
    'Maintoo.pipelines.dropifdescription': 200,
    'Maintoo.pipelines.DropToDb': 300,
}

问题出在你的蜘蛛身上

item['link\u locale']=response.xpath（'//meta[@property=“og:locale”]/@content'）.extract（），

请参见结尾处的

，

-这将使您的

项['link\u locale']

成为一个元组，最终会中断您的SQL查询。删除逗号

除此之外，您应该使用而不是使用常规的

extract（）

来提取单个值，而不是列表。

您的回答解决了我两项的问题，一旦我添加了另一项，问题仍然是一样的。def process_item（self，item，spider）：try:self.cursor.execute（““插入到主（url，域id，链接标题）值（%s，%s）”，（item['url']，item['domain_id']，item['link_title']）self.conn.commit（）我的代码怎么了。。。问题出在哪里？@BesnikHajredini你能发布你的完整蜘蛛吗（编辑问题并粘贴到那里）？谢谢。到目前为止，我添加了完整的spider文件和设置文件。。。我希望你能帮我解决这个问题，因为我被卡住了：）@BesnikHajredini好的，添加了一个关于

extract\u first（）

extract（）

-检查一下。谢谢