Python 3.x: How do I save scraped data in a database?

Tags: python-3.x, scrapy, mysql-workbench

I am trying to save scraped data in a DB, but I'm stuck.

First, I save the scraped data to a CSV file, use the glob library to find the newest CSV, and then upload that CSV's data to the database.

I am not sure what I am doing wrong here; please see the code and error below. I have already created the table yahoo_data in the DB, with the same column names as the CSV that my code outputs.

import scrapy
from scrapy.http import Request
import MySQLdb
import os
import csv
import glob

class YahooScrapperSpider(scrapy.Spider):
    name = 'yahoo_scrapper'
    allowed_domains = ['in.news.yahoo.com']
    start_urls = ['http://in.news.yahoo.com/']

    def parse(self, response):
        news_url = response.xpath('//*[@class="Mb(5px)"]/a/@href').extract()
        for url in news_url:
            absolute_url = response.urljoin(url)
            yield Request(absolute_url, callback=self.parse_text)

    def parse_text(self, response):
        Title = response.xpath('//meta[contains(@name,"twitter:title")]/@content').extract_first()
        # response.xpath('//*[@name="twitter:title"]/@content').extract_first() also works
        Article = response.xpath('//*[@class="canvas-atom canvas-text Mb(1.0em) Mb(0)--sm Mt(0.8em)--sm"]/text()').extract()
        yield {'Title': Title,
               'Article': Article}

    def close(self, reason):
        csv_file = max(glob.iglob('*.csv'), key=os.path.getctime)
        mydb = MySQLdb.connect(host='localhost',
                               user='root',
                               passwd='prasun',
                               db='books')
        cursor = mydb.cursor()
        csv_data = csv.reader(csv_file)

        row_count = 0
        for row in csv_data:
            if row_count != 0:
                cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
            row_count += 1

        mydb.commit()
        cursor.close()
Getting this error:

ana. It should be directed not to disrespect the Sikh community and hurt its sentiments by passing such arbitrary and uncalled for orders," said Badal.', 'The SAD president also "brought it to the notice of the Haryana chief minister that Article 25 of the constitution safeguarded the rights of all citizens to profess and practices the tenets of their faith."', '"Keeping these facts in view I request you to direct the Haryana Public Service Commission to rescind its notification and allow Sikhs as well as candidates belonging to other religions to sport symbols of their faith during all examinations," said Badal. (ANI)']}
2019-04-01 16:49:41 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-01 16:49:41 [scrapy.extensions.feedexport] INFO: Stored csv feed (25 items) in: items.csv
2019-04-01 16:49:41 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method YahooScrapperSpider.close of <YahooScrapperSpider 'yahoo_scrapper' at 0x2c60f07bac8>>
Traceback (most recent call last):
  File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\MySQLdb\cursors.py", line 201, in execute
    query = query % args
TypeError: not enough arguments for format string

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\twisted\internet\defer.py", line 151, in maybeDeferred
    result = f(*args, **kw)
  File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "C:\Users\prasun.j\Desktop\scrapping\scrapping\spiders\yahoo_scrapper.py", line 44, in close
    cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
  File "C:\Users\prasun.j\AppData\Local\Continuum\anaconda3\lib\site-packages\MySQLdb\cursors.py", line 203, in execute
    raise ProgrammingError(str(m))
MySQLdb._exceptions.ProgrammingError: not enough arguments for format string
2019-04-01 16:49:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 7985,
 'downloader/request_count': 27,
 'downloader/request_method_count/GET': 27,
 'downloader/response_bytes': 2148049,
 'downloader/response_count': 27,
 'downloader/response_status_count/200': 26,
 'downloader/response_status_count/301': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 4, 1, 11, 19, 41, 350717),
 'item_scraped_count': 25,
 'log_count/DEBUG': 53,
 'log_count/ERROR': 1,
 'log_count/INFO': 8,
 'request_depth_max': 1,
 'response_received_count': 26,
 'scheduler/dequeued': 27,
 'scheduler/dequeued/memory': 27,
 'scheduler/enqueued': 27,
 'scheduler/enqueued/memory': 27,
 'start_time': datetime.datetime(2019, 4, 1, 11, 19, 36, 743594)}
2019-04-01 16:49:41 [scrapy.core.engine] INFO: Spider closed (finished)
This error:

MySQLdb._exceptions.ProgrammingError: not enough arguments for format string
seems to occur because the row you are passing does not contain the required number of arguments.

You can try printing the row to see what is going wrong.
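For what it's worth, printing those rows would likely expose the real culprit: the question's `close` method hands `csv.reader` the file *name* rather than an open file object. A small stand-alone sketch of what the reader then produces:

```python
import csv

# csv.reader expects an iterable of lines (such as an open file object).
# Given the filename string itself, it iterates over the *characters*
# of the name, so every "row" is one character long.
rows = list(csv.reader("items.csv"))
print(rows[0])       # ['i'] -- just the first character of the filename
print(len(rows[0]))  # 1, but the INSERT expects 2 values

# Passing the open file instead yields real rows:
# with open(csv_file, newline='') as f:
#     csv_data = csv.reader(f)
```

With one-element rows, the two `%s` placeholders in the INSERT statement can never be filled, which matches the error message.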

In any case, if you want to save the scraped data to a DB, I would suggest writing a simple item pipeline that exports the data to the DB directly, without going through a CSV.
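If you go the pipeline route, a minimal sketch might look like this (table name, columns and credentials are copied from the question; the class name and the `''.join()` over `Article` are my additions):

```python
# pipelines.py -- minimal sketch of a DB-exporting item pipeline.

class MySQLPipeline:
    def open_spider(self, spider):
        import MySQLdb  # imported here so the sketch reads without the driver installed
        self.db = MySQLdb.connect(host='localhost', user='root',
                                  passwd='prasun', db='books')
        self.cursor = self.db.cursor()

    def process_item(self, item, spider):
        # Article is scraped as a list of text nodes, so join it first.
        self.cursor.execute(
            'INSERT IGNORE INTO yahoo_data (Title, Article) VALUES (%s, %s)',
            (item['Title'], ''.join(item['Article'])))
        self.db.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.db.close()
```

You would then enable it in settings.py, e.g. `ITEM_PIPELINES = {'scrapping.pipelines.MySQLPipeline': 300}` (project name taken from the traceback paths).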

For more information on item pipelines, see the Scrapy documentation on Item Pipelines.


You can find a useful example there.

It seems you are passing a list where the parameters need to be supplied individually, separated by commas.

Try adding an asterisk to the 'row' variable; change:

cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', row)
to:

cursor.execute('INSERT IGNORE INTO yahoo_data (Title,Article) VALUES(%s, %s)', *row)
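The traceback shows MySQLdb building the final statement with roughly `query = query % args` (it also escapes the values first), so the count of supplied values must match the number of `%s` placeholders. The error can be reproduced with plain string formatting, no database needed:

```python
query = 'INSERT IGNORE INTO yahoo_data (Title, Article) VALUES (%s, %s)'

short_row = ['i']                 # a one-element row, as the buggy reader yields
try:
    query % tuple(short_row)      # mimics MySQLdb's substitution step
except TypeError as err:
    print(err)                    # not enough arguments for format string

good_row = ['Some title', 'Some article']
print(query % tuple(good_row))    # both %s placeholders are filled
```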