Python: scraped items not saved to the database


python web-scraping scrapy


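My Scrapy spider is not saving data to the database. It scrapes the data, but none of it is added to the database. Please check the code and tell me what is wrong.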

My spider.py file:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from project2spider.items import Project2Item
from scrapy.http import Request

class ProjectSpider(BaseSpider):
    name = "project2spider"
    allowed_domains = ["http://directory.thesun.co.uk"]
    current_page_no = 1 
    start_urls = [ 
        "http://directory.thesun.co.uk/find/uk/computer-repair"
    ]   

    def get_next_url(self, fired_url):
        if '/page/' in fired_url:
            url, page_no = fired_url.rsplit('/page/', 1)
        else:
            if self.current_page_no != 1:
                #end of scroll
                return 
        self.current_page_no += 1
        return "http://directory.thesun.co.uk/find/uk/computer-repair/page/%s" % self.current_page_no

    def parse(self, response):
        fired_url = response.url
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="abTbl "]')
        for site in sites:
            item = Project2Item()
            item['Catogory'] = site.select('span[@class="icListBusType"]/text()').extract()
            item['Bussiness_name'] = site.select('a/@title').extract()
            item['Description'] = site.select('span[last()]/text()').extract()
            item['Number'] = site.select('span[@class="searchInfoLabel"]/span/@id').extract()
            item['Web_url'] = site.select('span[@class="searchInfoLabel"]/a/@href').extract()
            item['adress_name'] = site.select('span[@class="searchInfoLabel"]/span/text()').extract()
            item['Photo_name'] = site.select('img/@alt').extract()
            item['Photo_path'] = site.select('img/@src').extract()
            yield item
        next_url = self.get_next_url(fired_url)
        if next_url:
            yield Request(next_url, self.parse, dont_filter=True)

And my pipelines.py file:

from scrapy import log
from twisted.enterprise import adbapi
import MySQLdb.cursors

# the required Pipeline settings.
class MySQLStorePipeline(object):

    def __init__(self):
        #  db settings
        self.dbpool = adbapi.ConnectionPool('MySQLdb',
                db='project2',
                user='root',
                passwd='',
                host='127.0.0.1',
                port='3306',                            
                cursorclass=MySQLdb.cursors.DictCursor,
                charset='utf8',
                use_unicode=True
            )

def process_item(self, item, spider):
    # run db query in thread pool
    query = self.dbpool.runInteraction(self._conditional_insert, item)
    query.addErrback(self.handle_error)
    return item


def _conditional_insert(self, tx, item):
    #runs the condition 
    insert_id = tx.execute(\
        "insert into crawlerapp_directory (Catogory, Bussiness_name, Description, Number, Web_url) "
        "values (%s, %s, %s, %s, %s)",
        (item['Catogory'][0],
         item['Bussiness_name'][0],
         item['Description'][0],
         item['Number'][0],
         item['Web_url'][0],
         )
        )
    #connection to the foreign key Adress.
    tx.execute(\
        "insert into crawlerapp_adress (directory_id, adress_name) "
        "values (%s, %s)",
        (insert_id,
         item['adress_name'][0]
         )
        )
    #connection to the foreign key Photos.
    tx.execute(\
        "insert into crawlerapp_photos (directory_id, Photo_path, Photo_name) "
        "values (%s, %s, %s)",
        (insert_id,
         item['Photo_path'][0],
         item['Photo_name'][0]
         )
        )
    log.msg("Item stored in db: %s" % item, level=log.DEBUG)
def handle_error(self, e):
    log.err(e)

I am unable to save the data to the database.

Please help.
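One detail in _conditional_insert may be worth checking once the inserts actually run. The code stores the return value of tx.execute(...) as insert_id, but the Python DB-API does not define that return value as the new row's id (MySQLdb returns the number of affected rows); the cursor's lastrowid attribute is the usual way to read it. In addition, each item['...'][0] raises IndexError whenever the XPath matched nothing. The sketch below illustrates both points under loud assumptions: it uses the standard-library sqlite3 module in place of MySQLdb and adbapi so that it runs on its own, the class name SketchStorePipeline is hypothetical, and the two-table schema is a reduced guess based on the insert statements in the question.

```python
import sqlite3


class SketchStorePipeline(object):
    """Hypothetical stand-in for MySQLStorePipeline, backed by sqlite3."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Reduced schema, guessed from the insert statements in the question.
        self.conn.execute(
            "create table crawlerapp_directory "
            "(id integer primary key, Catogory text, Bussiness_name text)"
        )
        self.conn.execute(
            "create table crawlerapp_adress "
            "(id integer primary key, directory_id integer, adress_name text)"
        )

    # Note: every method is indented inside the class body.
    def process_item(self, item, spider):
        try:
            self._insert(item)
            self.conn.commit()
        except (sqlite3.Error, IndexError, KeyError) as exc:
            # An empty extract() result makes item[...][0] raise IndexError.
            self.conn.rollback()
            self.handle_error(exc)
        return item

    def _insert(self, item):
        cur = self.conn.cursor()
        cur.execute(
            "insert into crawlerapp_directory (Catogory, Bussiness_name) "
            "values (?, ?)",
            (item["Catogory"][0], item["Bussiness_name"][0]),
        )
        directory_id = cur.lastrowid  # id of the row just inserted
        cur.execute(
            "insert into crawlerapp_adress (directory_id, adress_name) "
            "values (?, ?)",
            (directory_id, item["adress_name"][0]),
        )

    def handle_error(self, exc):
        print("Insert failed: %r" % (exc,))
```

With adbapi the same idea would read tx.lastrowid inside the interaction function, since the transaction object passes attribute access through to the underlying cursor.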

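You keep posting the same question and expecting people to read through all of your code. Find out where the database write fails, form a hypothesis about the cause, test that hypothesis, and then post one specific question that shows the work you have done and the source code for the problem.

But sir, I made changes in my spider, and it still does not work. I really need help, because I have to present this project tomorrow and I only have 7 hours left. I only need help saving the data to my database, nothing else. All the other parts are ready, sir.

I am trying to explain how this works. If you narrow down the problem before asking for help, you can get an answer within minutes. If you keep posting large amounts of code and just ask people to work out why it fails, people will usually ignore you. I see you have a line that says

log.msg("Item stored in db: %s" % item, level=log.DEBUG)

Does it ever get called?

I kept that line so I would know when a scraped item is inserted into the database. But cmd neither prints that line nor stores any data in the database... :(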
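If that log line never appears, the likely reading is that process_item is never called at all. Two things would explain it, and both can be checked quickly. First, in the pipelines.py listing as posted, process_item, _conditional_insert and handle_error start at column zero, which would make them module-level functions rather than methods of MySQLStorePipeline. Second, the pipeline has to be enabled through ITEM_PIPELINES in settings.py. The sketch below shows both checks. The module path project2spider.pipelines is an assumption taken from the package name in the spider's imports, and BrokenPipeline, FixedPipeline and is_usable_pipeline are hypothetical names used only for illustration.

```python
# settings.py fragment. The dotted path is an assumption based on the
# "project2spider" package name; older Scrapy releases accepted a plain
# list of paths instead of a dict with priorities.
ITEM_PIPELINES = {
    "project2spider.pipelines.MySQLStorePipeline": 300,
}


class BrokenPipeline(object):
    def __init__(self):
        self.items = []


# Dedented: this is a module-level function, NOT a method of BrokenPipeline.
def process_item(self, item, spider):
    return item


class FixedPipeline(object):
    def __init__(self):
        self.items = []

    # Indented inside the class body, so instances expose process_item.
    def process_item(self, item, spider):
        self.items.append(item)
        return item


def is_usable_pipeline(cls):
    """Return True if the class itself defines a callable process_item."""
    return callable(getattr(cls, "process_item", None))


if __name__ == "__main__":
    print(is_usable_pipeline(BrokenPipeline))
    print(is_usable_pipeline(FixedPipeline))
```

Once process_item does run, connection errors would reach handle_error; one candidate there is port='3306', which is passed as a string although MySQLdb expects an integer.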