Python Scrapy: fetching data from MongoDB in a spider


I have created a spider that scrapes a site's products from its listing pages. From within the spider, is there any way to connect to MongoDB, fetch the list of stored URLs, and then scrape those URLs?

Thanks.

You can load the URLs from MongoDB inside the spider itself:
from pymongo import MongoClient
import scrapy


class Myspider(scrapy.Spider):
    name = "myspider"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # MongoClient() connects to localhost:27017 by default;
        # pass a host/URI and port as arguments to point elsewhere.
        self.client = MongoClient()
        # Adjust the query and field name to match how the URL documents
        # are stored in your collection; here each document is assumed
        # to have a "url" field.
        self.urls = [doc["url"] for doc in self.client.db_name.collection.find()]

    def start_requests(self):
        # Issue one request per URL fetched from MongoDB.
        for url in self.urls:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Extract the product data you need from each page here.
        ...

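What the find() query should look like depends entirely on how the URLs were stored. As a minimal sketch of the data this spider expects (the db_name database, the collection name, and the "url" field are all assumptions, not fixed by the question), the seed documents could be inserted like this:

from pymongo import MongoClient

client = MongoClient()  # localhost:27017 by default

# Each document holds one URL to scrape; the "url" field name is an assumption.
client["db_name"]["collection"].insert_many([
    {"url": "https://example.com/products?page=1"},
    {"url": "https://example.com/products?page=2"},
])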

You can fetch the URLs from the database and use them in the spider.
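If you would rather not hard-code the connection details, Scrapy's from_crawler hook lets the spider read them from the project settings. A sketch of that approach, where MONGO_URI and MONGO_DATABASE are setting names assumed to be added to settings.py:

import scrapy
from pymongo import MongoClient


class MongoUrlsSpider(scrapy.Spider):
    name = "mongo_urls"

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # MONGO_URI / MONGO_DATABASE are assumed custom settings in settings.py.
        spider.mongo_uri = crawler.settings.get("MONGO_URI", "mongodb://localhost:27017")
        spider.mongo_db = crawler.settings.get("MONGO_DATABASE", "db_name")
        return spider

    def start_requests(self):
        client = MongoClient(self.mongo_uri)
        # Again assuming each stored document carries a "url" field.
        for doc in client[self.mongo_db]["collection"].find():
            yield scrapy.Request(doc["url"], callback=self.parse)

    def parse(self, response):
        # Extract the product data you need from each page here.
        ...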