Python Scrapy: fetching data from MongoDB inside a spider
I have created a spider that scrapes a site's products from its listing pages. Is there any way to connect to MongoDB from within my spider, fetch the list of stored URLs, and scrape those URLs? Thanks.

You can load the URLs from MongoDB inside the spider itself:
from pymongo import MongoClient
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # MongoClient() connects to localhost:27017 by default;
        # pass a host and port to override
        self.db = MongoClient()
        # adjust the query to match how the data is stored in the collection
        self.urls = self.db.db_name.collection.find()

    def parse(self, response):
        # other processing here
        for doc in self.urls:  # documents fetched from the database
            # do operations with the URLs, e.g. yield new requests
            pass
You can load the URLs from the database and use them in your spider.