Is there a way to get the ID of a start URL from the database in Scrapy, using a function such as make_requests_from_url?


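I am pulling the start URLs from a database and also need the ID associated with each URL, so that I can pass it to the item pipeline and store it in the table along with the item.

I am using "make_requests_from_url(row[1])" to supply the start URLs instead of filling "start_urls = []", the list that normally holds them. The ID in row[0] is what I need to pass to the items when the respective pages are crawled.

Below is my spider code: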

import scrapy
import mysql.connector
from ..items import AmzProductinfoItem


class AmzProductinfoSpiderSpider(scrapy.Spider):
    name = 'amz_ProductInfo_Spider'
    nextPageNumber = 2
    allowed_domains = ['amazon.in']
    start_urls = []
    url_fid = []

    def __init__(self):
        self.connection = mysql.connector.connect(host='localhost', database='datacollecter', user='root', password='', charset="utf8", use_unicode=True)
        self.cursor = self.connection.cursor()

    def start_requests(self):

        sql_get_StartUrl = 'SELECT * FROM database.table'
        self.cursor.execute(sql_get_StartUrl)
        rows = self.cursor.fetchall()
        for row in rows:
            yield self.make_requests_from_url(row[1])

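I tried comparing "response.url" in the parse method, but that value changes as the spider moves on to the next page.

I am not sure how I can achieve this. Any guidance is appreciated.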

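It is not clear why you need self.make_requests_from_url. You can yield your requests directly and attach the ID to each one through meta; the same dict is available again as response.meta in the callback: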
def start_requests(self):

    sql_get_StartUrl = 'SELECT * FROM database.table'
    self.cursor.execute(sql_get_StartUrl)
    rows = self.cursor.fetchall()
    for row in rows:
        yield scrapy.Request(url=row[1], meta={'url_id': row[0]}, callback=self.parse)

def parse(self, response):
    url_id = response.meta["url_id"]

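Thanks for pointing that out. I did not know I could yield requests for the parse method directly, and I had understood "self.make_requests_from_url" to be the way to populate start_urls. I will try your suggestion, which makes complete sense for what I have to implement, and update my comment. It seems I also need to read the Python and Scrapy documentation more carefully.

This works great, as suggested. I accept this answer. Thanks.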