Python: how to loop through a drop-down list in Scrapy


I am scraping the website below, and I need to loop through the quantity drop-down list until it reaches the end in order to determine the remaining stock. I put a counter in there to see how many times it runs through the loop, which would tell me the amount of stock remaining, but it only runs through the loop once.

    # Function to parse needed data
    def parse(self, response):

        # For loop to run through html code until all needed data is scraped
        for data in response.css('div.card > div.row'):
            # import items from items.py
            item = DataItem()
            # Scrape Category name
            item["Category"] = data.css("div.col-12.prod-cat a::text").get()
            # Scrape card name
            item["Card_Name"]  = data.css("a.card-text::text").get()
            item["Stock"] = data.css("div.font-weight-bold.font-smaller.text-muted::text").get()
            if item["Stock"] is None:
                item["Stock"] = "In Stock"
            # For loop to run through all the buying information needed, skips first row
            for buying_option in data.css('div.buying-options-table div.row')[1:]:
                # Scrape seller, condition, and price
                item["Seller"] = buying_option.css('div.row.align-center.py-2.m-auto > div.col-3.text-center.p-1 > img::attr(title)').get()
                if item["Seller"] == "PRE ORDER":
                    item["Seller"] = "TrollAndToad Com"
                item["Condition"] = buying_option.css("div.col-3.text-center.p-1::text").get()
                num = 0
                for select in buying_option.css('select.w-100'): # Right here is where I am trying to determine the stock by looping through the drop-down list
                    num = num + 1
                item["Price"] = buying_option.css("div.col-2.text-center.p-1::text").get()
                # Return data
                yield item
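As an aside on why the counter only increments once: `buying_option.css('select.w-100')` matches the `<select>` element itself, so the loop runs once per drop-down rather than once per `<option>`. A minimal standard-library sketch of that distinction, using made-up markup (not the live page):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mimicking the quantity drop-down on the page
html = """
<div>
  <select class="w-100">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="3">3</option>
  </select>
</div>
"""

root = ET.fromstring(html)
selects = root.findall('.//select')          # one match: the <select> element itself
options = root.findall('.//select/option')   # one match per <option>

print(len(selects))  # 1
print(len(options))  # 3
```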

I basically count the quantity by selecting all the `<option>` elements, extracting their `value` attributes and taking the maximum integer value. Like this:

quantity_options = p.css('.product-add-container .box-quantity option::attr(value)').getall()
quantity = max(map(int, quantity_options))
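As a self-contained illustration of the max-of-values idea, here is a sketch using only the standard library and a made-up drop-down rather than the live page:

```python
import xml.etree.ElementTree as ET

# Hypothetical quantity drop-down, similar in shape to the one on the page
html = ('<select class="w-100">'
        '<option value="1">1</option>'
        '<option value="2">2</option>'
        '<option value="6">6</option>'
        '</select>')

root = ET.fromstring(html)
# Collect the value attribute of every option, then take the largest as the stock
values = [opt.get('value') for opt in root.findall('option')]
quantity = max(map(int, values))
print(quantity)  # 6
```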
I also refactored your code a bit:

import scrapy
from scrapy.crawler import CrawlerProcess


class TrollandtoadSpider(scrapy.Spider):
    name = 'TrollAndSpider'
    start_urls = [
        'https://www.trollandtoad.com/magic-the-gathering/magic-2020-m20-/14878'
    ]
    logger = None

    def parse(self, response: scrapy.http.Response):
        for p in response.css('.product-col > .card > .row'):
            p: scrapy.Selector
            title = p.css('.prod-title a::text').get()
            category = p.css('.prod-cat a::text').get()
            stock = p.css('div.text-muted::text').get() or 'In Stock'
            quantity_options = p.css('.product-add-container .box-quantity option::attr(value)').getall()
            quantity = max(map(int, quantity_options))
            buying_opts = p.css('.buying-options-table .row:last-child [class*=col-]')
            seller = buying_opts[0].css('img::attr(title)').get()
            if seller == 'PRE ORDER':
                seller = 'TrollAndToad Com'
            condition = buying_opts[1].css('::text').get()
            price = buying_opts[3].css('::text').get()
            product = {
                'title': title,
                'category': category,
                'stock': stock,
                'seller': seller,
                'condition': condition,
                'quantity': quantity,
                'price': price,
            }
            yield product


if __name__ == '__main__':
    p = CrawlerProcess()
    p.crawl(TrollandtoadSpider)
    p.start()

Output:

{
    'title': 'Leyline of the Void 107/280',
    'category': 'Magic 2020 (M20) Singles',
    'stock': 'In Stock',
    'seller': 'TrollAndToad Com',
    'condition': 'Near Mint',
    'quantity': 6,
    'price': '$17.49'
},
{
    'title': "Sephara, Sky's Blade 036/280",
    'category': 'Magic 2020 (M20) Singles',
    'stock': 'In Stock',
    'seller': 'TrollAndToad Com',
    'condition': 'Near Mint',
    'quantity': 3,
    'price': '$3.99'
}


Items.py

import scrapy

class MagiccardsiteItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    Category = scrapy.Field()
    Card_Name = scrapy.Field()
    Stock = scrapy.Field()
    Seller = scrapy.Field()
    Condition = scrapy.Field()
    Price = scrapy.Field()
    Num = scrapy.Field()
Spider code

import scrapy
from MagicCardSite.items import MagiccardsiteItem


class CardinfoSpider(scrapy.Spider):
    name = 'CardInfo'

    url = 'https://www.trollandtoad.com/magic-the-gathering/magic-2020-m20-singles/15088'

    def start_requests(self):
        yield scrapy.Request(url=self.url, callback=self.parse)

    def parse(self, response):
        for row in response.xpath('//div[contains(@class,"product-col")]'):
            num = 0
            item = MagiccardsiteItem()
            item['Category'] = row.xpath('.//div[@class="col-12 prod-cat"]/u/a/text()').get()
            item['Card_Name'] = row.xpath('.//div[@class="col-12 prod-title"]/a/text()').get()
            stock = row.xpath('.//div[@class="box-quantity col-2 p-1"]/select[@class="w-100"]/option[last()]/text()').get()
            item['Stock'] = 'In Stock' if int(stock) > 0 else None
            item['Seller'] = row.xpath('.//div[@class="buying-options-table pb-3"]//img/@src').get().split('logos/')[1].replace('.png', '')
            item['Condition'] = row.xpath('.//div[@class="buying-options-table pb-3"]/div[2]/div[2]/text()').get()
            item['Price'] = row.xpath('.//div[@class="buying-options-table pb-3"]/div[2]/div[4]/text()').get()

            # Count the <option> elements in the quantity drop-down
            for option in row.xpath('.//div[@class="box-quantity col-2 p-1"]/select[@class="w-100"]/option'):
                num += 1
            item['Num'] = num

            yield item
Results


There is a very simple way to do this with XPath:

stock_quantity = row.xpath('//select[@name="qtyToBuy"]/option[last()]/@value').get()

It works, but I had to make it a relative XPath to get it working properly; as an absolute XPath it returned the same result every time. Thank you very much.
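The `option[last()]` idea can be checked offline with the standard library's limited XPath support; this sketch uses assumed markup (a drop-down named `qtyToBuy`, as in the XPath above), not the live page:

```python
import xml.etree.ElementTree as ET

# Hypothetical quantity drop-down named qtyToBuy
html = ('<select name="qtyToBuy">'
        '<option value="1">1</option>'
        '<option value="3">3</option>'
        '<option value="6">6</option>'
        '</select>')

root = ET.fromstring(html)
# ElementTree's XPath subset supports the [last()] position predicate
stock_quantity = root.find('option[last()]').get('value')
print(stock_quantity)  # 6
```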