Python: Can this sprite be scraped?

Tags: python, svg, web-scraping, scrapy, sprite

Can I scrape this with plain Scrapy, or do I need to use Selenium?

The HTML is:

<td class="example"><sprite-svg name="EXAMPLE2"><svg><use 
xlink:href="/spritemap/1_0_30#sprite-EXAMPLE2"></use></svg></sprite-svg></td> 

Do you know how to scrape it?

Yes. You can do it with Scrapy:

 response.xpath("//td[@class='table__cell--tight race-runners__box']/sprite-svg/@name").getall()
Working Scrapy code:

import scrapy

class Test(scrapy.Spider):
    name = 'Test'
    start_urls = [
        'https://www.thedogs.com.au/racing/gawler/2020-07-07/1/the-bunyip-maiden-stake-pr2-division1']

    def parse(self, response):
        # Collect the name attribute of every <sprite-svg> in the box column.
        return {"nameList": response.xpath("//td[@class='table__cell--tight race-runners__box']/sprite-svg/@name").getall()}
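
As a quick check that the sprite name sits in the markup itself, here is a minimal sketch run against the snippet from the question. It swaps Scrapy for Python's standard-library `html.parser` so it runs without extra dependencies, and it assumes the live page serves the same markup in its initial HTML. `SpriteNameParser` and `sprite_names` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class SpriteNameParser(HTMLParser):
    """Collects the name attribute of every <sprite-svg> start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values keep their case.
        if tag == "sprite-svg":
            name = dict(attrs).get("name")
            if name is not None:
                self.names.append(name)


def sprite_names(html):
    """Return the sprite names found in an HTML string, in document order."""
    parser = SpriteNameParser()
    parser.feed(html)
    parser.close()
    return parser.names


# The snippet from the question, joined onto one logical line.
SAMPLE = (
    '<td class="example"><sprite-svg name="EXAMPLE2"><svg>'
    '<use xlink:href="/spritemap/1_0_30#sprite-EXAMPLE2"></use>'
    '</svg></sprite-svg></td>'
)

print(sprite_names(SAMPLE))  # ['EXAMPLE2']
```

Because the value is an ordinary attribute, a plain HTML parser is enough to read it from this snippet; the SVG itself never has to be rendered.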

Looking at the table, each SVG sprite carries a "rug_X" identifier, where X is the number.

Something like this:

import scrapy


class RaceSpider(scrapy.Spider):
    name = 'race'
    allowed_domains = ['thedogs.com.au']
    start_urls = ['https://www.thedogs.com.au/racing/gawler/2020-07-07/1/the-bunyip-maiden-stake-pr2-division1']

    def parse(self, response):
        # Visit each table row so the XPaths below are evaluated relative to one runner.
        for row in response.xpath('//tbody/tr'):
            dog = row.xpath('.//td[@class="table__cell--tight race-runners__name"]/div/a/text()').get()
            number = row.xpath('.//td[@class="table__cell--tight race-runners__box"]/sprite-svg/@name').get()
            if number is None:
                # Skip rows that have no sprite in the box column.
                continue
            # The sprite name looks like "rug_3"; keep only the number.
            cleaned_num = int(number.replace('rug_', ''))
            grade = row.xpath('.//td[@class="race-runners__grade"]/text()').get()

            yield {'grade': grade, 'greyhound': dog, 'rug': cleaned_num}
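
To illustrate why querying row by row gives one value per greyhound while a page-wide query gives a single list, here is a minimal sketch. It uses the standard-library `xml.etree.ElementTree` instead of Scrapy, and the table is hypothetical: the class names are shortened to `box` and `name`, and the dog names are invented. Note one difference: ElementTree paths are always relative to the element they are called on, whereas on a Scrapy selector a path starting with `//` searches the whole document, which is why the spider's row-level XPaths start with a dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified table; the names and classes are made up for illustration.
TABLE = """
<table><tbody>
<tr><td class="box"><sprite-svg name="rug_1"/></td><td class="name">Dog A</td></tr>
<tr><td class="box"><sprite-svg name="rug_2"/></td><td class="name">Dog B</td></tr>
</tbody></table>
""".strip()

root = ET.fromstring(TABLE)

# Page-wide query: one flat list, with no link back to each greyhound.
all_names = [el.get("name") for el in root.findall(".//sprite-svg")]

# Row-by-row query: each lookup is scoped to a single <tr>.
rows = []
for tr in root.findall(".//tbody/tr"):
    sprite = tr.find("./td[@class='box']/sprite-svg")
    name_cell = tr.find("./td[@class='name']")
    rows.append({"greyhound": name_cell.text, "rug": sprite.get("name")})

print(all_names)
print(rows)
```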
         

You could also use an Item Loader with custom functions to clean up the values you extract.
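
A custom cleaning function of that kind might look like the sketch below. `clean_rug` is a hypothetical helper, not part of Scrapy; it assumes sprite names follow the "rug_" plus number pattern seen in the spider. A function like this could be handed to an Item Loader input processor (for example via `MapCompose`), though that wiring is not shown here.

```python
def clean_rug(value):
    """Turn a sprite name such as 'rug_3' into the integer 3.

    Returns None when the value is missing or does not match the
    assumed 'rug_<number>' pattern.
    """
    prefix = "rug_"
    if value is None:
        return None
    value = value.strip()
    if not value.startswith(prefix):
        return None
    digits = value[len(prefix):]
    return int(digits) if digits.isdigit() else None


print(clean_rug("rug_3"))     # 3
print(clean_rug("EXAMPLE2"))  # None
print(clean_rug(None))        # None
```

Returning None instead of raising keeps one malformed row from stopping the whole crawl; whether that is the right trade-off depends on how strictly the data should be validated.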

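Comments:

The XPath has a dot at the start. Could you also share the URL of the page?

On the page you linked, running $x("//td[@class='example']") in the console returns an empty array. Could you check?

How do I tell which greyhound each one belongs to?

Thanks for your reply, I have updated my post with code. When I add your code to mine, it gives me a list, but I only want one value per greyhound.

I have edited the code to give you each greyhound together with the number shown in its image. Is that clearer?

Perfect! Thanks.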