How do I handle inconsistent markup with Scrapy selectors?

Tags: python, xpath, web-scraping, scrapy

I want to scrape the play-by-play information from the following website:

The tricky markup:

    <tr>
        <td>8</td>
        <td>Def Rebound</td>
        <td>13 - 13</td>
        <td>Zalgiris Kaunas</td>
        <td>VECVAGARS, KASPARS</td>
    </tr>
    <tr class="play">
        <td>8</td>
        <td>Two Pointer</td>
        <td>15 - 13</td>
        <td>Zalgiris Kaunas</td>
        <td>VECVAGARS, KASPARS</td>
    </tr>
The results I get are:

{'Event': [u'Def Rebound'],
 'Minute': [u'19'],
 'Player': [u'KIRILENKO, ANDREI'],
 'Res_h': [u'31 - 38'],
 'Res_v': [u'31 - 38'],
 'Team_player': [u'CSKA Moscow']}

{'Event': [],
 'Minute': [],
 'Player': [],
 'Res_h': [],
 'Res_v': [],
 'Team_player': []}
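The empty values appear whenever the code has to handle a "tr" that carries the "play" class.

Question:

How do I handle two possible markup variants when either one can show up for any given row?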

This will get you everything you want:

In [53]: l = ['Event', 'Minute', 'Player', 'Res_h', 'Res_v', 'Team_player']

In [54]: table = r.xpath("//table[@class='table']")

In [55]: for tr in table.xpath(".//tr[position() > 1]"):
   ....:     assert dict(zip(l, tr.xpath("./td//text()").extract())) != {}
   ....:

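It skips the header row and extracts every remaining row, whether or not the "tr" has a class. The order of the names in l is wrong, but the idea is correct, so I will leave it to you to work out what you want and where. Here is a snippet of what tr.xpath("./td//text()").extract() returns: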

[u'15', u'Shot Rejected', u'29 - 25', u'Zalgiris Kaunas', u'HANLAN, OLIVIER']
[u'15', u'Block', u'29 - 25', u'Real Madrid', u'NOCIONI, ANDRES']
[u'15', u'Off Rebound', u'29 - 25', u'Zalgiris Kaunas', u' ']
[u'15', u'Two Pointer', u'31 - 25', u'Zalgiris Kaunas', u'VENE, SIIM-SANDER']
[u'15', u'Assist', u'31 - 25', u'Zalgiris Kaunas', u'RANDLE, JEROME']
[u'16', u'Minute', u'31 - 25', u' ', u' ']
[u'16', u'Three Pointer', u'31 - 28', u'Real Madrid', u'NOCIONI, ANDRES']
[u'16', u'Assist', u'31 - 28', u'Real Madrid', u'LLULL, SERGIO']
[u'16', u'Two Pointer', u'33 - 28', u'Zalgiris Kaunas', u'RANDLE, JEROME']
[u'16', u'Foul', u'33 - 28', u'Zalgiris Kaunas', u'SAJUS, MARTYNAS']
[u'16', u'Foul Drawn', u'33 - 28', u'Real Madrid', u'LLULL, SERGIO']
[u'16', u'Free Throw In', u'33 - 29', u'Real Madrid', u'LLULL, SERGIO']
[u'16', u'Free Throw In', u'33 - 30', u'Real Madrid', u'LLULL, SERGIO']
[u'16', u'In', u'33 - 30', u'Zalgiris Kaunas', u'JANKUNAS, PAULIUS']
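
Going by the output above, the cells arrive in the order minute, event, score, team, player. Here is a minimal sketch of the mapping the answer leaves open, under a few assumptions: the field names are made up for illustration, the table wrapper and header row are invented so the sample is well-formed, and it uses the standard library's xml.etree.ElementTree instead of Scrapy so it runs without a live response. With Scrapy the row selection would stay `.//tr[position() > 1]` as in the answer. The point is the same: select rows by structure, not by class, so plain `tr` and `tr class="play"` are treated alike.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names, ordered like the cells in the sample output:
# minute, event, score, team, player.
FIELDS = ["minute", "event", "score", "team", "player"]

# Rows taken from the question and the answer's output. The <table>
# wrapper and the header row are assumptions added for this sketch.
HTML = """
<table class="table">
    <tr><th>Min</th><th>Event</th><th>Score</th><th>Team</th><th>Player</th></tr>
    <tr>
        <td>8</td>
        <td>Def Rebound</td>
        <td>13 - 13</td>
        <td>Zalgiris Kaunas</td>
        <td>VECVAGARS, KASPARS</td>
    </tr>
    <tr class="play">
        <td>8</td>
        <td>Two Pointer</td>
        <td>15 - 13</td>
        <td>Zalgiris Kaunas</td>
        <td>VECVAGARS, KASPARS</td>
    </tr>
    <tr>
        <td>15</td>
        <td>Off Rebound</td>
        <td>29 - 25</td>
        <td>Zalgiris Kaunas</td>
        <td> </td>
    </tr>
</table>
"""


def parse_rows(markup):
    """Return one dict per data row, ignoring any class attribute on <tr>."""
    table = ET.fromstring(markup.strip())
    rows = []
    # "tr" matches every row element, with or without class="play".
    for tr in table.findall("tr"):
        # Read each cell separately so a blank cell keeps its position
        # instead of shifting the later columns to the left.
        cells = ["".join(td.itertext()).strip() for td in tr.findall("td")]
        if not cells:
            # The header row only has <th> cells, so skip it.
            continue
        rows.append(dict(zip(FIELDS, cells)))
    return rows


for row in parse_rows(HTML):
    print(row)
```

Reading the text cell by cell, rather than collecting every text node of the row at once, is a deliberate choice: a cell with no text at all would otherwise drop out of the list and misalign the fields that follow it.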