Python: how do I skip a row if its href is None?

I'm parsing a page that has 20 href links leading to the next page. But one of them has no href, and that makes my code fail:

    i = 1000
    j = 0
    dataLen = len(response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]'))
    photoNodes = response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]')
    for photoNode in photoNodes:
        contentHref = photoNode.xpath('.//a/@href').extract_first()
        yield Request(contentHref, callback=self.parse_page, priority = i, dont_filter=True)
        i -= 1
        j += 1  
    # start parse next page
    def parse_page(self, response):       
        global countLen, dataLen
        enName = response.xpath('//*[@class="movie_intro_info_r"]/h3/text()').extract_first()
        cnName = response.xpath('//*[@class="movie_intro_info_r"]/h1/text()').extract_first()
        ...

I tried adding

if not (photoNode is None):

or

if not photoNode == ''

but it still doesn't work:

i = 1000
j = 0
dataLen = len(response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]'))
photoNodes = response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]')
for photoNode in photoNodes:
    if not (photoNode is None):
        contentHref = photoNode.xpath('.//a/@href').extract_first()
        # photoHref = photoNode.xpath('.//a/img/@src').extract_first()
        yield Request(contentHref, callback=self.parse_page, priority = i, dont_filter=True)
        i -= 1
        j += 1  
    else:
        pass
twRanking['movie'] = movieArray
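Why neither check helps can be shown with a small standard-library sketch. It uses xml.etree.ElementTree as a stand-in for Scrapy selectors (an assumption: the real page is HTML parsed by Scrapy, and the markup below is a made-up, well-formed sample; inspect_rows is a hypothetical helper). Looping over the matched rows only ever yields element objects, never None; the missing value is the href attribute itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the second row's <a> has no href attribute.
SAMPLE = """
<div class="rank_list table rankstyle1">
  <div class="tr"><a href="/movie/1">One</a></div>
  <div class="tr"><a>Two</a></div>
  <div class="tr"><a href="/movie/3">Three</a></div>
</div>
"""


def inspect_rows(markup):
    """Return (row_is_none, href) for every div.tr row in the markup."""
    root = ET.fromstring(markup.strip())
    results = []
    # Like iterating over a Scrapy SelectorList, this loop only yields
    # real element objects, so "row is None" is always False.
    for row in root.findall(".//div[@class='tr']"):
        anchor = row.find(".//a")
        # Element.get() returns None when the attribute is missing, much like
        # extract_first() returns None when './/a/@href' matches nothing.
        href = anchor.get("href") if anchor is not None else None
        results.append((row is None, href))
    return results


for row_is_none, href in inspect_rows(SAMPLE):
    print(row_is_none, href)
```

Every row reports False for "is None", while the second row's href comes back as None, which is exactly the value that then gets passed to Request.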

I don't know how to skip a row when it may not have an href.

Any help would be greatly appreciated. Thanks in advance.

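It seems you need to check whether contentHref is empty, not photoNode. photoNode is one of the row selectors matched by your XPath, so it always contains something and is never None; it is extract_first() that returns None when the row has no href. Try this: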

for photoNode in photoNodes:
    contentHref = photoNode.xpath('.//a/@href').extract_first()
    if contentHref:
        # photoHref = photoNode.xpath('.//a/img/@src').extract_first()
        yield Request(contentHref, callback=self.parse_page, priority = i, dont_filter=True)
        i -= 1
        j += 1  
    else:
        pass
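The same skipping rule can be pulled out into a small pure function, which makes it easy to test without running a spider. This is a sketch under assumptions: plan_requests and start_priority are hypothetical names, and the input is the list of values extract_first() returned for each row (a string, or None when the row has no href). In the spider, each yielded pair would become Request(url, callback=self.parse_page, priority=priority, dont_filter=True).

```python
from typing import Iterable, Iterator, Optional, Tuple


def plan_requests(
    hrefs: Iterable[Optional[str]], start_priority: int = 1000
) -> Iterator[Tuple[str, int]]:
    """Yield (url, priority) for every usable href, skipping None and ''."""
    priority = start_priority
    for href in hrefs:
        # extract_first() gives None when a row has no href; '' is skipped too.
        if not href:
            continue
        yield href, priority
        # Decrement only after a request is actually queued, as in the answer.
        priority -= 1


print(list(plan_requests(["/movie/1", None, "/movie/3"])))
# [('/movie/1', 1000), ('/movie/3', 999)]
```

Because the priority only drops for rows that produce a request, skipping a row does not leave a gap in the priority sequence, matching the behaviour of i in the loop above.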
