Python 3.x: Filtering yielded items in Scrapy

Tags: python-3.x, xpath, scrapy, xml-parsing, css-selectors

I am trying to scrape some items, as shown below:

def parse(self, response):
    item = GameItem()
    item['game_commentary'] = response.css('tr td:nth-child(2)[style*=vertical-align]::text').extract()
    item['game_movement'] = response.xpath("//tr/td[1][contains(@style,'vertical-align: top')]/text()").extract()

    yield item

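My problem is that I don't want to yield every value extracted by the current response.xpath and response.css selectors.

Is there a way to apply a regex, or some other method, to filter out the values I don't want before assigning the results to item['game_commentary'] and item['game_movement']?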

I would look into item loaders (ItemLoader) to accomplish this. You would have to rewrite your parse method as follows:

def parse(self, response):
    loader = GameItemLoader(item=GameItem(), response=response)
    loader.add_css('game_commentary', 'tr td:nth-child(2)[style*=vertical-align]::text')
    loader.add_xpath('game_movement', "//tr/td[1][contains(@style,'vertical-align: top')]/text()")
    item = loader.load_item()
    yield item

Your items.py would look like this:

from scrapy.item import Item, Field
from scrapy.loader import ItemLoader
# newer Scrapy releases expose these processors as itemloaders.processors
from scrapy.loader.processors import TakeFirst

class GameItemLoader(ItemLoader):
    # default input & output processors
    # will be executed for each field loaded,
    # except if a specific in or output processor is specified
    default_output_processor = TakeFirst()

    # you can specify specific input & output processors per field
    # ('...' is a placeholder for a processor callable)
    game_commentary_in = '...'
    game_commentary_out = '...'

class GameItem(Item):
    game_commentary = Field()
    game_movement = Field()
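
The '...' placeholders above are left open by the answer. As one possible way to fill them in, here is a minimal sketch of a regex-based filter that could serve as a per-field input processor. The names keep_matching and apply_per_value, the sample strings, and the pattern are all hypothetical. apply_per_value is only a plain-Python stand-in for what Scrapy's MapCompose does (call a function on each value and skip values for which it returns None), so the sketch runs without Scrapy installed.

```python
import re


def keep_matching(pattern):
    """Build a per-value filter: return the stripped value, or None to drop it."""
    compiled = re.compile(pattern)

    def _filter(value):
        value = value.strip()
        # Returning None signals "drop this value".
        return value if compiled.search(value) else None

    return _filter


def apply_per_value(func, values):
    """Stand-in for MapCompose-like behaviour: apply func, skip None results."""
    results = []
    for value in values:
        out = func(value)
        if out is not None:
            results.append(out)
    return results


# Hypothetical sample data and pattern, not taken from the page in the question.
raw_commentary = ["  Goal by Smith ", "\n", "Foul by Jones", "Goal by Lee"]
only_goals = apply_per_value(keep_matching(r"^Goal\b"), raw_commentary)
print(only_goals)
```

In the loader itself this would presumably be wired up as game_commentary_in = MapCompose(keep_matching(r"...")), with the real pattern depending on which values should be kept. Note that with TakeFirst as the default output processor only the first surviving value ends up in the item, so a field that should keep the whole filtered list would need a different output processor such as Identity.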


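Can't you filter out the unwanted values with XPath itself? XPath 2.0 supports regular expressions if you need them.
I didn't know that. Thanks.
Interesting solution! As a beginner, I didn't know Scrapy had so many features. Thank you, Wim!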
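
On the XPath suggestion in the first comment: as far as I know, Scrapy's selectors are built on lxml, which implements XPath 1.0, so XPath 2.0 functions such as matches() are not available there. If the filtering is done in Python instead, one thing to watch is that game_movement and game_commentary come from the same table rows, so filtering one list alone would leave the two out of step. Below is a rough sketch that filters them as pairs. The function name filter_rows and the sample values are made up for illustration, and it assumes both extracted lists have the same length and order, which may not hold if some cells contain no text.

```python
import re


def filter_rows(movements, commentaries, pattern):
    """Keep only the (movement, commentary) pairs whose commentary matches pattern."""
    compiled = re.compile(pattern)
    kept = [
        (move.strip(), text.strip())
        for move, text in zip(movements, commentaries)
        if compiled.search(text)
    ]
    # Split the surviving pairs back into two parallel lists.
    kept_moves = [move for move, _ in kept]
    kept_texts = [text for _, text in kept]
    return kept_moves, kept_texts


# Hypothetical values standing in for the two .extract() results.
movements = ["1. e4", "1... e5", "2. Nf3"]
commentaries = ["A classical opening.", " ", "Develops the knight."]
moves, texts = filter_rows(movements, commentaries, r"\w")
print(moves, texts)
```

The two returned lists could then be assigned to item['game_movement'] and item['game_commentary'] before yielding the item.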