Web scraping at the <br> tag

HTML:

<span class="number"> - Sep 15, 1991<br><strong>Some Number: </strong>123, 123, 145</span>

Spider:

# Inside the spider's parse callback
samples = response.css('ul li.somthing')
for sample in samples:
    loader = ItemLoader(item=CatelogItem(), selector=sample)
    loader.add_css('some', 'span.number::text')
    yield loader.load_item()
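
Item.py

# Older Scrapy releases exposed the processors under scrapy.loader.processors
from itemloaders.processors import Join, MapCompose
from scrapy import Field

some = Field(
    input_processor=MapCompose(str.strip),
    output_processor=Join()
)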

Result

- Sep 15, 1991

Expected

- Sep 15, 1991 Some Number: 123, 123, 145

Why does this happen? How can I load the full value in the ItemLoader?

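You need the text of the span together with the text of all its nested elements, not only the text nodes that sit directly inside the span. On its own, ::text matches just an element's direct text nodes, so the text inside <strong> is skipped. Adding a descendant wildcard picks up the nested text as well: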

loader.add_css('some', 'span.number *::text')
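
The difference between the two selectors comes down to which text nodes they match. As a minimal sketch that runs without Scrapy, the example below uses Python's standard-library xml.etree.ElementTree as a stand-in for the selector engine; it is not how Scrapy implements ::text. The <br> from the question is written as <br/> so the XML parser accepts it, and the helper names direct_text_nodes, all_text_nodes and strip_and_join are hypothetical, with strip_and_join only roughly approximating MapCompose(str.strip) followed by Join().

```python
import xml.etree.ElementTree as ET

# The snippet from the question, with <br> written as <br/> so that the
# stdlib XML parser accepts it (an adjustment made only for this sketch).
SNIPPET = (
    '<span class="number"> - Sep 15, 1991<br/>'
    '<strong>Some Number: </strong>123, 123, 145</span>'
)

span = ET.fromstring(SNIPPET)


def direct_text_nodes(elem):
    """Text nodes that are direct children of elem (the ::text case)."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]


def all_text_nodes(elem):
    """Text nodes of elem and of every descendant (the ' *::text' case)."""
    return list(elem.itertext())


def strip_and_join(values):
    """Rough stand-in for MapCompose(str.strip) followed by Join()."""
    return " ".join(v.strip() for v in values)


direct = direct_text_nodes(span)
everything = all_text_nodes(span)

print(direct)                      # the <strong> label is not in this list
print(everything)                  # the <strong> label is included here
print(strip_and_join(everything))  # - Sep 15, 1991 Some Number: 123, 123, 145
```

In a Scrapy shell, response.css('span.number *::text').getall() should return the corresponding list of text nodes for the real page, which is a quick way to check a selector before wiring it into the loader.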

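Do you mean loader.add_css('some', 'span.number::innerHtml')? That fails with: the pseudo-element ::innerHtml is unknown.

Thanks, this works like a charm. Correction: loader.add_css('some', 'span.number *::text'). I just wanted to note that down before upvoting and accepting the answer.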