Python: Scrapy XPath removes text after the < character

Tags: python, xpath, web-scraping, scrapy, parsel

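I am trying to get product information from a page. To get the description (shown at the bottom of the page), I use this XPath:

response.xpath('//*[@itemprop="description"]/table//text()').extract()[3].strip()
This gives me the description:

u'Color: White, Size:Free Size, With the body: Braided, Buckle: Automatic Deduction, With the body width: section ('
whereas the one on the site is: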

Color: White, Size:Free Size, With the body: Braided, Buckle: Automatic Deduction, With the body width: section (<2cm), Belt Length: 93cm
Product Type: Belts, Accessories

This should still be handled without any hack, but you can work around it as follows:

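from parsel import Selector
...

# Re-parse the raw response body as XML instead of using the default HTML parser.
# On recent Scrapy versions, response.text replaces the deprecated body_as_unicode().
s = Selector(text=response.body_as_unicode(), type='xml')
s.xpath('//*[@itemprop="description"]/table//text()').extract()[3].strip()
# gives u'Color: White, Size:Free Size, With the body: Braided, Buckle: Automatic Deduction, With the body width: section (2cm), Belt Length: 93cm'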

The problem here is that parsel (the parser Scrapy uses internally) parses HTML with lxml.etree.HTMLParser(recover=True, encoding='utf8'), which drops this kind of stray character to avoid problems.
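A different workaround, not taken from the answer above, is to escape any unescaped "<" in the raw markup before handing it to the parser, so the HTML parser sees ordinary text instead of a malformed tag. The sketch below uses only the standard library. The helper name escape_stray_lt and the rule "a < is markup only when followed by a letter, /, ! or ?" are assumptions made for illustration, and the sample string is adapted from the description in the question.

```python
import re

# Assumption: a "<" starts real markup only when it is followed by a letter,
# "/", "!" or "?". Any other "<" (such as the one in "<2cm") is treated as text.
_STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_stray_lt(html: str) -> str:
    """Replace every stray "<" with the entity "&lt;" (hypothetical helper)."""
    return _STRAY_LT.sub("&lt;", html)


# Sample markup adapted from the product description in the question.
sample = "<td>With the body width: section (<2cm), Belt Length: 93cm</td>"
print(escape_stray_lt(sample))
```

The escaped string can then be passed to Selector(text=...) as usual. Note that this heuristic also rewrites a bare "<" inside script or style content, which is harmless for text extraction but changes those blocks.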

It looks like the text is cut off because of the < character.
This is a parsel bug; I will check it in the repository.