How to retrieve a JavaScript variable value using XPath
I am trying to extract the price and other attributes from this JS code:
<script type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "Product",
"name": "Rolex Cellini Time 50505",
"image": [
"https://chronexttime.imgix.net/S/1/S1006/S1006_58774a90efd04.jpg?w=1024&auto=format&fm=jpg&q=75&usm=30&usmrad=1&h=1024&fit=clamp" ],
"description": "Werk: automatic; Herrenuhr; Gehäusegröße: 39; Gehäuse: rose-gold; Armband: leather; Glas: sapphire; Jahr: 2018; Lieferumfang: Originale Box, Originale Papiere, Herstellergarantie",
"mpn": "S1006",
"brand":{
"@type": "Thing",
"name": "Rolex"
},
"offers":{
"@type": "Offer",
"priceCurrency": "EUR",
"price": "11500",
"itemCondition": "http://schema.org/NewCondition",
"availability": "http://schema.org/InStock",
"seller":{
"@type": "Organization",
"name": "CHRONEXT Service Germany GmbH"
}
}
}
</script>
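But the response contains a whole series of values, and I am only looking for the price, 11500. Later I will also try to get other fields, such as the name and the condition.
You have two options: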
1) Use JSON, though this only works for the first case:
import json
json_data = json.loads(response.xpath('//script[@type="application/ld+json"]/text()').extract_first())
price = json_data['offers']['price']  # "price" is nested under "offers"
2) Use a regular expression:
response.xpath('//script/text()').re_first(r'price(?:Local)?["\']\s*:\s*["\'](.*?)["\']')
price(?:Local)?["']\s*:\s*["'](.*?)["']
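The regular expression means:
- start with price, followed by an optional Local suffix
- then a single or double quote
- then a colon, with zero or more whitespace characters on either side
- then a single or double quote
- then any value (the price will be here)
- then a single or double quote again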
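The steps listed above can be tried outside of Scrapy with the standard `re` module. This is only a sketch: the helper name `find_price` and the two sample strings are made up for illustration, the suffix is spelled `Local` so that it matches the `priceLocal` key, and the capture group is non-greedy. Those last two points are assumptions of this sketch.

```python
import re

# Optional "Local" suffix, quoted key, colon with optional whitespace, quoted value.
PRICE_RE = re.compile(r'''price(?:Local)?["']\s*:\s*["'](.*?)["']''')

# Illustrative fragments in the two quoting styles (JSON and a JS object literal).
JSON_SNIPPET = '"priceCurrency": "EUR",\n"price": "11500",'
JS_SNIPPET = "'priceLocal': '11500',\n'currencyCode': 'EUR'"


def find_price(script_text):
    """Hypothetical helper: return the first quoted price value, or None."""
    match = PRICE_RE.search(script_text)
    return match.group(1) if match else None


json_price = find_price(JSON_SNIPPET)
js_price = find_price(JS_SNIPPET)
print(json_price, js_price)
```

Note that `priceCurrency` is skipped because the pattern requires a quote directly after `price` or `priceLocal`.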
For the first case, decoding the JSON is the better choice.
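A minimal sketch of the JSON route, assuming the text of the `application/ld+json` script has already been extracted from the response. The helper name `extract_offer_fields` and the abbreviated sample string are illustrative and not taken from the page. The point it shows is that `price` and `itemCondition` sit under the nested `offers` object, while `name` is at the top level.

```python
import json

# Abbreviated copy of the ld+json payload from the question (illustrative sample).
LD_JSON_TEXT = """
{
  "@context": "http://schema.org/",
  "@type": "Product",
  "name": "Rolex Cellini Time 50505",
  "offers": {
    "@type": "Offer",
    "priceCurrency": "EUR",
    "price": "11500",
    "itemCondition": "http://schema.org/NewCondition"
  }
}
"""


def extract_offer_fields(ld_json_text):
    """Hypothetical helper: return name, price and condition from ld+json text."""
    data = json.loads(ld_json_text)
    offer = data.get("offers", {})
    return {
        "name": data.get("name"),
        "price": offer.get("price"),
        "condition": offer.get("itemCondition"),
    }


fields = extract_offer_fields(LD_JSON_TEXT)
print(fields["price"])
```

Using `.get()` keeps the helper from raising if a page omits one of the keys; it simply returns None for that field.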
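For the second case, you can of course always use a regex, but I would suggest a cleaner and better solution that converts the JavaScript into an XPath-queryable format:
$ pip install js2xml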
Suppose a script has the following structure:
<script type="text/javascript">
window.articleInfo = {
'id': 'S1006',
'model': 'Cellini Time',
'brand': 'Rolex',
'reference': '50505',
'priceLocal': '11500',
'currencyCode': 'EUR'
};
</script>
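After parsing the script text with js2xml, you can view the parsed structure as follows:
>> import js2xml
>> parsed = js2xml.parse(response.xpath('//script/text()').extract_first())
>> print(js2xml.pretty_print(parsed))
<program>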
  <assign operator="=">
    <left>
      <dotaccessor>
        <object>
          <identifier name="window"/>
        </object>
        <property>
          <identifier name="articleInfo"/>
        </property>
      </dotaccessor>
    </left>
    <right>
      <object>
        <property name="id">
          <string>S1006</string>
        </property>
        <property name="model">
          <string>Cellini Time</string>
        </property>
        <property name="brand">
          <string>Rolex</string>
        </property>
        <property name="reference">
          <string>50505</string>
        </property>
        <property name="priceLocal">
          <string>11500</string>
        </property>
        <property name="currencyCode">
          <string>EUR</string>
        </property>
      </object>
    </right>
  </assign>
</program>
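To see how an XPath query maps onto the tree printed above, here is a small sketch that uses only the standard library. js2xml itself is not used here: an abbreviated copy of the printed XML is loaded with `xml.etree.ElementTree` instead, and the helper name `property_value` is made up for illustration. The same `//property[@name="..."]/string` pattern is what you would run against the real parsed tree.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the pretty-printed tree shown above (illustrative sample).
PARSED_XML = """
<program>
  <assign operator="=">
    <left>
      <dotaccessor>
        <object><identifier name="window"/></object>
        <property><identifier name="articleInfo"/></property>
      </dotaccessor>
    </left>
    <right>
      <object>
        <property name="id"><string>S1006</string></property>
        <property name="priceLocal"><string>11500</string></property>
        <property name="currencyCode"><string>EUR</string></property>
      </object>
    </right>
  </assign>
</program>
"""

root = ET.fromstring(PARSED_XML)


def property_value(tree, name):
    """Hypothetical helper: text of the <string> under <property name=...>."""
    return tree.findtext(f".//property[@name='{name}']/string")


price = property_value(root, "priceLocal")
currency = property_value(root, "currencyCode")
print(price, currency)
```

ElementTree only supports a subset of XPath, which is enough for attribute predicates like the one used here.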
I hope this helps.
Try """//script/substring-before(substring-after(., "price:"), ",") | //script/substring-before(substring-before(substring-after(., "price:"), ","), ",")"""
Getting invalid syntax. Maybe I put the code in the wrong way: response.xpath("//script/substring-before(substring-after(., "price:"), ",")").extract_first()
Try response.xpath('''//script/substring-before(substring-after(., 'price:'), ',')''').extract_first()
Nope, I get: "ValueError: XPath error: Invalid expression in //script/substring-before(substring-after(., 'price:'), ',')"
@merlin If an answer helped you solve your problem, please don't forget to accept it.
import js2xml
...
parsed = js2xml.parse(response.xpath('//script/text()').extract_first())
parsed.xpath('//property[@name="id"]/string/text()')[0]
parsed.xpath('//property[@name="model"]/string/text()')[0]
parsed.xpath('//property[@name="brand"]/string/text()')[0]
...