Python: retrieving data from a script tag without using XPath


python xpath scrapy


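I'm trying to retrieve data from the script tag shown below. From this script tag I need the following values: digitalData.product.pvi_type_name, digitalData.product.pvi_subtype_name, digitalData.product.model_name, and digitalData.product.displayName. I've written my own extraction code in Python, but it isn't working so far.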

Script tag structure:

<script>
var COUNTRY_SHOP_STATUS = "buy";
var COUNTRY_SHOP_URL = "./buy";
var COUNTRY_WHERE_URL = "";
try {digitalData.page.pathIndicator.depth_2 = "mobile";} catch(e) {}
try {digitalData.page.pathIndicator.depth_3 = "mobile";} catch(e) {}
try {digitalData.page.pathIndicator.depth_4 = "smartphones";} catch(e) {}
try {digitalData.page.pathIndicator.depth_5 = "galaxy-note9";} catch(e) {}
try {digitalData.product.pvi_type_name      = "Mobile";} catch(e) {}
try {digitalData.product.pvi_subtype_name   = "Smartphone";} catch(e) {}
try {digitalData.product.model_name         = "SM-N960";} catch(e) {}
try {digitalData.product.displayName        = "galaxy note9";} catch(e) {}
try {digitalData.product.category           = digitalData.page.pathIndicator.depth_3;} catch(e) {}
</script>

If you already have the script content, try the following to get the required values:

import re

# source_arr[0] holds the text of the <script> node
result = re.findall(r'product.*"(.*)"', source_arr[0])
print(result)
# ['Mobile', 'Smartphone', 'SM-N960', 'galaxy note9']
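
The pattern above relies on greedy matching and returns the values without their names. As a rough sketch of a stricter variant, the following captures each property name together with its quoted value and collects them in a dict. SCRIPT_TEXT is a stand-in for source_arr[0] (a shortened copy of the script from the question), and extract_product_fields is a hypothetical helper name, not part of the original answer.

```python
import re

# Stand-in for source_arr[0]: a shortened copy of the script text from the question.
SCRIPT_TEXT = '''
var COUNTRY_SHOP_STATUS = "buy";
try {digitalData.page.pathIndicator.depth_5 = "galaxy-note9";} catch(e) {}
try {digitalData.product.pvi_type_name      = "Mobile";} catch(e) {}
try {digitalData.product.pvi_subtype_name   = "Smartphone";} catch(e) {}
try {digitalData.product.model_name         = "SM-N960";} catch(e) {}
try {digitalData.product.displayName        = "galaxy note9";} catch(e) {}
try {digitalData.product.category           = digitalData.page.pathIndicator.depth_3;} catch(e) {}
'''

# Matches: digitalData.product.<name> = "<value>"
PRODUCT_ASSIGNMENT = re.compile(r'digitalData\.product\.(\w+)\s*=\s*"([^"]*)"')


def extract_product_fields(script_text):
    """Return {property name: quoted string value} for digitalData.product assignments."""
    return dict(PRODUCT_ASSIGNMENT.findall(script_text))


fields = extract_product_fields(SCRIPT_TEXT)
print(fields)
```

The category line is skipped because its right-hand side is not a quoted string, which matches the behaviour of the original pattern.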

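Comments:

"...but it isn't working so far" What do you mean by that? Share the current and desired output.

Thanks for your answer @Andersson! It doesn't return the information I need at all. No data is filled in for "category, type, model, SK".

Did you check (print out) source_arr? Does the page source contain the target content in the script node, or is it dynamic content?

@Andersson print(source_arr) returns an empty result. The URL I'm trying to scrape is: "". It does contain that script tag.

I can get the required script using the same XPath with requests + lxml.html, so it is definitely not an XPath issue.

Another quick question @Andersson. If I only want to select one value, would it look like this: "categorie = re.findall('product.pvi_type_name)(.*)"', source_arr[0])"?

categorie = re.findall('product.pvi_type_name.*"(.*)"', source_arr[0])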
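
The single-field pattern in the last comment can be wrapped in a small helper. This is only a sketch: extract_field and the sample string are illustrative names and data, not from the original thread. It uses re.search to return a single string instead of a list, and re.escape so the field name is matched literally.

```python
import re


def extract_field(script_text, name):
    """Return the quoted value assigned to digitalData.product.<name>, or None."""
    pattern = r'product\.' + re.escape(name) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


# Illustrative sample: two lines copied from the script in the question.
sample = (
    'try {digitalData.product.pvi_type_name      = "Mobile";} catch(e) {}\n'
    'try {digitalData.product.pvi_subtype_name   = "Smartphone";} catch(e) {}'
)

print(extract_field(sample, "pvi_type_name"))
print(extract_field(sample, "category"))
```

Returning None for a missing field makes it easy to tell "field not in the script" apart from an empty value.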