How to extract a jsonObj from JavaScript using Scrapy
I want to build a dictionary from a jsonObj. This is what I have so far. I haven't figured out how to extract the JSON so that I can parse it.
def parse_store(self, response):
    jsonobj = response.xpath('//script[@window.appData//text').extract()
    stores = json.loads(jsonobj.body_as_unicode())
    print(stores)
    for stores in response:
        stores = {}
        stores['stores'] = response['stores']
        stores['stores']['id'] = response['stores']['id']
        stores['stores']['name'] = response['stores']['name']
        stores['stores']['addr1'] = response['stores']['addr1']
        stores['stores']['city'] = response['stores']['city']
        stores['stores']['state'] = response['stores']['state']
        stores['stores']['country'] = response['stores']['country']
        stores['stores']['zipCode'] = response['stores']['zipCode']
        stores['stores']['phone'] = response['stores']['phone']
        stores['stores']['latitude'] = response['stores']['latitude']
        stores['stores']['longitude'] = response['stores']['longitude']
        stores['stores']['services'] = response['stores']['services']
        print(stores)
        return stores
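A note on why the snippet above cannot work as written: the XPath expression is not valid, .extract() returns a list of strings rather than a response, and the script text still begins with the `window.appData =` prefix, which is not JSON. Below is a minimal sketch, using only the standard library, of pulling the object out of the script text. It assumes the object literal is strict JSON (double-quoted keys, no trailing commas); `extract_assigned_json` is a hypothetical helper name, not part of Scrapy.

```python
import json
import re


def extract_assigned_json(js_text, var_name="window.appData"):
    """Return the JSON value assigned to var_name in js_text, or None."""
    # Find "window.appData =" and skip the whitespace after the equals sign.
    match = re.search(re.escape(var_name) + r"\s*=\s*", js_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (a trailing ";" or more JavaScript).
    value, _end = json.JSONDecoder().raw_decode(js_text, match.end())
    return value


# Shortened sample in the shape of the page's data.
js_text = '''
window.appData = {
  "stores": [
    {"id": "952", "name": "BAYTOWN TX", "city": "baytown", "state": "TX"}
  ]
};
'''
app_data = extract_assigned_json(js_text)
print(app_data["stores"][0]["name"])
```

In a spider callback, `js_text` would be the text of the matching script element. If the literal is not strict JSON, `raw_decode` raises `json.JSONDecodeError`, and a JavaScript-aware parser is the safer choice.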
One way is to use js2xml (disclaimer: I wrote js2xml). So, let's assume you have a Scrapy selector with a <script> element and some JavaScript data:
>>> import scrapy
>>> html = '''<script>
... window.appData = {
... "stores": [
... { "id": "952",
... "name": "BAYTOWN TX",
... "addr1": "4620 garth rd",
... "city": "baytown",
... "state": "TX",
... "country": "US",
... "zipCode": "77521",
... "phone": "281-420-0079",
... "locationType": "Store",
... "locationSubType": "Big Box Store",
... "latitude": "29.77313",
... "longitude": "-94.97634"
... }]
... }
... </script>'''
>>> selector = scrapy.Selector(text=html, type="html")
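First, get the JavaScript text out of the <script> element. Then import js2xml and call its .parse() function. It returns an lxml tree representing the JavaScript code (a kind of syntax tree).
Next, take the right-hand side of the window.appData assignment (that is, you want the <assign> node, filtered on its <left> part, and the child of its <right> part, which is a JavaScript <object>).
js2xml provides a helper that converts <object> nodes into Python dicts and lists (we use [0] to select the first result of the xpath() call).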
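Awesome. I have been at this for a long time, and I really appreciate your response. I have installed js2xml and I do think it will help. I am just not sure how to select the JS (window.appData) in the first place so that I can walk it with js2xml. Is there a predicate I can use to load the JS?
You can test the content: //script[contains(., "window.appData")]/text()
I now see the power behind js2xml. Everything comes back just as you posted, thank you! I believe I still need one more [contains…, though: the point I need to iterate from is stores, which is the 9th object in 'window.appData'. So when I use js2xml.make_dict I try to call stores at [8], but it returns empty. Should my content test include 'stores' at some point?
Hard to say without the input data. You may want to ask another question about this.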
>>> js = selector.xpath('//script/text()').extract_first()
>>> js
u'\nwindow.appData = {\n "stores": [\n { "id": "952",\n "name": "BAYTOWN TX",\n "addr1": "4620 garth rd",\n "city": "baytown",\n "state": "TX",\n "country": "US",\n "zipCode": "77521",\n "phone": "281-420-0079",\n "locationType": "Store",\n "locationSubType": "Big Box Store",\n "latitude": "29.77313",\n "longitude": "-94.97634"\n }]\n}\n'
>>> import js2xml
>>> jstree = js2xml.parse(js)
>>> jstree
<Element program at 0x7fc7f1ba3bd8>
>>> print(js2xml.pretty_print(jstree))
<program>
<assign operator="=">
<left>
<dotaccessor>
<object>
<identifier name="window"/>
</object>
<property>
<identifier name="appData"/>
</property>
</dotaccessor>
</left>
<right>
<object>
<property name="stores">
<array>
<object>
<property name="id">
<string>952</string>
</property>
<property name="name">
<string>BAYTOWN TX</string>
</property>
<property name="addr1">
<string>4620 garth rd</string>
</property>
<property name="city">
<string>baytown</string>
</property>
<property name="state">
<string>TX</string>
</property>
<property name="country">
<string>US</string>
</property>
<property name="zipCode">
<string>77521</string>
</property>
<property name="phone">
<string>281-420-0079</string>
</property>
<property name="locationType">
<string>Store</string>
</property>
<property name="locationSubType">
<string>Big Box Store</string>
</property>
<property name="latitude">
<string>29.77313</string>
</property>
<property name="longitude">
<string>-94.97634</string>
</property>
</object>
</array>
</property>
</object>
</right>
</assign>
</program>
>>> jstree.xpath('''
... //assign[left//identifier[@name="appData"]]
... /right
... /*
... ''')
[<Element object at 0x7fc7f257f5f0>]
>>>
>>> js2xml.make_dict(jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0])
>>> from pprint import pprint
>>> pprint(js2xml.jsonlike.make_dict(jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]))
{'stores': [{'addr1': '4620 garth rd',
'city': 'baytown',
'country': 'US',
'id': '952',
'latitude': '29.77313',
'locationSubType': 'Big Box Store',
'locationType': 'Store',
'longitude': '-94.97634',
'name': 'BAYTOWN TX',
'phone': '281-420-0079',
'state': 'TX',
'zipCode': '77521'}]}
>>>
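Since make_dict returns a plain Python dict, the stores are reached by key rather than by position. A minimal sketch of building one flat dict per store with the fields listed in the question follows. The sample data is copied from the output above; `FIELDS` and `iter_stores` are hypothetical names introduced for illustration.

```python
# Fields the question wants to keep for each store.
FIELDS = ("id", "name", "addr1", "city", "state", "country",
          "zipCode", "phone", "latitude", "longitude", "services")


def iter_stores(app_data, fields=FIELDS):
    """Yield one flat dict per store, keeping only the requested fields."""
    for store in app_data.get("stores", []):
        # .get() gives None for keys a store lacks, e.g. "services" here.
        yield {field: store.get(field) for field in fields}


# The dict produced by make_dict in the session above.
app_data = {'stores': [{'addr1': '4620 garth rd',
                        'city': 'baytown',
                        'country': 'US',
                        'id': '952',
                        'latitude': '29.77313',
                        'locationSubType': 'Big Box Store',
                        'locationType': 'Store',
                        'longitude': '-94.97634',
                        'name': 'BAYTOWN TX',
                        'phone': '281-420-0079',
                        'state': 'TX',
                        'zipCode': '77521'}]}

stores = list(iter_stores(app_data))
print(stores[0]["name"], stores[0]["zipCode"])
```

In a spider callback, each of these dicts could be yielded as an item instead of being collected into a list.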