How to extract a jsonObj from JavaScript using Scrapy


I want to build a dictionary from a jsonObj. This is what I have so far. I haven't figured out how to extract the JSON so that I can parse it.

    def parse_store(self, response):
        jsonobj = response.xpath('//script[@window.appData//text').extract()
        stores = json.loads(jsonobj.body_as_unicode())
        print(stores)
        for stores in response:
            stores = {}
            stores['stores'] = response['stores']
            stores['stores']['id'] = response['stores']['id']
            stores['stores']['name'] = response['stores']['name']
            stores['stores']['addr1'] = response['stores']['addr1']
            stores['stores']['city'] = response['stores']['city']
            stores['stores']['state'] = response['stores']['state']
            stores['stores']['country'] = response['stores']['country']
            stores['stores']['zipCode'] = response['stores']['zipCode']
            stores['stores']['phone'] = response['stores']['phone']
            stores['stores']['latitude'] = response['stores']['latitude']
            stores['stores']['longitude'] = response['stores']['longitude']
            stores['stores']['services'] = response['stores']['services']
        print(stores)

        return stores

One way is to use js2xml (disclaimer: I wrote js2xml).

So, let's assume you have a Scrapy selector with a <script> element and some JavaScript data in it:

>>> import scrapy
>>> html = '''<script>
... window.appData = {
...     "stores": [
...     {   "id": "952",
...         "name": "BAYTOWN TX",
...         "addr1": "4620 garth rd",
...         "city": "baytown",
...         "state": "TX",
...         "country": "US",
...         "zipCode": "77521",
...         "phone": "281-420-0079",
...         "locationType": "Store",
...         "locationSubType": "Big Box Store",
...         "latitude": "29.77313",
...         "longitude": "-94.97634"
...     }]
... }
... </script>'''
>>> selector = scrapy.Selector(text=html, type="html")
>>> js = selector.xpath('//script/text()').extract_first()
>>> js
u'\nwindow.appData = {\n    "stores": [\n    {   "id": "952",\n        "name": "BAYTOWN TX",\n        "addr1": "4620 garth rd",\n        "city": "baytown",\n        "state": "TX",\n        "country": "US",\n        "zipCode": "77521",\n        "phone": "281-420-0079",\n        "locationType": "Store",\n        "locationSubType": "Big Box Store",\n        "latitude": "29.77313",\n        "longitude": "-94.97634"\n    }]\n}\n'

Now import js2xml and call its .parse() function on that text. It returns an lxml tree that represents the JavaScript code (it looks something like this):

>>> import js2xml
>>> jstree = js2xml.parse(js)
>>> jstree
<Element program at 0x7fc7f1ba3bd8>
>>> print(js2xml.pretty_print(jstree))
<program>
  <assign operator="=">
    <left>
      <dotaccessor>
        <object>
          <identifier name="window"/>
        </object>
        <property>
          <identifier name="appData"/>
        </property>
      </dotaccessor>
    </left>
    <right>
      <object>
        <property name="stores">
          <array>
            <object>
              <property name="id">
                <string>952</string>
              </property>
              <property name="name">
                <string>BAYTOWN TX</string>
              </property>
              <property name="addr1">
                <string>4620 garth rd</string>
              </property>
              <property name="city">
                <string>baytown</string>
              </property>
              <property name="state">
                <string>TX</string>
              </property>
              <property name="country">
                <string>US</string>
              </property>
              <property name="zipCode">
                <string>77521</string>
              </property>
              <property name="phone">
                <string>281-420-0079</string>
              </property>
              <property name="locationType">
                <string>Store</string>
              </property>
              <property name="locationSubType">
                <string>Big Box Store</string>
              </property>
              <property name="latitude">
                <string>29.77313</string>
              </property>
              <property name="longitude">
                <string>-94.97634</string>
              </property>
            </object>
          </array>
        </property>
      </object>
    </right>
  </assign>
</program>

Next, select the right-hand side of the window.appData assignment (that is, you need the <assign> node, filtered on its <left> part, and the child of its <right> part, which is an <object> node):

>>> jstree.xpath('''
...     //assign[left//identifier[@name="appData"]]
...         /right
...             /*
...     ''')
[<Element object at 0x7fc7f257f5f0>]

js2xml provides helpers to convert <object> nodes into Python dicts and lists (we use [0] to select the first result of the xpath() call):

>>> js2xml.make_dict(jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0])
>>> from pprint import pprint
>>> pprint(js2xml.jsonlike.make_dict(jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]))
{'stores': [{'addr1': '4620 garth rd',
             'city': 'baytown',
             'country': 'US',
             'id': '952',
             'latitude': '29.77313',
             'longitude': '-94.97634',
             'locationSubType': 'Big Box Store',
             'locationType': 'Store',
             'name': 'BAYTOWN TX',
             'phone': '281-420-0079',
             'state': 'TX',
             'zipCode': '77521'}]}

js2xml is awesome. I have been using it for a long time.

I certainly appreciate your reply. I have installed js2xml, and I do think it will help. I'm just not sure how to select the JS (window.appData) in the first place so that I can traverse it with js2xml. Is there a predicate I can use to load the JS?

You can test the content: //script[contains(., "window.appData")]/text()

I now see the power behind js2xml. Everything comes back just as you posted, thank you! However, I believe there is one more [contains ...] that I need in order to iterate down to stores, which is the 9th object in 'window.appData'. So when I use js2xml.make_dict, I try to call stores at [8], but it comes back empty. Should my content-test statement include 'stores' at some point?

Hard to say without the input data. You may want to ask another question about this.
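
For the goal stated in the question (one dict per store), here is a minimal standard-library sketch. It is not the js2xml approach from the answer: it assumes the object assigned to window.appData is strict JSON (double-quoted keys and strings, no trailing commas, no comments), which holds for the sample data but not for JavaScript in general. The names JS_TEXT, STORE_FIELDS, extract_app_data and build_stores are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical sample mirroring the <script> content used in the answer.
JS_TEXT = '''
window.appData = {
    "stores": [
    {   "id": "952",
        "name": "BAYTOWN TX",
        "addr1": "4620 garth rd",
        "city": "baytown",
        "state": "TX",
        "country": "US",
        "zipCode": "77521",
        "phone": "281-420-0079",
        "locationType": "Store",
        "locationSubType": "Big Box Store",
        "latitude": "29.77313",
        "longitude": "-94.97634"
    }]
}
'''

# Fields the question copies for each store. "services" is absent from the
# sample data, so it comes back as None here.
STORE_FIELDS = ("id", "name", "addr1", "city", "state", "country",
                "zipCode", "phone", "latitude", "longitude", "services")


def extract_app_data(js_text, marker="window.appData"):
    """Decode the object assigned to `marker`, assuming it is strict JSON."""
    start = js_text.index(marker)      # raises ValueError if the marker is missing
    brace = js_text.index("{", start)  # first "{" after the marker
    # raw_decode() parses one JSON value starting at `brace` and ignores
    # whatever follows it (a trailing ";" or more JavaScript).
    data, _end = json.JSONDecoder().raw_decode(js_text, brace)
    return data


def build_stores(app_data):
    """Return one flat dict per store, keeping only STORE_FIELDS."""
    return [{field: store.get(field) for field in STORE_FIELDS}
            for store in app_data.get("stores", [])]


stores = build_stores(extract_app_data(JS_TEXT))
print(stores[0]["name"])
```

In a spider, the text passed to extract_app_data would come from the script selected with the XPath given in the comments, for example response.xpath('//script[contains(., "window.appData")]/text()').extract_first(). When the literal is not valid JSON, the dict returned by js2xml's make_dict can be passed to build_stores instead.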