How to extract a jsonObj from JavaScript using Scrapy


javascript json scrapy


I want to build a dictionary from the jsonObj. This is what I have so far. I haven't figured out how to extract the JSON so that I can parse it.

    def parse_store(self, response):
        jsonobj = response.xpath('//script[@window.appData//text').extract()
        stores = json.loads(jsonobj.body_as_unicode())
        print(stores)
        for stores in response:
            stores = {}
            stores['stores'] = response['stores']
            stores['stores']['id'] = response['stores']['id']
            stores['stores']['name'] = response['stores']['name']
            stores['stores']['addr1'] = response['stores']['addr1']
            stores['stores']['city'] = response['stores']['city']
            stores['stores']['state'] = response['stores']['state']
            stores['stores']['country'] = response['stores']['country']
            stores['stores']['zipCode'] = response['stores']['zipCode']
            stores['stores']['phone'] = response['stores']['phone']
            stores['stores']['latitude'] = response['stores']['latitude']
            stores['stores']['longitude'] = response['stores']['longitude']
            stores['stores']['services'] = response['stores']['services']
        print(stores)

        return stores
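For reference, what the loop above seems to be aiming for (one flat dict per store, with the listed fields) can be sketched in plain Python once the JavaScript object has been turned into a Python dict, for example with the js2xml approach in the answer. This is a hedged sketch: the field names come from the question, `build_store_items` is a hypothetical helper, and it assumes the data looks like `{"stores": [{...}, ...]}`.

```python
# Field names taken from the question's loop
STORE_FIELDS = [
    "id", "name", "addr1", "city", "state", "country",
    "zipCode", "phone", "latitude", "longitude", "services",
]


def build_store_items(app_data):
    """Return one flat dict per store in app_data["stores"].

    Hypothetical helper: assumes app_data is already a Python dict
    shaped like {"stores": [{...}, ...]}. Missing fields become None.
    """
    items = []
    for store in app_data.get("stores", []):
        items.append({field: store.get(field) for field in STORE_FIELDS})
    return items


sample = {"stores": [{"id": "952", "name": "BAYTOWN TX", "city": "baytown"}]}
for item in build_store_items(sample):
    print(item["id"], item["name"], item["services"])  # 952 BAYTOWN TX None
```

Inside a Scrapy callback you would normally `yield` each of these dicts (Scrapy accepts plain dicts as items) instead of returning a single dict at the end.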

One way to do it is to use js2xml (disclaimer: I wrote js2xml).

So, let's assume you have a Scrapy selector with a `<script>` element and some JavaScript data:

>>> import scrapy
>>> html = '''<script>
... window.appData = {
...     "stores": [
...     {   "id": "952",
...         "name": "BAYTOWN TX",
...         "addr1": "4620 garth rd",
...         "city": "baytown",
...         "state": "TX",
...         "country": "US",
...         "zipCode": "77521",
...         "phone": "281-420-0079",
...         "locationType": "Store",
...         "locationSubType": "Big Box Store",
...         "latitude": "29.77313",
...         "longitude": "-94.97634"
...     }]
... }
... </script>'''
>>> selector = scrapy.Selector(text=html, type="html")

Now, import js2xml and call its `.parse()` function. It returns an lxml tree representing the JavaScript code (similar to this):

(i.e., you want the `assign` node, filtered on its `left` part, and then the child of its `right` part, which is the `object`)
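To illustrate that navigation step without lxml, the same selection can be mimicked with the standard library's ElementTree. ElementTree supports only a subset of XPath (no `left//identifier[@name=...]` inside a predicate), so the filtering on `left` is done in Python. This is a sketch over a trimmed, hand-copied version of the XML that js2xml prints in this answer; `find_assigned_value` is a hypothetical name, and with real js2xml output you would simply call `jstree.xpath(...)` as the answer does.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the js2xml output shown in this answer (one property kept)
JS_XML = """
<program>
  <assign operator="=">
    <left>
      <dotaccessor>
        <object><identifier name="window"/></object>
        <property><identifier name="appData"/></property>
      </dotaccessor>
    </left>
    <right>
      <object>
        <property name="stores">
          <array>
            <object>
              <property name="id"><string>952</string></property>
            </object>
          </array>
        </property>
      </object>
    </right>
  </assign>
</program>
"""


def find_assigned_value(tree, name):
    """Return the children of <right> for each <assign> whose <left> mentions `name`."""
    results = []
    for assign in tree.iter("assign"):
        left = assign.find("left")
        right = assign.find("right")
        if left is None or right is None:
            continue
        # Python equivalent of the predicate left//identifier[@name="..."]
        if left.find(f".//identifier[@name='{name}']") is not None:
            results.extend(list(right))
    return results


root = ET.fromstring(JS_XML.strip())
matches = find_assigned_value(root, "appData")
print([m.tag for m in matches])  # ['object']
```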

js2xml provides helpers to convert `object` nodes into Python dicts and lists (we use `[0]` to select the first result of the `xpath()` call):
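To make the idea of such a helper concrete, here is a toy converter for the node types that appear in this example's tree (`object`, `property`, `array`, `string`). It is not js2xml's implementation: `to_python` is a hypothetical name, and every other node type (numbers, booleans, nested expressions) is deliberately rejected. For real data, use `js2xml.jsonlike.make_dict` as the answer does.

```python
import xml.etree.ElementTree as ET


def to_python(node):
    """Convert a js2xml-style node into Python values (toy version).

    Handles only <object>, <array> and <string>, the node types
    shown in the example output; anything else raises ValueError.
    """
    if node.tag == "object":
        # Each <property name="..."> holds exactly one value node
        return {prop.get("name"): to_python(prop[0]) for prop in node.findall("property")}
    if node.tag == "array":
        return [to_python(child) for child in node]
    if node.tag == "string":
        return node.text or ""
    raise ValueError(f"unsupported node type: {node.tag}")


xml_snippet = """<object>
  <property name="stores">
    <array>
      <object>
        <property name="id"><string>952</string></property>
        <property name="city"><string>baytown</string></property>
      </object>
    </array>
  </property>
</object>"""

print(to_python(ET.fromstring(xml_snippet)))
# {'stores': [{'id': '952', 'city': 'baytown'}]}
```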

window.appData = {"stores": [{"id": "952", "name": "BAYTOWN TX", "addr1": "4620 garth rd", "city": "BAYTOWN", "state": "TX", "country": "US", "zipCode": "77521", "phone": "281-420-0079", "locationType": "Store", "locationSubType": "Big Box Store", "latitude": "29.77313", "longitude": "-94.97634",

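`js2xml` is awesome. I've been at this for so long, so I really appreciate your reply. I've installed js2xml and I do think it will help. I'm just not sure how to select the JS (`window.appData`) in the first place so that I can traverse it with js2xml. Is there a predicate I can use to load the JS?

You can test the content: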

//script[contains(., "window.appData")]/text()

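I see the power behind js2xml now. Everything comes back just as you posted, thanks! However, I believe there is one more [contains… at the point where I need to iterate into `stores`, which is the 9th object in `window.appData`. So when I use `js2xml.make_dict`, I want to access `stores` at `[8]`, but it comes back empty. Should the statement that tests the content include 'stores' at some point?

It's hard to say without the input data. You may want to ask a separate question about that.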
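The `contains()` test suggested in the comments runs inside Scrapy's XPath engine; inside a spider you would write something like `response.xpath('//script[contains(., "window.appData")]/text()').extract_first()`. To show the same filtering idea (keep only the script whose text contains the marker) without Scrapy, here is a hedged standard-library sketch built on `html.parser`. `ScriptCollector` and `find_script` are hypothetical names, not part of Scrapy or js2xml.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append
        if self._in_script:
            self.scripts[-1] += data


def find_script(html, marker="window.appData"):
    """Return the first script text containing `marker`, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        if marker in text:
            return text
    return None


page = '<script>var x = 1;</script><script>window.appData = {"stores": []};</script>'
print(find_script(page))
```

The returned text is what you would then hand to `js2xml.parse()`.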

>>> js = selector.xpath('//script/text()').extract_first()
>>> js
u'\nwindow.appData = {\n    "stores": [\n    {   "id": "952",\n        "name": "BAYTOWN TX",\n        "addr1": "4620 garth rd",\n        "city": "baytown",\n        "state": "TX",\n        "country": "US",\n        "zipCode": "77521",\n        "phone": "281-420-0079",\n        "locationType": "Store",\n        "locationSubType": "Big Box Store",\n        "latitude": "29.77313",\n        "longitude": "-94.97634"\n    }]\n}\n'

>>> import js2xml
>>> jstree = js2xml.parse(js)
>>> jstree
<Element program at 0x7fc7f1ba3bd8>

>>> print(js2xml.pretty_print(jstree))
<program>
  <assign operator="=">
    <left>
      <dotaccessor>
        <object>
          <identifier name="window"/>
        </object>
        <property>
          <identifier name="appData"/>
        </property>
      </dotaccessor>
    </left>
    <right>
      <object>
        <property name="stores">
          <array>
            <object>
              <property name="id">
                <string>952</string>
              </property>
              <property name="name">
                <string>BAYTOWN TX</string>
              </property>
              <property name="addr1">
                <string>4620 garth rd</string>
              </property>
              <property name="city">
                <string>baytown</string>
              </property>
              <property name="state">
                <string>TX</string>
              </property>
              <property name="country">
                <string>US</string>
              </property>
              <property name="zipCode">
                <string>77521</string>
              </property>
              <property name="phone">
                <string>281-420-0079</string>
              </property>
              <property name="locationType">
                <string>Store</string>
              </property>
              <property name="locationSubType">
                <string>Big Box Store</string>
              </property>
              <property name="latitude">
                <string>29.77313</string>
              </property>
              <property name="longitude">
                <string>-94.97634</string>
              </property>
            </object>
          </array>
        </property>
      </object>
    </right>
  </assign>
</program>

>>> jstree.xpath('''
...     //assign[left//identifier[@name="appData"]]
...         /right
...             /*
...     ''')
[<Element object at 0x7fc7f257f5f0>]
>>>

>>> from pprint import pprint
>>> pprint(js2xml.jsonlike.make_dict(jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]))
{'stores': [{'addr1': '4620 garth rd',
             'city': 'baytown',
             'country': 'US',
             'id': '952',
             'latitude': '29.77313',
             'locationSubType': 'Big Box Store',
             'locationType': 'Store',
             'longitude': '-94.97634',
             'name': 'BAYTOWN TX',
             'phone': '281-420-0079',
             'state': 'TX',
             'zipCode': '77521'}]}
>>>
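When, as in this example, the value assigned to `window.appData` happens to be strict JSON (double-quoted keys and strings, no trailing commas, comments, or function calls), a lighter alternative is to cut the object out with a regular expression and pass it to `json.loads`. This is a swapped-in technique, not what the answer above does, and it fails as soon as the JavaScript is not valid JSON. The regex is an assumption: it expects the script to end with the assigned object, optionally followed by `;`. `extract_app_data` is a hypothetical name.

```python
import json
import re

# Assumption: the script is `window.appData = { ...JSON... }` optionally followed by ';'
APP_DATA_RE = re.compile(r"window\.appData\s*=\s*(\{.*\})\s*;?\s*$", re.DOTALL)


def extract_app_data(js_text):
    """Return window.appData as a dict, or None if the pattern is not found."""
    match = APP_DATA_RE.search(js_text)
    if match is None:
        return None
    # Raises json.JSONDecodeError if the object literal is not strict JSON
    return json.loads(match.group(1))
```

Fed the `js` string extracted earlier in this answer, this should return the same structure that `make_dict` produced; js2xml remains the more robust choice for real-world JavaScript.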