Python3 web scraping and parsing JSON

I'm new to Python programming. I'm trying to extract information from the following:

dataLayer = [{'user.IntExt':'External','user.UserId':'','app.Page':'stores.aspx','app.siteArea':'YPO-HM','app.Version':'TBD','acct.storeAccount':'200315','acct.storeState':'AL','acct.storeChain':'TBD','acct.chainName':'TBD','acct.NCPDP':'0140044','acct.StoreSegment':'TBD','acct.storeId':'2068','acct.storeName':'…','…':'35611','acct.storeRegion':'SOUTH','acct.storeGAUAID':'',}];
(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-NW87TKH');

When you call data = soupdata.find('script'), it returns only the first script tag it finds. You need to use find_all instead, then iterate over those elements to locate the one you are looking for. After that, you need to manipulate the string into a format that json.loads can parse.
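A minimal sketch of the string-manipulation step, using only the standard library. It assumes the script text looks like the shortened, hypothetical sample below and that no value contains a quote character; `datalayer_to_dict` is a hypothetical helper name. Instead of splitting on commas, it removes the trailing comma with a regular expression, so it still works if the page ever omits that comma (a split-on-last-comma approach would then silently drop the last key).

```python
import json
import re

# Hypothetical, shortened sample of the inline script shown above.
SCRIPT_TEXT = """
dataLayer = [{'app.Page':'stores.aspx','acct.NCPDP':'0140044','acct.storeId':'2068',}];
(function(w,d,s,l,i){w[l]=w[l]||[];})(window,document,'script','dataLayer','GTM-NW87TKH');
"""


def datalayer_to_dict(script_text):
    """Return the first object in `dataLayer = [{...}];` as a dict, or None."""
    # Capture the object literal between "dataLayer = [" and "];"
    match = re.search(r"dataLayer\s*=\s*\[\s*(\{.*?\})\s*\];", script_text, re.DOTALL)
    if match is None:
        return None
    literal = match.group(1)
    # JSON only accepts double-quoted strings
    # (assumption: no value contains an apostrophe or a double quote)
    literal = literal.replace("'", '"')
    # Remove a trailing comma before the closing brace, which JSON rejects
    literal = re.sub(r",\s*\}", "}", literal)
    return json.loads(literal)


data = datalayer_to_dict(SCRIPT_TEXT)
print(data['acct.NCPDP'], data['acct.storeId'])  # 0140044 2068
```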

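import json

import requests
from bs4 import BeautifulSoup


url = 'https://stores.healthmart.com/athenspharmacy/stores.aspx'
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error
soupdata = BeautifulSoup(response.text, 'html.parser')

# find() would return only the first <script> tag; find_all() returns all of them
scripts = soupdata.find_all('script')
jsonObj = None

for script in scripts:
    # .string is the inline JavaScript; it is None for external <script src="..."> tags
    text = script.string or ''
    if 'dataLayer =' in text:
        # Keep only the object literal between "dataLayer = [" and "];"
        jsonStr = text.split('dataLayer = [')[1]
        jsonStr = jsonStr.split('];')[0]
        # JSON requires double quotes (this breaks if a value contains an apostrophe)
        jsonStr = jsonStr.replace("'", '"')
        # Drop the text after the last comma (the trailing ",}" that JavaScript
        # allows but JSON rejects), then close the object again
        jsonStr = ','.join(jsonStr.split(',')[0:-1]) + '}'

        jsonObj = json.loads(jsonStr)
        break

if jsonObj is not None:
    print(jsonObj['acct.NCPDP'], jsonObj['acct.storeId'])
else:
    print('dataLayer not found')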

Output:

0140044 2068
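If a value ever contains an apostrophe (for example a store name like Joe's Pharmacy), the replace("'", '"') step produces invalid or incorrect JSON. One alternative, sketched here rather than taken from the answer, is to parse the captured literal with ast.literal_eval: single-quoted strings, backslash escapes, and a trailing comma are all valid Python syntax. This assumes every value is a quoted string or a number; JavaScript literals such as true, false, or null make literal_eval raise ValueError. The sample text and the parse_datalayer name are hypothetical.

```python
import ast
import re

# Hypothetical sample with an escaped apostrophe inside a value.
SCRIPT_TEXT = r"""dataLayer = [{'acct.storeName':'Joe\'s Pharmacy','acct.storeId':'2068','acct.storeGAUAID':'',}];"""


def parse_datalayer(script_text):
    """Return the first object in `dataLayer = [{...}];` as a dict, or None."""
    match = re.search(r"dataLayer\s*=\s*\[\s*(\{.*?\})\s*\];", script_text, re.DOTALL)
    if match is None:
        return None
    # Evaluate the object literal as a Python dict literal (no code is executed);
    # assumption: values are only quoted strings or numbers
    return ast.literal_eval(match.group(1))


data = parse_datalayer(SCRIPT_TEXT)
print(data['acct.storeName'], data['acct.storeId'])  # Joe's Pharmacy 2068
```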