How to search for strings, count values, and group by in JSON with Python
I have a Python program that calls an API, which returns the following result:
{
    "result": [
        {
            "company" : "BMW",
            "model" : "5"
        },
        {
            "company" : "BMW",
            "model" : "5"
        },
        {
            "company" : "BMW",
            "model" : "5"
        },
        {
            "company" : "BMW",
            "model" : "3"
        },
        {
            "company" : "BMW",
            "model" : "7"
        },
        {
            "company" : "AUDI",
            "model" : "A3"
        },
        {
            "company" : "AUDI",
            "model" : "A7"
        }
    ]
}
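My task now is to count how often each element occurs in the list in the JSON output and to group the counts. The expected output should look like this: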
{
    "BMW" :
    {
        "5series" : 3,
        "3series" : 1,
        "7series" : 1
    },
    "AUDI" :
    {
        "A3" : 1,
        "A7" : 1
    },
    "MERCEDES":
    {
        "EClass" : 0,
        "SClass" : 0
    }
}
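I need to look up each "company" from a list of names. That list may include names that are sometimes absent from the JSON response, in which case the expected output should still include them with a count of 0. The "model" names (3, 5, 7, A3, etc.) are fixed, so we know these are the only names that may or may not appear in the JSON API response.
For example, the code has a list of three company names: companyname = ["BMW", "AUDI", "MERCEDES"]. Sometimes, however, the JSON API response lacks one or more of them. In the case above, "MERCEDES" is missing, but the final output should still include "MERCEDES" with values of 0.
Here is what I have tried so far: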
import requests
from flask import jsonify  # assuming jsonify is Flask's

def modelcount():
    companyname = ["BMW", "AUDI", "MERCEDES"]
    url = apiurl  # apiurl, user, password, headers and proxies are defined elsewhere
    # Send Request
    apiresponse = requests.get(url, auth=(user, password), headers=headers, proxies=proxies)
    # Decode the JSON response into a dictionary and use the data
    data = apiresponse.json()
    print(len(data['result']))
    # Python identifiers cannot start with a digit, hence series3 rather than 3series
    series3 = 0
    series5 = 0
    series7 = 0
    A3 = 0
    A7 = 0
    EClass = 0
    SClass = 0
    modelcountjson = {}
    for name in companyname:
        models = {}
        for item in data['result']:
            if item['company'] == name:
                # The API returns the model names as strings, so compare against strings
                if item['model'] == '3':
                    series3 = series3 + 1
                elif item['model'] == '5':
                    series5 = series5 + 1
                elif item['model'] == '7':
                    series7 = series7 + 1
        models['3series'] = series3
        models['5series'] = series5
        models['7series'] = series7
        # I still haven't written AUDI, MERCEDES above. This is where I feel I am writing inefficiently.
        modelcountjson[name] = models
    return jsonify(modelcountjson)
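As the number of models grows, I am worried that the code will become redundant with so many for loops and may incur a performance overhead. I am looking for help in achieving the end result in the most efficient way.
Thanks a lot for your help.

You can separate code and configuration a bit: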
conf = {
    'BMW': {'format': '{}series', 'keys': ['3', '5', '7']},
    'AUDI': {'format': '{}', 'keys': ['A3', 'A7']},
    'MERCEDES': {'format': '{}Class', 'keys': ['E', 'S']},
}

def modelcount():
    # retrieve `data`
    # ...
    # start every configured model at 0 so missing companies still show up
    result = {
        k: {
            v['format'].format(key): 0 for key in v['keys']
        } for k, v in conf.items()
    }
    for car in data['result']:
        com = car['company']
        mod = car['model']
        key = conf[com]['format'].format(mod)
        result[com][key] += 1
    # add a per-company total
    for com in result:
        result[com]['Total'] = sum(result[com].values())
    return result
>>> modelcount()
{'BMW': {'3series': 1, '5series': 3, '7series': 1, 'Total': 5},
 'AUDI': {'A3': 1, 'A7': 1, 'Total': 2},
 'MERCEDES': {'EClass': 0, 'SClass': 0, 'Total': 0}}
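As a self-contained variant of the configuration-driven approach above, here is a minimal sketch that takes the decoded response as a parameter instead of fetching it. The names `CONF`, `count_models` and `sample` are hypothetical, and skipping companies or models that are not configured (rather than raising a KeyError) is an assumption, not something the original answer does.

```python
# Hypothetical configuration: output format and known model keys per company.
CONF = {
    'BMW': {'format': '{}series', 'keys': ['3', '5', '7']},
    'AUDI': {'format': '{}', 'keys': ['A3', 'A7']},
    'MERCEDES': {'format': '{}Class', 'keys': ['E', 'S']},
}


def count_models(data, conf=CONF):
    """Count models per company; every configured model starts at 0."""
    result = {
        company: {spec['format'].format(key): 0 for key in spec['keys']}
        for company, spec in conf.items()
    }
    for car in data.get('result', []):
        spec = conf.get(car.get('company'))
        if spec is None or car.get('model') not in spec['keys']:
            continue  # assumption: silently skip anything that is not configured
        result[car['company']][spec['format'].format(car['model'])] += 1
    # Add the per-company total after all individual counts are final.
    for counts in result.values():
        counts['Total'] = sum(counts.values())
    return result


sample = {'result': [
    {'company': 'BMW', 'model': '5'},
    {'company': 'BMW', 'model': '5'},
    {'company': 'BMW', 'model': '5'},
    {'company': 'BMW', 'model': '3'},
    {'company': 'BMW', 'model': '7'},
    {'company': 'AUDI', 'model': 'A3'},
    {'company': 'AUDI', 'model': 'A7'},
    {'company': 'VW', 'model': 'Golf'},  # not configured, so it is skipped
]}
print(count_models(sample))
```

Passing the data in as an argument keeps the counting logic testable without a network call; whether unknown entries should be skipped or reported is a design choice that depends on how strict the API contract is.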
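This way, for more companies and models, you only need to touch conf, not the code. The time complexity is O(m+n), with m being the total number of distinct models and n the number of cars in the API response.

A useful package for working directly with JSON-style dicts and lists is toolz (see its documentation for more details). With it, you can simply group the data and count the occurrences of each model, while handling possibly missing data separately: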
from toolz import itertoolz

result = {
    "result": [
        {
            "company" : "BMW",
            "model" : "5"
        },
        {
            "company" : "BMW",
            "model" : "5"
        },
        {
            "company" : "BMW",
            "model" : "5"
        },
        {
            "company" : "BMW",
            "model" : "3"
        },
        {
            "company" : "BMW",
            "model" : "7"
        },
        {
            "company" : "AUDI",
            "model" : "A3"
        },
        {
            "company" : "AUDI",
            "model" : "A7"
        },
    ]
}

final_output = {}
grouped_result = itertoolz.groupby('company', result['result'])
if 'MERCEDES' not in grouped_result:
    final_output['MERCEDES'] = {
        'EClass': 0,
        'SClass': 0
    }
for key, value in grouped_result.items():
    models = itertoolz.pluck('model', value)
    final_output[key] = itertoolz.frequencies(models)
The output looks like this:
{'AUDI': {'A3': 1, 'A7': 1}, 'BMW': {'3': 1, '5': 3, '7': 1}, 'MERCEDES': {'EClass': 0, 'SClass': 0}}
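The same grouping and counting can also be sketched with only the standard library, which additionally fills in zeros for every expected model rather than only for a company that is missing entirely. This is a swapped-in technique using `collections.Counter`, not the toolz code above; `EXPECTED` and `group_and_count` are hypothetical names, and the list of expected models per company is an assumption based on the question.

```python
from collections import Counter

# Assumption: the models expected per company, spelled as in the API response.
EXPECTED = {
    'BMW': ['3', '5', '7'],
    'AUDI': ['A3', 'A7'],
    'MERCEDES': ['EClass', 'SClass'],
}


def group_and_count(records, expected=EXPECTED):
    """Group records by company and count each model, defaulting to 0."""
    # One pass over the records: count every (company, model) pair.
    seen = Counter((r['company'], r['model']) for r in records)
    # Start from zeros for everything we expect to see.
    output = {c: {m: 0 for m in models} for c, models in expected.items()}
    # Overlay the observed counts; unexpected companies or models are kept too.
    for (company, model), n in seen.items():
        output.setdefault(company, {})[model] = n
    return output


records = [
    {'company': 'BMW', 'model': '5'},
    {'company': 'BMW', 'model': '5'},
    {'company': 'BMW', 'model': '5'},
    {'company': 'BMW', 'model': '3'},
    {'company': 'BMW', 'model': '7'},
    {'company': 'AUDI', 'model': 'A3'},
    {'company': 'AUDI', 'model': 'A7'},
]
print(group_and_count(records))
```

Unlike the configuration-driven answer, this sketch keeps the model names exactly as the API returns them and does not rename '3' to '3series'.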
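Will there be more companies, or only these 3? What would the Mercedes models look like in the API response? Do they need the same treatment as BMW?
Hi, currently there are only 3 companies, and the list may grow as needed. For Mercedes, the models will be "ESeries" and "SSeries", in the same format as BMW and AUDI.
Thanks a lot, Schwobaseggl, for the quick reply. Just to let you know, it works great. I was only wondering how to sum the numbers into a "Total" key. For example: {'BMW': {'3series': 1, '5series': 3, '7series': 1, 'Total': 5}, 'AUDI': {'A3': 1, 'A7': 1, 'Total': 2}, 'MERCEDES': {'EClass': 0, 'SClass': 0, 'Total': 0}}
Hi Madjazz, this is great too. Thanks a lot for your help. I tried Schwobaseggl's answer because it does not depend on an additional library. But your solution is good too :)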