I have a Python list containing a set of dictionaries, essentially a JSON representation. My task is to find the number of users for each serial ID.
The input data looks like the sample below. In reality the list contains thousands of dictionaries, and serial IDs repeat throughout the list.
[{
"serial_id": 1,
"name": "ABC"
},
{
"serial_id": 6,
"name": "DEF"
},
{
"serial_id": 8,
"name": "GHI"
},
{
"serial_id": 0,
"name": "JKL"
},
{
"serial_id": 6,
"name": "VVV"
}]
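I know the range of the serial IDs, but I do not want to hard-code it.
My task is to find the total number of users (that is, the count of names) for each serial ID. Ideally I would get a table-like structure, sorted in descending order, with two columns: the serial ID and the user count for that serial ID.
Question:
Can we use the DataFrame concept here? If so, I would prefer that.
I have not been able to find any approach that produces the desired output.
Thanks in advance.
The JSON data is pulled from an API; below is the code I tried, which failed badly.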
# Python libraries
import json

import requests
import pandas as pd
from pandas import DataFrame, Series

url1 = 'INPUT URL'  # placeholder for the API endpoint
r = requests.get(url1)
r = r.text

try:
    js = json.loads(r)  # js -> native Python list of dicts
except ValueError:
    js = None

# Pretty-printed copy of the JSON data, for inspection only
info = json.dumps(js, indent=4)
# print(info)

# for item in js:
#     print('Name', item["name"])

# Attempt at building the table. The serial_id values are hard-coded
# because the range is known, which is exactly what I want to avoid.
# df = {'serial_id': Series[item["affiliate_id"]]}  # ERROR
df = DataFrame({'serial_id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
print(df)
Let's use a pandas DataFrame:
from io import StringIO
import pandas as pd
jstring = StringIO("""[{
"serial_id": 1,
"name": "ABC"
},
{
"serial_id": 6,
"name": "DEF"
},
{
"serial_id": 8,
"name": "GHI"
},
{
"serial_id": 0,
"name": "JKL"
},
{
"serial_id": 6,
"name": "VVV"
}]""")
df = pd.read_json(jstring)
df_out = df.groupby('serial_id')['name'].count().reset_index(name='name_count')
print(df_out)
Output:
   serial_id  name_count
0          0           1
1          1           1
2          6           2
3          8           1
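The question also asks for the result sorted in descending order, and the data arrives as an already-parsed Python list rather than a JSON string. A minimal sketch of that variant follows, assuming the list of dictionaries is held in a variable (here called records, a name chosen for illustration) and that ties in the count should be broken by serial_id; the column name user_count is likewise only an illustrative choice.

```python
import pandas as pd

# Sample records copied from the question; in practice this would be
# the list produced by json.loads() on the API response (assumption).
records = [
    {"serial_id": 1, "name": "ABC"},
    {"serial_id": 6, "name": "DEF"},
    {"serial_id": 8, "name": "GHI"},
    {"serial_id": 0, "name": "JKL"},
    {"serial_id": 6, "name": "VVV"},
]

# A list of dicts can be passed to the DataFrame constructor directly.
df = pd.DataFrame(records)

user_counts = (
    df.groupby("serial_id")["name"]
      .count()                              # number of names per serial_id
      .reset_index(name="user_count")       # back to a two-column table
      # Highest count first; ties broken by ascending serial_id.
      .sort_values(["user_count", "serial_id"], ascending=[False, True])
      .reset_index(drop=True)
)
print(user_counts)
```

With the sample data, serial_id 6 ends up in the first row with a count of 2, and no serial_id range is hard-coded anywhere.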