Converting a Python data structure from one form to another

I have a list of dictionaries like this:

[
{
    "medication_name": "Victoza", 
    "medication_id": 68, 
    "manufacturer_name": "Novo Nordisk", 
    "practice_id": 1, 
    "disease_id": 16, 
    "practice_state": "MA", 
    "disease_name": "Type II Diabetes", 
    "practice_name": "Cambridge Hospital Inc"
}, 
{
    "medication_name": "Opsumit", 
    "medication_id": 39, 
    "manufacturer_name": "Actelion", 
    "practice_id": 1, 
    "disease_id": 12, 
    "practice_state": "MA", 
    "disease_name": "Pulmonary Arterial Hypertension", 
    "practice_name": "Cambridge Hospital Inc"
}, 
{
    "medication_name": "ITCA-650", 
    "medication_id": 29, 
    "manufacturer_name": "Intarcia", 
    "practice_id": 1, 
    "disease_id": 16, 
    "practice_state": "MA", 
    "disease_name": "Type II Diabetes", 
    "practice_name": "Cambridge Hospital Inc"
}, 
{
    "medication_name": "Flolan", 
    "medication_id": 22, 
    "manufacturer_name": "GlaxoSmithKline", 
    "practice_id": 1, 
    "disease_id": 12, 
    "practice_state": "CA", 
    "disease_name": "Pulmonary Arterial Hypertension", 
    "practice_name": "Cambridge Hospital Inc"
}, 
{
    "medication_name": "Adcirca", 
    "medication_id": 4, 
    "manufacturer_name": "United Therapeutics", 
    "practice_id": 1, 
    "disease_id": 12, 
    "practice_state": "CA", 
    "disease_name": "Pulmonary Arterial Hypertension", 
    "practice_name": "Cambridge Hospital Inc"
}, 
.....
.....
.....
]
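This is a fairly long list, truncated here for readability. It contains many repeated entries. What I need is to find the unique values for each key and represent them in the following format: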

{
medication : [ {medication_id : 1, medication_name: "Victoza"}, {medication_id :2, medication_name:"ITCA-650"},....]
practice   : [ {practice_id : 1, practice_name: "Cambridge"}, {practice_id : 2, practice_name: "Oxford"},...]
disease    : [ {disease_id: 1, disease_name: "Diabetes"}, {disease_id: 2, disease_name: "Obseity"},...]
manufacturer : [{name: "Cipla"}, {name: "Phizer"},...]
state : [{name:"MA"},{name:"CA"},...]
}

What is the best way to do this?

Using pandas, assuming data is the list of dictionaries you showed:

import pandas as pd

df = pd.DataFrame.from_records(data)
# In [38]: df
# Out[38]:
#    disease_id                     disease_name    manufacturer_name  medication_id medication_name  practice_id           practice_name practice_state
#    0          16                 Type II Diabetes         Novo Nordisk             68         Victoza            1  Cambridge Hospital Inc             MA
#    1          12  Pulmonary Arterial Hypertension             Actelion             39         Opsumit            1  Cambridge Hospital Inc             MA
#    2          16                 Type II Diabetes             Intarcia             29        ITCA-650            1  Cambridge Hospital Inc             MA
#    3          12  Pulmonary Arterial Hypertension      GlaxoSmithKline             22          Flolan            1  Cambridge Hospital Inc             CA
#    4          12  Pulmonary Arterial Hypertension  United Therapeutics              4         Adcirca            1  Cambridge Hospital Inc             CA

res = {}
res['medication'] = df[['medication_id', 'medication_name']].to_dict(orient='records')

# In [49]: res
# Out[49]:
# {
#     'medication': [
#         {'medication_id': 68, 'medication_name': 'Victoza'},
#         {'medication_id': 39, 'medication_name': 'Opsumit'},
#         {'medication_id': 29, 'medication_name': 'ITCA-650'},
#         {'medication_id': 22, 'medication_name': 'Flolan'},
#         {'medication_id': 4, 'medication_name': 'Adcirca'}]
# }
You get the idea. Do the rest ('practice', 'disease', and so on) the same way.
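To make "the same way" concrete, here is a minimal sketch of the complete pandas version. It also applies drop_duplicates(), since the question asks for unique values. The helper name unique_records and the four-row sample are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data (one row repeated).
base = {"practice_id": 1, "practice_name": "Cambridge Hospital Inc"}
data = [
    dict(base, medication_id=68, medication_name="Victoza", manufacturer_name="Novo Nordisk",
         disease_id=16, disease_name="Type II Diabetes", practice_state="MA"),
    dict(base, medication_id=39, medication_name="Opsumit", manufacturer_name="Actelion",
         disease_id=12, disease_name="Pulmonary Arterial Hypertension", practice_state="MA"),
    dict(base, medication_id=68, medication_name="Victoza", manufacturer_name="Novo Nordisk",
         disease_id=16, disease_name="Type II Diabetes", practice_state="MA"),
    dict(base, medication_id=22, medication_name="Flolan", manufacturer_name="GlaxoSmithKline",
         disease_id=12, disease_name="Pulmonary Arterial Hypertension", practice_state="CA"),
]

df = pd.DataFrame.from_records(data)


def unique_records(frame, columns, rename=None):
    """Return the distinct rows of `columns` as a list of dicts (hypothetical helper)."""
    subset = frame[columns].drop_duplicates()  # keeps the first occurrence of each row
    if rename:
        subset = subset.rename(columns=rename)
    return subset.to_dict(orient="records")


res = {
    "medication": unique_records(df, ["medication_id", "medication_name"]),
    "practice": unique_records(df, ["practice_id", "practice_name"]),
    "disease": unique_records(df, ["disease_id", "disease_name"]),
    # The question wants these two groups keyed as "name".
    "manufacturer": unique_records(df, ["manufacturer_name"], {"manufacturer_name": "name"}),
    "state": unique_records(df, ["practice_state"], {"practice_state": "name"}),
}
```

Entries come out in order of first appearance, because drop_duplicates() keeps the first occurrence of each row.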

final = {
    'medication': [],
    'practice': [],
    'disease': [],
    'manufacturer': [],
    'state': [],
}

for d in orig_list:
    medication = dict((k, d[k]) for k in ('medication_id', 'medication_name'))
    practice = dict((k, d[k]) for k in ('practice_id', 'practice_name'))
    disease = dict((k, d[k]) for k in ('disease_id', 'disease_name'))
    manufacturer = dict(name=d['manufacturer_name'])
    state = dict(name=d['practice_state'])

    if medication not in final['medication']: final['medication'].append(medication)
    if practice not in final['practice']: final['practice'].append(practice)
    if disease not in final['disease']: final['disease'].append(disease)
    if manufacturer not in final['manufacturer']: final['manufacturer'].append(manufacturer)
    if state not in final['state']: final['state'].append(state)

If you don't need to do this often, this is the approach I would suggest.
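The loop above checks membership with "not in" against a list, which rescans the list for every record. If the input is very long, one possible variation is to track already-seen values in sets. This is only a sketch: the function name regroup and the sample orig_list are assumptions for illustration, not part of the answer.

```python
def regroup(records):
    """Collect unique id/name pairs and unique names, preserving first-seen order."""
    pairs = {
        "medication": ("medication_id", "medication_name"),
        "practice": ("practice_id", "practice_name"),
        "disease": ("disease_id", "disease_name"),
    }
    singles = {"manufacturer": "manufacturer_name", "state": "practice_state"}

    final = {name: [] for name in list(pairs) + list(singles)}
    seen = {name: set() for name in final}  # hashable fingerprints of what was added

    for d in records:
        for name, keys in pairs.items():
            values = tuple(d[k] for k in keys)
            if values not in seen[name]:
                seen[name].add(values)
                final[name].append(dict(zip(keys, values)))
        for name, key in singles.items():
            value = d[key]
            if value not in seen[name]:
                seen[name].add(value)
                final[name].append({"name": value})
    return final


# Hypothetical sample in the same shape as the question's data (one row repeated).
base = {"practice_id": 1, "practice_name": "Cambridge Hospital Inc"}
orig_list = [
    dict(base, medication_id=68, medication_name="Victoza", manufacturer_name="Novo Nordisk",
         disease_id=16, disease_name="Type II Diabetes", practice_state="MA"),
    dict(base, medication_id=22, medication_name="Flolan", manufacturer_name="GlaxoSmithKline",
         disease_id=12, disease_name="Pulmonary Arterial Hypertension", practice_state="CA"),
    dict(base, medication_id=68, medication_name="Victoza", manufacturer_name="Novo Nordisk",
         disease_id=16, disease_name="Type II Diabetes", practice_state="MA"),
]

final = regroup(orig_list)
```

For a few hundred records the difference is unlikely to matter, and the simpler loop is easier to read.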

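Why the downvote? At least leave a comment explaining the reason.

You haven't said exactly how you want to restructure it, so we have to guess from your sample output, but it looks like all you need is a simple for loop. Iterate over the list and put the keys you want into a new dict.

Where do you get this data from? This is a potentially expensive restructuring, and you may be able to get the data in the format you want in the first place.

The data is JSON output from a web service API. I only have access to the API endpoint and can only get the data in this format. Also, the sample output is exactly how the output needs to look.

If you only want the unique combinations (e.g. medication id and medication name), do this instead:
df[['medication_id', 'medication_name']].drop_duplicates().to_dict(orient='records')

Let me try it. I'll get back to you.

@zyxue, works like a charm. One last thing: can the medication list be sorted alphabetically by medication name?

Sure, something like this (the method is sort_values in current pandas; older releases called it sort):
df[['medication_id', 'medication_name']].sort_values('medication_name').to_dict(orient='records')

@Amistad, try it now.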
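For the plain-loop answer, the same alphabetical ordering can be applied to the finished list without pandas. A minimal sketch, assuming the medication list has already been built (the variable names are illustrative):

```python
# Medication entries as produced by either answer (values taken from the question).
medications = [
    {"medication_id": 68, "medication_name": "Victoza"},
    {"medication_id": 39, "medication_name": "Opsumit"},
    {"medication_id": 29, "medication_name": "ITCA-650"},
    {"medication_id": 22, "medication_name": "Flolan"},
    {"medication_id": 4, "medication_name": "Adcirca"},
]

# casefold() makes the ordering case-insensitive; sorted() returns a new list.
by_name = sorted(medications, key=lambda m: m["medication_name"].casefold())

names = [m["medication_name"] for m in by_name]
# names is ['Adcirca', 'Flolan', 'ITCA-650', 'Opsumit', 'Victoza']
```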