Python: how to join JSON with Pandas
The goal here is to count the number of patients for each diagnosis type. In the medical records the visit id is unique, but in the diagnosis records one visit can have several diagnoses, so the same visit id can appear with multiple diagnosis ids. I therefore think the two data frames need to be linked on the Visit id field. Could anyone show how to link the two JSON documents with Pandas and count the number of patients per diagnosis? Many thanks.
Medical records
JSON [medical records]
[
{
"Doctor id":"AU1254",
"Patient":[
{
"Patient id":"BK1221",
"Patient name":"Tim"
}
],
"Visit id":"B0001"
},
{
"Doctor id":"AU8766",
"Patient":[
{
"Patient id":"BK1209",
"Patient name":"Sue"
}
],
"Visit id":"B0002"
},
{
"Doctor id":"AU1254",
"Patient":[
{
"Patient id":"BK1323",
"Patient name":"Sary"
}
],
"Visit id":"B0003"
}
]
Diagnosis records
JSON [diagnosis records]
[
{
"Visit id":"B0001",
"Diagnosis":[
{
"diagnosis id":"D1001",
"diagnosis name":"fever"
},
{
"diagnosis id":"D1987",
"diagnosis name":"cough"
},
{
"diagnosis id":"D1265",
"diagnosis name":"running nose"
}
]
},
{
"Visit id":"B0002",
"Diagnosis":[
{
"diagnosis id":"D1987",
"diagnosis name":"cough"
},
{
"diagnosis id":"D1453",
"diagnosis name":"stomach ache"
}
]
}
]
You can use a left merge on Visit id,
and then count per diagnosis, listing the patients for each one:
df3.groupby(['diagnosis id', 'diagnosis name']).agg({'Patient name': [list, 'count']})
                            Patient name
                                    list count
diagnosis id diagnosis name
D1001        fever                 [Tim]     1
D1265        running nose          [Tim]     1
D1453        stomach ache          [Sue]     1
D1987        cough            [Tim, Sue]     2
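The answer above does not show how df3 was built, so here is a minimal sketch of one way to get there, assuming both JSON documents are already loaded as Python lists. The names patients, diagnoses, df1 and df2 are placeholders of mine, not from the original answer, and pd.json_normalize is one possible way to flatten the nested lists.

```python
import pandas as pd

# Sample data copied from the question (names `patients`/`diagnoses` are assumed)
patients = [
    {"Doctor id": "AU1254", "Visit id": "B0001",
     "Patient": [{"Patient id": "BK1221", "Patient name": "Tim"}]},
    {"Doctor id": "AU8766", "Visit id": "B0002",
     "Patient": [{"Patient id": "BK1209", "Patient name": "Sue"}]},
    {"Doctor id": "AU1254", "Visit id": "B0003",
     "Patient": [{"Patient id": "BK1323", "Patient name": "Sary"}]},
]
diagnoses = [
    {"Visit id": "B0001", "Diagnosis": [
        {"diagnosis id": "D1001", "diagnosis name": "fever"},
        {"diagnosis id": "D1987", "diagnosis name": "cough"},
        {"diagnosis id": "D1265", "diagnosis name": "running nose"},
    ]},
    {"Visit id": "B0002", "Diagnosis": [
        {"diagnosis id": "D1987", "diagnosis name": "cough"},
        {"diagnosis id": "D1453", "diagnosis name": "stomach ache"},
    ]},
]

# Flatten each nested list into one row per patient / one row per diagnosis
df1 = pd.json_normalize(patients, record_path="Patient", meta=["Doctor id", "Visit id"])
df2 = pd.json_normalize(diagnoses, record_path="Diagnosis", meta=["Visit id"])

# Left merge keeps visits without a diagnosis (B0003) with NaN diagnosis columns
df3 = df1.merge(df2, on="Visit id", how="left")

# groupby drops the NaN keys by default, so only diagnosed visits are counted
counts = df3.groupby(["diagnosis id", "diagnosis name"])["Patient name"].count()
print(counts)
```

With this data, the visit without a diagnosis stays in df3 but does not show up in the grouped counts.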
Try: x -> JSON [patient records], y -> JSON [diagnosis records]
df_merge:
To count:
Edit:
Try:
Try the following to flatten the patient records:
import pandas as pd

# The path must be a string; 'patients.json' is assumed to hold the medical-record JSON
patients_df = pd.read_json('patients.json')
patient_id = []
patient_name = []
# Get attributes from the nested list in the 'Patient' column
for patient in patients_df['Patient']:
    # Append one value per visit (assumes exactly one patient per visit)
    patient_id.append(patient[0]['Patient id'])
    patient_name.append(patient[0]['Patient name'])
# Add to the pandas dataframe
patients_df['Patient name'] = patient_name
patients_df['Patient id'] = patient_id
# Drop the 'Patient' column
patients_df = patients_df.drop(columns='Patient')
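The loop above reads only the first entry of each Patient list. As a rough alternative sketch, assuming the records are already in memory as a list of dicts (the name visits is hypothetical), DataFrame.explode can handle lists of any length without an explicit loop:

```python
import pandas as pd

# Medical records from the question, held in memory instead of a file (assumption)
visits = pd.DataFrame([
    {"Doctor id": "AU1254", "Visit id": "B0001",
     "Patient": [{"Patient id": "BK1221", "Patient name": "Tim"}]},
    {"Doctor id": "AU8766", "Visit id": "B0002",
     "Patient": [{"Patient id": "BK1209", "Patient name": "Sue"}]},
    {"Doctor id": "AU1254", "Visit id": "B0003",
     "Patient": [{"Patient id": "BK1323", "Patient name": "Sary"}]},
])

# One row per list element; each 'Patient' cell becomes a single dict
exploded = visits.explode("Patient").reset_index(drop=True)

# Turn the dicts into regular columns
patient_cols = pd.json_normalize(exploded["Patient"].tolist())

# Put the new columns next to the visit columns and drop the nested one
patients_df = pd.concat([exploded.drop(columns="Patient"), patient_cols], axis=1)
print(patients_df)
```

The reset_index call matters here: explode repeats the original index for each list element, and the concat aligns on the index.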
This might help:
Hi @Pygirl, shouldn't the count for the "cough" diagnosis be 2?
I grouped by id and then by name. I have updated my answer:
Hi, I want to count the number of patients per diagnosis. For the cough diagnosis the patient count is 2, namely Tim and Sue. Could you please advise?
@epiphany: check now.
Hi @Danail, I am getting SyntaxError: EOL while scanning string literal.
I think it works if json1 is a string enclosed in quotes; just paste the JSON content as a string. Like this: json1 = 'copy->paste'
Upvote +1, thanks.
df_merge.groupby('diagnosis name').agg({'Patient name': [list, 'count']}).reset_index()
diagnosis name  Patient name
                list        count
cough           [Tim, Sue]  2
fever           [Tim]       1
running nose    [Tim]       1
stomach ache    [Sue]       1
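One caveat with 'count': it counts rows, so a patient who receives the same diagnosis on two visits would be counted twice. If distinct patients are what is wanted, nunique on Patient id may be closer. The sketch below uses a small hand-built df_merge in which the visit B0009 is hypothetical data added only to show the difference.

```python
import pandas as pd

# Hand-built stand-in for df_merge; the B0009 row is hypothetical, not from the question
df_merge = pd.DataFrame({
    "Visit id": ["B0001", "B0001", "B0002", "B0009"],
    "Patient id": ["BK1221", "BK1221", "BK1209", "BK1221"],
    "Patient name": ["Tim", "Tim", "Sue", "Tim"],
    "diagnosis name": ["fever", "cough", "cough", "cough"],
})

# Rows per diagnosis: Tim's two cough visits are both counted
row_counts = df_merge.groupby("diagnosis name")["Patient id"].count()

# Distinct patients per diagnosis: Tim is counted once for cough
patient_counts = df_merge.groupby("diagnosis name")["Patient id"].nunique()

print(row_counts)
print(patient_counts)
```

On the question's original data both approaches give the same numbers, because no patient repeats a diagnosis.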