Python: How to join JSON data with Pandas

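The goal here is to count the number of patients for each diagnosis type. In the medical records, the Visit id is unique, but in the diagnosis records a single visit can have several diagnoses, so the same Visit id may appear with multiple diagnosis ids.

To do this, I think the two data frames need to be linked on the Visit id field. Could anyone explain how to join the two JSON documents with Pandas and count the number of patients per diagnosis? Many thanks.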

Medical records

JSON [medical records]

[
 {
   "Doctor id":"AU1254",
   "Patient":[
      {
         "Patient id":"BK1221",
         "Patient name":"Tim"
      }
   ],  
   "Visit id":"B0001"       
},
 {
   "Doctor id":"AU8766",
   "Patient":[
      {
         "Patient id":"BK1209",
         "Patient name":"Sue"
      }
   ],  
   "Visit id":"B0002"  
},
 {
   "Doctor id":"AU1254",
   "Patient":[
      {
         "Patient id":"BK1323",
         "Patient name":"Sary"
      }
   ],  
   "Visit id":"B0003"  
  }
]
Diagnosis records

JSON [diagnosis records]

[
   {
      "Visit id":"B0001",
      "Diagnosis":[
         {
            "diagnosis id":"D1001",
            "diagnosis name":"fever"           
         },
         {
            "diagnosis id":"D1987",
            "diagnosis name":"cough"
         },
         {
             "diagnosis id":"D1265",
            "diagnosis name":"running nose"
         }
      ]
   }, 
      {
      "Visit id":"B0002",
      "Diagnosis":[
         {
            "diagnosis id":"D1987",
            "diagnosis name":"cough"           
         },
         {
            "diagnosis id":"D1453",
            "diagnosis name":"stomach ache"
         }
      ]
   } 
]
You can use a left merge on Visit id

and then count per diagnosis / per patient

df3.groupby(['diagnosis id', 'diagnosis name']).agg({'Patient name': [list, 'count']})
                            Patient name
                                    list count
diagnosis id diagnosis name
D1001        fever                 [Tim]     1
D1265        running nose          [Tim]     1
D1453        stomach ache          [Sue]     1
D1987        cough            [Tim, Sue]     2
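The answer above does not show how df3 is built, so here is a minimal sketch of one way to get there, assuming the two JSON documents are already loaded as Python lists. The names records, diagnoses, patients_df and diagnosis_df are placeholders chosen for this example, and flattening with pd.json_normalize is an assumption, not necessarily what the answerer used.

```python
import pandas as pd

# Sample data copied from the question (medical records and diagnosis records).
records = [
    {"Doctor id": "AU1254",
     "Patient": [{"Patient id": "BK1221", "Patient name": "Tim"}],
     "Visit id": "B0001"},
    {"Doctor id": "AU8766",
     "Patient": [{"Patient id": "BK1209", "Patient name": "Sue"}],
     "Visit id": "B0002"},
    {"Doctor id": "AU1254",
     "Patient": [{"Patient id": "BK1323", "Patient name": "Sary"}],
     "Visit id": "B0003"},
]
diagnoses = [
    {"Visit id": "B0001",
     "Diagnosis": [{"diagnosis id": "D1001", "diagnosis name": "fever"},
                   {"diagnosis id": "D1987", "diagnosis name": "cough"},
                   {"diagnosis id": "D1265", "diagnosis name": "running nose"}]},
    {"Visit id": "B0002",
     "Diagnosis": [{"diagnosis id": "D1987", "diagnosis name": "cough"},
                   {"diagnosis id": "D1453", "diagnosis name": "stomach ache"}]},
]

# Flatten the nested lists: one row per patient and one row per diagnosis.
patients_df = pd.json_normalize(records, record_path="Patient",
                                meta=["Doctor id", "Visit id"])
diagnosis_df = pd.json_normalize(diagnoses, record_path="Diagnosis",
                                 meta=["Visit id"])

# Left merge keeps visits without any diagnosis (B0003) as rows with NaN.
df3 = patients_df.merge(diagnosis_df, on="Visit id", how="left")

# Count distinct patients per diagnosis; groupby skips the NaN diagnosis keys.
counts = df3.groupby(["diagnosis id", "diagnosis name"])["Patient id"].nunique()
print(counts)
```

Counting with nunique on Patient id rather than a plain count avoids counting the same patient twice if that patient receives the same diagnosis on two different visits.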
Try: x -> JSON [patient records], y -> JSON [diagnosis records]

df_merge:

To count:

Edit:

Try:


Try the following for the patient records

import pandas as pd

patients_df = pd.read_json('patients.json')

patient_id = []
patient_name = []

# Collect the attributes from the nested list in the 'Patient' column
for patient in patients_df['Patient']:
    patient_id.append(patient[0]['Patient id'])
    patient_name.append(patient[0]['Patient name'])

# Add the collected values to the DataFrame
patients_df['Patient name'] = patient_name
patients_df['Patient id'] = patient_id

# Drop the nested 'Patient' column
patients_df = patients_df.drop(columns='Patient')
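The loop above only reads the first entry of each Patient list. As a hedged alternative, the sketch below expands the nested column without an explicit loop and would also cope with several patients per visit. It builds the frame inline from two of the question's records instead of reading patients.json, and the names exploded, details and flat are made up for this example.

```python
import pandas as pd

# Two records from the question, built inline so the sketch is self-contained.
patients_df = pd.DataFrame([
    {"Doctor id": "AU1254",
     "Patient": [{"Patient id": "BK1221", "Patient name": "Tim"}],
     "Visit id": "B0001"},
    {"Doctor id": "AU8766",
     "Patient": [{"Patient id": "BK1209", "Patient name": "Sue"}],
     "Visit id": "B0002"},
])

# One row per list element; each cell in 'Patient' is then a single dict.
exploded = patients_df.explode("Patient").reset_index(drop=True)

# Turn the dicts into their own columns.
details = pd.json_normalize(exploded["Patient"].tolist())

# Put the new columns next to the visit-level columns and drop the nested one.
flat = pd.concat([exploded.drop(columns="Patient"), details], axis=1)
print(flat)
```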

This might help:
Hi @Pygirl, for the diagnosis "cough", shouldn't the count be 2?
I grouped by id and then by name. I have updated my answer.
Hi, I want to count the number of patients per diagnosis. For the diagnosis cough the number of patients is 2, namely Tim and Sue. Could you please advise?
@epiphany: check now.
Hi @Danail, I am getting SyntaxError: EOL while scanning string literal. I wonder whether json1 is supposed to be a string enclosed in quotes.
Just paste the JSON content as a string and it will work. Like this: json1 = 'copy->paste'
Upvote +1, thanks.
df_merge.groupby('diagnosis name').agg({'Patient name': [list, 'count']}).reset_index()
diagnosis name  Patient name
                list        count
        cough   [Tim, Sue]  2
        fever   [Tim]       1
running nose    [Tim]       1
stomach ache    [Sue]       1
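Regarding the SyntaxError: EOL while scanning string literal mentioned in the comments: Python raises it when a string in single or double quotes runs over more than one line. A minimal sketch of one way around it, assuming the JSON is pasted directly into the script, is to use a triple-quoted string and parse it with json.loads. The name json1 comes from the comments and x from the answer; df_x is a placeholder for this example.

```python
import json
import pandas as pd

# Triple quotes allow the pasted JSON to span several lines.
json1 = """
[
  {"Doctor id": "AU1254",
   "Patient": [{"Patient id": "BK1221", "Patient name": "Tim"}],
   "Visit id": "B0001"},
  {"Doctor id": "AU8766",
   "Patient": [{"Patient id": "BK1209", "Patient name": "Sue"}],
   "Visit id": "B0002"}
]
"""

# Parse the text into Python lists and dicts.
x = json.loads(json1)

# Flatten the nested 'Patient' list into one row per patient.
df_x = pd.json_normalize(x, record_path="Patient", meta=["Doctor id", "Visit id"])
print(df_x)
```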