Python: How do I append a dictionary to a dictionary in a for loop?

python dictionary for-loop

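I am trying to create a dictionary of dictionaries, where the value for each key is a nested dictionary with two entries.

I have two lists of patient barcodes (normal tissue, disease tissue) that correspond to columns of values in a data frame. My goal is to match patients across the two lists and then, for every patient found in both lists, add their normal and disease tissue values to a dictionary. The dictionary key will be the patient barcode, and the dictionary value will be another dictionary of normal tissue: values extracted from the data frame, and disease tissue: values extracted from the data frame.

So, starting from: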

In [3]: df = pd.DataFrame({'Patient1_Normal':['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan'],
                 'Patient1_Disease':[0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
                 'Patient2_Disease':['nan', 'nan', 'nan', 1.0, 0.24, 0.67, 0.97, 0.98],
                 'Patient3_Normal': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9],
                 'Patient3_Disease':[0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
                 'Patient4_Normal':['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91],
                 'Patient4_Disease':['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
                 'Patient5_Disease': [0.34, 0.27, 'nan', 0.16, 0.32, 0.27, 0.55, 0.51]})


In [4]: df                                                                                                                                 
Out[4]: Patient1_Normal Patient1_Disease Patient2_Disease  Patient3_Normal Patient3_Disease Patient4_Normal Patient4_Disease Patient5_Disease
    0             nan             0.12              nan             0.21             0.11             nan              nan             0.34
    1            0.01             0.06              nan             0.25             0.45            0.35              nan             0.27
    2             0.1             0.19              nan             0.63              nan             nan             0.56              nan
    3            0.16             0.34                1             0.92             0.45            0.22             0.72             0.16
    4            0.88              nan             0.24             0.30             0.22            0.45              nan             0.32
    5            0.83              nan             0.67             0.56             0.89            0.66             0.97             0.27
    6            0.82             0.73             0.97             0.78             0.17            0.21             0.91             0.55
    7             nan             0.91             0.98             0.90             0.12            0.91             0.79             0.51

Here is what I have so far:

D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]

paired_patients = {}
psi_sets = {}
psi_sets['d'] = []
psi_sets['n'] = []

for patient in N_col:
       patient_id = patient[0:8]

       n_id = patient
       d_id = [i for i in D_col if patient_id in i]

       if len(d_id) > 0:
           psi_sets['n'] = df[n_id].to_list()
           for d in d_id:
               psi_sets['d'] = df[d].to_list()

       paired_patients[patient_id] = psi_sets

However, the values in my paired_patients dictionary are being overwritten instead of appended, so paired_patients ends up looking like this:

{'Patient1': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient3': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

How can I fix the last part of my code so that the values are added correctly for each patient and the paired_patients dictionary looks like this:

{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
  'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
  'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

You can use df.melt, pd.concat, series.str.split, df.replace, df.groupby and df.xs, and finally df.to_dict. Have a look at the following:

>>> df2 = (pd.concat([
                      df.melt().variable.str.split('_', expand=True),
                      df.melt().drop(columns='variable')
                    ], axis=1)
                       .replace({'Normal':'n', 'Disease':'d'})
                       .groupby([0,1]).agg(list))
>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if not ({'d', 'n'} ^ v.keys())}
>>> paired_patients
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
  'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
  'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
  'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}

Explanation:

>>> df.melt()
            variable  value
0    Patient1_Normal    NaN
1    Patient1_Normal   0.01
2    Patient1_Normal   0.10
..               ...    ...
62  Patient5_Disease   0.55
63  Patient5_Disease   0.51

>>> df.melt().variable.str.split('_', expand=True)
 
           0        1
0   Patient1   Normal
1   Patient1   Normal
2   Patient1   Normal
..       ...      ...
62  Patient5  Disease
63  Patient5  Disease

[64 rows x 2 columns]

# then concat these two, replace 'Normal' and 'Disease' with 'n' and 'd' and drop
# the 'variable' column
>>> pd.concat([
                      df.melt().variable.str.split('_', expand=True),
                      df.melt().drop(columns='variable')
                    ], axis=1).replace({'Normal':'n', 'Disease':'d'})
           0  1  value
0   Patient1  n    NaN
1   Patient1  n   0.01
2   Patient1  n   0.10
..       ... ..    ...
62  Patient5  d   0.55
63  Patient5  d   0.51

[64 rows x 3 columns]

# then groupby column [0, 1] and aggregate into list:
>>> df2 = _.groupby([0,1]).agg(list)
>>> df2
                                                      value
0        1                                                 
Patient1 d   [0.12, 0.06, 0.19, 0.34, nan, nan, 0.73, 0.91]
         n    [nan, 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, nan]
Patient2 d     [nan, nan, nan, 1.0, 0.24, 0.67, 0.97, 0.98]
Patient3 d  [0.11, 0.45, nan, 0.45, 0.22, 0.89, 0.17, 0.12]
         n   [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]
Patient4 d    [nan, nan, 0.56, 0.72, nan, 0.97, 0.91, 0.79]
         n   [nan, 0.35, nan, 0.22, 0.45, 0.66, 0.21, 0.91]
Patient5 d  [0.34, 0.27, nan, 0.16, 0.32, 0.27, 0.55, 0.51]

# Now group by level 0, convert each group into a dict, and finally keep
# only the patients for which both 'n' and 'd' are present as keys

>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if ('n' in v) and ('d' in v)}
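
The two filters used in this answer (the symmetric difference `{'d', 'n'} ^ v.keys()` in the first version and the plain membership test here) are not quite the same condition. Below is a small pure-Python sketch of how they differ. The inner dictionaries and the helper names `exact_pair` and `has_pair` are hypothetical, made up only for illustration:

```python
# Hypothetical inner dicts of the kind produced per patient
both = {'d': [0.12], 'n': [0.01]}
only_d = {'d': [1.0]}
extra = {'d': [0.1], 'n': [0.2], 'x': [0.3]}

def exact_pair(v):
    # dict_keys supports set operators; the symmetric difference is empty
    # only when the keys are exactly {'d', 'n'}
    return not ({'d', 'n'} ^ v.keys())

def has_pair(v):
    # Membership test: passes whenever both keys exist, extra keys allowed
    return 'n' in v and 'd' in v

for name, v in [('both', both), ('only_d', only_d), ('extra', extra)]:
    print(name, exact_pair(v), has_pair(v))
```

For the data in the question the second index level only ever holds 'd' or 'n', so both filters should select the same patients; they would only diverge if a third tissue label appeared.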

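paired_patients is a dictionary. Its keys are patient IDs, and each key points to another dictionary that holds the lists. Dictionaries need unique keys, so paired_patients[patient_id]['d'] is a list that you can append to. You are overwriting the 'd' key every time because you are setting paired_patients[patient_id] = to a dictionary that has the same keys as the one already there.

What is the expected output?

@Sushanth I just edited the original post to include my expected output!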
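
To keep the original for-loop approach, one option is to build a new inner dictionary on every iteration. As far as I can tell, the loop in the question creates psi_sets once and stores that same object under every key, which would explain why every patient shows the values written by the last iteration. The following is only a minimal sketch: the small three-row frame is made-up data, and it assumes every column follows the `<patient>_Normal` / `<patient>_Disease` naming pattern with at most one disease column per patient.

```python
import pandas as pd

# Small illustrative frame (hypothetical values, same naming pattern as the question)
df = pd.DataFrame({
    'Patient1_Normal': [0.01, 0.10, 0.16],
    'Patient1_Disease': [0.12, 0.06, 0.19],
    'Patient2_Disease': [1.00, 0.24, 0.67],
    'Patient3_Normal': [0.21, 0.25, 0.63],
    'Patient3_Disease': [0.11, 0.45, 0.22],
})

paired_patients = {}
for n_col in (c for c in df.columns if c.endswith('_Normal')):
    # Assumption: the patient ID is everything before the first underscore
    patient_id = n_col.split('_')[0]
    d_col = f'{patient_id}_Disease'
    if d_col in df.columns:
        # Build a brand-new inner dict on every iteration so that each
        # patient owns its own lists instead of sharing one object
        paired_patients[patient_id] = {
            'n': df[n_col].to_list(),
            'd': df[d_col].to_list(),
        }

print(paired_patients)
```

Patients without a matching normal column (Patient2 in this sketch) are simply skipped, which matches the expected output in the question.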
