Python: how do I append a dictionary to a dictionary in a for loop?
I am trying to create a dictionary in which the value for each key is itself a dictionary holding two entries.

I have two lists of patient barcodes (normal tissue, disease tissue), each corresponding to a column of values in a data frame. My goal is to match patients across the two lists and then, for each patient found in both lists, add their normal and disease tissue values to a dictionary. The dictionary key will be the patient barcode, and the dictionary value will be another dictionary of the form normal tissue: values pulled from the data frame, disease tissue: values pulled from the data frame.

So, starting from:
In [3]: df = pd.DataFrame({'Patient1_Normal':['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan'],
                           'Patient1_Disease':[0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
                           'Patient2_Disease':['nan', 'nan', 'nan', 1.0, 0.24, 0.67, 0.97, 0.98],
                           'Patient3_Normal': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9],
                           'Patient3_Disease':[0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
                           'Patient4_Normal':['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91],
                           'Patient4_Disease':['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
                           'Patient5_Disease': [0.34, 0.27, 'nan', 0.16, 0.32, 0.27, 0.55, 0.51]})
In [4]: df
Out[4]:
    Patient1_Normal  Patient1_Disease  Patient2_Disease   Patient3_Normal  Patient3_Disease   Patient4_Normal  Patient4_Disease  Patient5_Disease
0               nan              0.12               nan              0.21              0.11               nan               nan              0.34
1              0.01              0.06               nan              0.25              0.45              0.35               nan              0.27
2               0.1              0.19               nan              0.63               nan               nan              0.56               nan
3              0.16              0.34                 1              0.92              0.45              0.22              0.72              0.16
4              0.88               nan              0.24              0.30              0.22              0.45               nan              0.32
5              0.83               nan              0.67              0.56              0.89              0.66              0.97              0.27
6              0.82              0.73              0.97              0.78              0.17              0.21              0.91              0.55
7               nan              0.91              0.98              0.90              0.12              0.91              0.79              0.51
Here is what I have so far:
D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]
paired_patients = {}
psi_sets = {}
psi_sets['d'] = []
psi_sets['n'] = []
for patient in N_col:
    patient_id = patient[0:8]
    n_id = patient
    d_id = [i for i in D_col if patient_id in i]
    if len(d_id) > 0:
        psi_sets['n'] = df[n_id].to_list()
        for d in d_id:
            psi_sets['d'] = df[d].to_list()
        paired_patients[patient_id] = psi_sets
However, the values in my paired_patients dictionary are overwritten rather than appended, so the output of paired_patients looks like this:
{'Patient1': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
'Patient3': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}
How can I fix the last part of my code so that the paired_patients values are added correctly for each patient, and the paired_patients dictionary looks like this:
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}
You can use df.melt, pd.concat, Series.str.split, df.replace, df.groupby and df.xs, and finally df.to_dict.
Take a look at the following:
>>> df2 = (pd.concat([
               df.melt().variable.str.split('_', expand=True),
               df.melt().drop(columns='variable')
           ], axis=1)
           .replace({'Normal':'n', 'Disease':'d'})
           .groupby([0,1]).agg(list))
>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if not ({'d', 'n'} ^ v.keys())}
>>> paired_patients
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
'n': [0.21, 0.25,0.63,0.92,0.3, 0.56, 0.78, 0.9]},
'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}
Explanation:
>>> df.melt()
variable value
0 Patient1_Normal NaN
1 Patient1_Normal 0.01
2 Patient1_Normal 0.10
.. ... ...
62 Patient5_Disease 0.55
63 Patient5_Disease 0.51
>>> df.melt().variable.str.split('_', expand=True)
0 1
0 Patient1 Normal
1 Patient1 Normal
2 Patient1 Normal
.. ... ...
62 Patient5 Disease
63 Patient5 Disease
[64 rows x 2 columns]
# then concat these two, replace 'Normal' and 'Disease' with 'n' and 'd' and drop
# the 'variable' column
>>> pd.concat([
        df.melt().variable.str.split('_', expand=True),
        df.melt().drop(columns='variable')
    ], axis=1).replace({'Normal':'n', 'Disease':'d'})
0 1 value
0 Patient1 n NaN
1 Patient1 n 0.01
2 Patient1 n 0.10
.. ... .. ...
62 Patient5 d 0.55
63 Patient5 d 0.51
[64 rows x 3 columns]
# then groupby column [0, 1] and aggregate into list:
>>> df2 = _.groupby([0,1]).agg(list)
>>> df2
value
0 1
Patient1 d [0.12, 0.06, 0.19, 0.34, nan, nan, 0.73, 0.91]
n [nan, 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, nan]
Patient2 d [nan, nan, nan, 1.0, 0.24, 0.67, 0.97, 0.98]
Patient3 d [0.11, 0.45, nan, 0.45, 0.22, 0.89, 0.17, 0.12]
n [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]
Patient4 d [nan, nan, 0.56, 0.72, nan, 0.97, 0.91, 0.79]
n [nan, 0.35, nan, 0.22, 0.45, 0.66, 0.21, 0.91]
Patient5 d [0.34, 0.27, nan, 0.16, 0.32, 0.27, 0.55, 0.51]
# Now groupby level=0, convert that into a dict, and finally keep only the
# patients that have both 'n' and 'd' as keys (a plain membership test here;
# the symmetric set difference on dict_keys used earlier has the same effect
# for this data)
>>> paired_patients = {k: v for k, v in
                       df2.groupby(level=0)
                          .apply(lambda df: df.xs(df.name).value.to_dict())
                          .to_dict().items()
                       if ('n' in v) and ('d' in v)}
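The two snippets above filter the paired patients in slightly different ways: the first uses `{'d', 'n'} ^ v.keys()`, while the step-by-step version uses membership tests. The sketch below compares the two on a small hand-made dictionary. The `candidates` data and the extra `'x'` key are hypothetical and only serve to show where the two checks could disagree.

```python
# Hypothetical per-patient dictionaries; 'x' is an invented extra key.
candidates = {
    'Patient1': {'d': [0.12], 'n': [0.01]},
    'Patient2': {'d': [1.0]},
    'Patient3': {'d': [0.11], 'n': [0.21], 'x': [0.5]},
}

required = {'d', 'n'}

# dict_keys objects support set operators, so `required ^ v.keys()` is the
# set of keys found on only one side. It is empty only for an exact match.
by_symmetric_difference = {
    k: v for k, v in candidates.items() if not (required ^ v.keys())
}

# Membership only asks that both keys exist; additional keys are tolerated.
by_membership = {
    k: v for k, v in candidates.items() if 'n' in v and 'd' in v
}

print(sorted(by_symmetric_difference))
print(sorted(by_membership))
```

For the data frame in the question, every patient has only `'n'` and/or `'d'`, so both filters should select the same patients.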
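paired_patients is a dictionary. Its key is patient_id, which points to another dictionary that holds the lists. Dictionaries need unique keys, so paired_patients[patient_id]['d'] is a list you can append to. You overwrite the 'd' key every time because you set paired_patients[patient_id] = to a dictionary whose keys are the same as the ones already in there.
What is the expected output?
@Sushanth I just edited the original post to include my expected output!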
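In the question's loop, a single psi_sets object is created once and then stored under every patient key, which is consistent with all three entries showing the values of the last patient processed. A minimal sketch of one possible fix is to build a fresh inner dictionary on each iteration. To keep the example self-contained, it assumes a plain dict named `columns` (a hypothetical stand-in, shortened to three rows) instead of the data frame; with pandas, something like `df.to_dict('list')` could supply the same shape.

```python
# Hypothetical stand-in for the question's data frame: column name -> values.
columns = {
    'Patient1_Normal':  ['nan', 0.01, 0.1],
    'Patient1_Disease': [0.12, 0.06, 0.19],
    'Patient2_Disease': ['nan', 'nan', 'nan'],
    'Patient3_Normal':  [0.21, 0.25, 0.63],
    'Patient3_Disease': [0.11, 0.45, 'nan'],
}

paired_patients = {}
for name, values in columns.items():
    # Split 'Patient1_Normal' into ('Patient1', '_', 'Normal').
    patient_id, _, tissue = name.partition('_')
    if tissue != 'Normal':
        continue
    disease_name = f'{patient_id}_Disease'
    if disease_name in columns:
        # A brand-new inner dict per patient, so entries never share state.
        paired_patients[patient_id] = {
            'n': list(values),
            'd': list(columns[disease_name]),
        }

print(paired_patients)
```

Patient2 is skipped here because it has no normal-tissue column, matching the pairing rule described in the question.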