Pandas DataFrame: take elements from a list column with len > 1, check the flag status, and update the group id of the matching rows

Input dataframe:
import pandas as pd

data = {
    'Id' : ['41','79','80','81','76','77','78','37','48','83','84','85','2','3','4','73'],
    'Gid' : ['G5','G70','G71','G72','G43','G44','G69','G18','G24','G83','G84','G85','G18','G2','G3','G3'],
    'PFlag' : ['Processed','','','','','','','','Processed','','Processed','','Processed','','','Processed'],
    'Flag_list': [['41', '42', '68'],['79'],['80'],['81', '79', '80'],['76'],['77'],['78'],['37', '68', '7'],['48', '41'],['83'],['84'],['85', '83', '84'],['2','33'],['3'],['4','73'],['73']],
    'r_id' : ['6','79','80','81','76','77','78','37','48','83','84','85','2','3','4','4']
}
df = pd.DataFrame.from_dict(data)
df
Out[156]:
    Id  Gid      PFlag     Flag_list  r_id
0   41   G5  Processed  [41, 42, 68]     6
1   79  G70                     [79]    79
2   80  G71                     [80]    80
3   81  G72             [81, 79, 80]    81
4   76  G43                     [76]    76
5   77  G44                     [77]    77
6   78  G69                     [78]    78
7   37  G18              [37, 68, 7]    37
8   48  G24  Processed      [48, 41]    48
9   83  G83                     [83]    83
10  84  G84  Processed          [84]    84
11  85  G85             [85, 83, 84]    85
12   2  G18  Processed       [2, 33]     2
13   3   G2                      [3]     3
14   4   G3                  [4, 73]     4
15  73   G3  Processed          [73]     4
Out[157]:
    Id  Gid      PFlag     Flag_list  r_id
0   41   G5  Processed  [41, 42, 68]     6
1   79  G72  Processed          [79]    79
2   80  G72  Processed          [80]    80
3   81  G72             [81, 79, 80]    81
4   76  G43                     [76]    76
5   77  G44                     [77]    77
6   78  G69                     [78]    78
7   37  G18              [37, 68, 7]    37
8   48  G24  Processed      [48, 41]    48
9   83  G85  Processed          [83]    83
10  84  G84  Processed          [84]    84
11  85  G85             [85, 83, 84]    85
12   2  G18  Processed       [2, 33]     2
13   3   G2                      [3]     3
14   4   G3                  [4, 73]     4
15  73   G3  Processed          [73]     4
Expected output dataframe (the Out[157] table above), as code:
data2 = {
    'Id' : ['41','79','80','81','76','77','78','37','48','83','84','85','2','3','4','73'],
    'Gid' : ['G5','G72','G72','G72','G43','G44','G69','G18','G24','G85','G84','G85','G18','G2','G3','G3'],
    'PFlag' : ['Processed','Processed','Processed','','','','','','Processed','Processed','Processed','','Processed','','','Processed'],
    'Flag_list': [['41', '42', '68'],['79'],['80'],['81', '79', '80'],['76'],['77'],['78'],['37', '68', '7'],['48', '41'],['83'],['84'],['85', '83', '84'],['2','33'],['3'],['4','73'],['73']],
    'r_id' : ['6','79','80','81','76','77','78','37','48','83','84','85','2','3','4','4']
}
df2 = pd.DataFrame.from_dict(data2)
df2
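I need to take the rows whose Flag_list has more than one element, look up the ids from that list in the Id column where PFlag is not equal to Processed, and update the group id of those rows. For example, row 0 is already Processed. The Flag_list values of the rows with Id 79 and 80 are single-element lists, so those rows trigger nothing. When 81 comes up, its list contains 79 and 80, so group id G72 should be assigned to the rows with Id 79 and 80. Similarly, row 11 has the list [85, 83, 84]: 84 is already Processed, so nothing is done to that row, while 83 gets G85 as its group id. For row 7, 68 and 7 are not in the Id column, so that row is left as it is.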
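As a reading aid, the rule described above can be restated as a small sketch. It assumes a reduced copy of the question's data (eight of the sixteen rows); the names `drivers`, `unprocessed_ids` and `targets` are introduced here for illustration and do not appear in the original post.

```python
import pandas as pd

# Reduced copy of the question's data (assumption: these eight rows are
# enough to show every case mentioned in the description).
df = pd.DataFrame({
    'Id': ['41', '79', '80', '81', '37', '83', '84', '85'],
    'Gid': ['G5', 'G70', 'G71', 'G72', 'G18', 'G83', 'G84', 'G85'],
    'PFlag': ['Processed', '', '', '', '', '', 'Processed', ''],
    'Flag_list': [['41', '42', '68'], ['79'], ['80'], ['81', '79', '80'],
                  ['37', '68', '7'], ['83'], ['84'], ['85', '83', '84']],
})

# Rows that can trigger an update: list longer than one, not yet processed.
drivers = df[(df['Flag_list'].str.len() > 1) & (df['PFlag'] != 'Processed')]

# Ids that are present in the Id column and still unprocessed.
unprocessed_ids = set(df.loc[df['PFlag'] != 'Processed', 'Id'])

# For each driving row, the other ids in its list that would be updated.
targets = {
    row.Id: [i for i in row.Flag_list if i != row.Id and i in unprocessed_ids]
    for row in drivers.itertuples()
}
print(targets)
```

Under these assumptions, 81 targets 79 and 80, 85 targets only 83 (84 is already Processed), and 37 targets nothing because 68 and 7 are not in the Id column.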
Thank you.

You can use the following code:
# Number of elements in each Flag_list
df['len'] = df['Flag_list'].apply(len)
# Rows that drive an update: more than one list element and not yet processed
sub_df = df[(df['len'] > 1) & (df['PFlag'] != 'Processed')]
for i in range(sub_df.shape[0]):
    ids = sub_df['Flag_list'].iloc[i]
    cid = sub_df['Id'].iloc[i]
    gid = sub_df['Gid'].iloc[i]
    for id in ids:
        if id != cid:
            # Only rows with this Id that are still unprocessed get the new group id
            mask = (df['Id'] == id) & (df['PFlag'] != 'Processed')
            df.loc[mask, 'Gid'] = gid
            df.loc[mask, 'PFlag'] = 'Processed'
In [45]: df
Out[45]:
    Id  Gid      PFlag     Flag_list  r_id  len
0   41   G5  Processed  [41, 42, 68]     6    3
1   79  G72  Processed          [79]    79    1
2   80  G72  Processed          [80]    80    1
3   81  G72             [81, 79, 80]    81    3
4   76  G43                     [76]    76    1
5   77  G44                     [77]    77    1
6   78  G69                     [78]    78    1
7   37  G18              [37, 68, 7]    37    3
8   48  G24  Processed      [48, 41]    48    2
9   83  G85  Processed          [83]    83    1
10  84  G84  Processed          [84]    84    1
11  85  G85             [85, 83, 84]    85    3
12   2  G18  Processed       [2, 33]     2    2
13   3   G2                      [3]     3    1
14   4   G3                  [4, 73]     4    2
15  73   G3  Processed          [73]     4    1
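For larger frames, the same update can probably be written without the Python loop. The following is only a sketch of one alternative, not the answer's method: it uses `DataFrame.explode` to turn each list into (driving row, listed id) pairs and then maps the group id back. The function name `propagate_gid` and the reduced sample data are assumptions made for this example.

```python
import pandas as pd

def propagate_gid(df):
    """Return a copy of df with Gid/PFlag updated by the rule in the question."""
    out = df.copy()
    unprocessed = out['PFlag'] != 'Processed'
    # Driving rows: unprocessed and with more than one element in Flag_list.
    drivers = out[unprocessed & (out['Flag_list'].str.len() > 1)]
    # One row per (driver Id, driver Gid, listed id).
    pairs = drivers[['Id', 'Gid', 'Flag_list']].explode('Flag_list')
    # Drop the driver's own id from its list.
    pairs = pairs[pairs['Flag_list'] != pairs['Id']]
    # If several drivers list the same id, the first one wins, as in the loop.
    new_gid = pairs.drop_duplicates('Flag_list').set_index('Flag_list')['Gid']
    # Update only rows that are still unprocessed and are listed by a driver.
    hit = unprocessed & out['Id'].isin(new_gid.index)
    out.loc[hit, 'Gid'] = out.loc[hit, 'Id'].map(new_gid)
    out.loc[hit, 'PFlag'] = 'Processed'
    return out

# Reduced copy of the question's data, used only to exercise the sketch.
sample = pd.DataFrame({
    'Id': ['41', '79', '80', '81', '37', '83', '84', '85'],
    'Gid': ['G5', 'G70', 'G71', 'G72', 'G18', 'G83', 'G84', 'G85'],
    'PFlag': ['Processed', '', '', '', '', '', 'Processed', ''],
    'Flag_list': [['41', '42', '68'], ['79'], ['80'], ['81', '79', '80'],
                  ['37', '68', '7'], ['83'], ['84'], ['85', '83', '84']],
})
result = propagate_gid(sample)
print(result[['Id', 'Gid', 'PFlag']])
```

Unlike the loop, this version returns a new frame and leaves its input untouched, so the result has to be assigned back (`df = propagate_gid(df)`) if an in-place effect is wanted.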
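Comments:

I have read this several times but still do not quite understand what you need, and I struggle to see any difference other than the Processed updates?

I think you need to look at the data model.

The Processed update depends on a few conditions: the flag list should have more than one element, and the ids it lists, in rows above or below, should not already be Processed.

What is the logic for the row `9 83 G85 Processed [83] 83`? Why is Processed added there? What is the rule for the new Processed values? If it comes down to Gid and Id, you could explode the lists into rows and use ne with a transform to set Processed.

Sorry, I do not understand why Processed is sometimes added and sometimes not.

Please take another look now. @jezrael I have modified it.

@王晓晨 When I do this in my full program, I get the same dataframe back after this operation. How should the result be assigned to the dataframe?

    -> df.loc[(df['r_id'] == id) & (df['PFlag'] != 'Processed'), 'gid'] = gid
    (Pdb) id
    '4'
    (Pdb) cid
    4
    (Pdb) type(id)
    (Pdb) type(cid)
    (Pdb)

Let me try changing the type and see.

I had to add an id = int(id) conversion to get this working in my environment. Yes, it took some time to sort out. Thank you.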
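The last comments trace the "same dataframe after the operation" symptom to a type mismatch: in the pdb session `id` is the string '4' while `cid` is the integer 4. A minimal sketch of that situation follows, assuming the id column holds integers while the list elements are strings; the tiny frame and the variable names are made up for illustration. Normalising the column with `astype(str)` is shown as an alternative to the `int(id)` conversion the commenter used.

```python
import pandas as pd

# Hypothetical miniature of the commenter's situation: integer ids in the
# column, string ids inside the lists.
df = pd.DataFrame({
    'r_id': [4, 4, 3],
    'Gid': ['G3', 'G9', 'G2'],
    'PFlag': ['', '', ''],
})
target = '4'

# Comparing an integer column with a string matches nothing,
# so a .loc assignment with this mask silently changes no rows.
matches_before = int((df['r_id'] == target).sum())

# Normalise the column to strings once; the same comparison now matches.
df['r_id'] = df['r_id'].astype(str)
mask = (df['r_id'] == target) & (df['PFlag'] != 'Processed')
matches_after = int(mask.sum())
df.loc[mask, 'Gid'] = 'G3'

print(matches_before, matches_after)
```

Checking `df.dtypes` and the type of the list elements before running the loop is a quick way to spot this kind of mismatch.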