Pandas DataFrame: take the list-column entries with len > 1, check the flag status, and update the group id of the matching rows

Input DataFrame:

import pandas as pd

data = {
'Id' : ['41','79','80','81','76','77','78','37','48','83','84','85','2','3','4','73'],
'Gid' : ['G5','G70','G71','G72','G43','G44','G69','G18','G24','G83','G84','G85','G18','G2','G3','G3'],
'PFlag' : ['Processed','','','','','','','','Processed','','Processed','','Processed','','','Processed'],
'Flag_list': [['41', '42', '68'],['79'],['80'],['81', '79', '80'],['76'],['77'],['78'],['37', '68', '7'],['48', '41'],['83'],['84'],['85', '83', '84'],['2','33'],['3'],['4','73'],['73']],
'r_id' : ['6','79','80','81','76','77','78','37','48','83','84','85','2','3','4','4']
}
df = pd.DataFrame.from_dict(data)
df
Out[156]: 
    Id  Gid      PFlag     Flag_list r_id
0   41   G5  Processed  [41, 42, 68]    6
1   79  G70                     [79]   79
2   80  G71                     [80]   80
3   81  G72             [81, 79, 80]   81
4   76  G43                     [76]   76
5   77  G44                     [77]   77
6   78  G69                     [78]   78
7   37  G18              [37, 68, 7]   37
8   48  G24  Processed      [48, 41]   48
9   83  G83                     [83]   83
10  84  G84  Processed          [84]   84
11  85  G85             [85, 83, 84]   85
12   2  G18  Processed       [2, 33]    2
13   3   G2                      [3]    3
14   4   G3                  [4, 73]    4
15  73   G3  Processed          [73]    4
Out[157]: 
    Id  Gid      PFlag     Flag_list r_id
0   41   G5  Processed  [41, 42, 68]    6
1   79  G72  Processed          [79]   79
2   80  G72  Processed          [80]   80
3   81  G72             [81, 79, 80]   81
4   76  G43                     [76]   76
5   77  G44                     [77]   77
6   78  G69                     [78]   78
7   37  G18              [37, 68, 7]   37
8   48  G24  Processed      [48, 41]   48
9   83  G85  Processed          [83]   83
10  84  G84  Processed          [84]   84
11  85  G85             [85, 83, 84]   85
12   2  G18  Processed       [2, 33]    2
13   3   G2                      [3]    3
14   4   G3                  [4, 73]    4
15  73   G3  Processed          [73]    4
Output DataFrame:

data2 = {
'Id' : ['41','79','80','81','76','77','78','37','48','83','84','85','2','3','4','73'],
'Gid' : ['G5','G72','G72','G72','G43','G44','G69','G18','G24','G85','G84','G85','G18','G2','G3','G3'],
'PFlag' : ['Processed','Processed','Processed','','','','','','Processed','Processed','Processed','','Processed','','','Processed'],
'Flag_list': [['41', '42', '68'],['79'],['80'],['81', '79', '80'],['76'],['77'],['78'],['37', '68', '7'],['48', '41'],['83'],['84'],['85', '83', '84'],['2','33'],['3'],['4','73'],['73']],
'r_id' : ['6','79','80','81','76','77','78','37','48','83','84','85','2','3','4','4']
}

df2 = pd.DataFrame.from_dict(data2)
df2
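To see exactly which rows the expected output changes, the input and expected frames can be compared directly. This is a minimal sketch using `DataFrame.compare` (pandas 1.1 or later); it rebuilds only the `Id`, `Gid` and `PFlag` columns so it runs on its own, and the name `diff` is illustrative.

```python
import pandas as pd

ids = ['41', '79', '80', '81', '76', '77', '78', '37', '48', '83', '84', '85', '2', '3', '4', '73']

# Gid / PFlag columns of the input frame (df) and of the expected output (df2)
df = pd.DataFrame({
    'Id': ids,
    'Gid': ['G5', 'G70', 'G71', 'G72', 'G43', 'G44', 'G69', 'G18',
            'G24', 'G83', 'G84', 'G85', 'G18', 'G2', 'G3', 'G3'],
    'PFlag': ['Processed', '', '', '', '', '', '', '', 'Processed', '',
              'Processed', '', 'Processed', '', '', 'Processed'],
})
df2 = pd.DataFrame({
    'Id': ids,
    'Gid': ['G5', 'G72', 'G72', 'G72', 'G43', 'G44', 'G69', 'G18',
            'G24', 'G85', 'G84', 'G85', 'G18', 'G2', 'G3', 'G3'],
    'PFlag': ['Processed', 'Processed', 'Processed', '', '', '', '', '', 'Processed',
              'Processed', 'Processed', '', 'Processed', '', '', 'Processed'],
})

# compare() keeps only the rows and columns that differ; 'self' is df, 'other' is df2
diff = df.compare(df2)
print(diff)
```

Only rows 1, 2 and 9 (Ids 79, 80 and 83) appear in the result: those are the rows whose `Gid` is replaced and whose `PFlag` becomes `Processed`.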
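
I need to use the Flag_list entries that contain more than one element, look up each listed value in the Id column, and for matching rows whose PFlag is not "Processed", update the group id (Gid) and mark them as Processed. For example, row 0 is already processed. The rows whose Flag_list is [79] or [80] are single-element lists, so they are not processed on their own; when Id 81 comes up, its list contains 79 and 80, so its Gid G72 is assigned to the rows with Id 79 and 80. Similarly, row 11 has the list [85, 83, 84]: 84 is already processed, so nothing is done to that row, while 83 is assigned G85 as its group id. In row 7, 68 and 7 do not appear in the Id column, so that row is left unchanged.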
Thank you...

You can use the following code:

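import pandas as pd

# Helper column with the number of elements in each Flag_list (it also appears in the output below)
df['len'] = df['Flag_list'].apply(len)

# Source rows: a multi-element list on a row that is not yet processed
sub_df = df[(df['len'] > 1) & (df['PFlag'] != 'Processed')]

for ids, cid, gid in zip(sub_df['Flag_list'], sub_df['Id'], sub_df['Gid']):
    for flag_id in ids:
        if flag_id == cid:
            continue  # skip the row's own Id
        # Id and the Flag_list items must have the same type (both str here)
        mask = (df['Id'] == flag_id) & (df['PFlag'] != 'Processed')
        df.loc[mask, 'Gid'] = gid
        df.loc[mask, 'PFlag'] = 'Processed'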


In [45]: df
Out[45]: 
    Id  Gid      PFlag     Flag_list r_id  len
0   41   G5  Processed  [41, 42, 68]    6    3
1   79  G72  Processed          [79]   79    1
2   80  G72  Processed          [80]   80    1
3   81  G72             [81, 79, 80]   81    3
4   76  G43                     [76]   76    1
5   77  G44                     [77]   77    1
6   78  G69                     [78]   78    1
7   37  G18              [37, 68, 7]   37    3
8   48  G24  Processed      [48, 41]   48    2
9   83  G85  Processed          [83]   83    1
10  84  G84  Processed          [84]   84    1
11  85  G85             [85, 83, 84]   85    3
12   2  G18  Processed       [2, 33]    2    2
13   3   G2                      [3]    3    1
14   4   G3                  [4, 73]    4    2
15  73   G3  Processed          [73]    4    1
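The loop above can also be written without iterating in Python by exploding the lists into one row per listed id. This is a minimal sketch of that idea, not the answerer's code; it assumes the question's column names and string Ids, omits `r_id`, and the function name `propagate_group_ids_exploded` is hypothetical. When an id is listed by several source rows, the first source row wins, which matches the loop because a row already marked Processed is skipped afterwards.

```python
import pandas as pd


def propagate_group_ids_exploded(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the group-id rule to a copy, without a Python loop."""
    out = df.copy()
    is_open = out['PFlag'] != 'Processed'
    # Source rows: lists with more than one element on rows not yet processed
    sources = out.loc[(out['Flag_list'].apply(len) > 1) & is_open, ['Id', 'Gid', 'Flag_list']]
    # One row per (source row, listed id) pair, in the original row/list order
    pairs = sources.explode('Flag_list').reset_index(drop=True)
    # Drop each row's own Id and any listed id whose row is already processed
    pairs = pairs[(pairs['Flag_list'] != pairs['Id'])
                  & pairs['Flag_list'].isin(out.loc[is_open, 'Id'])]
    # Listed id -> new Gid; if an id is listed by several source rows, the first wins
    new_gid = pairs.drop_duplicates('Flag_list').set_index('Flag_list')['Gid']
    target = is_open & out['Id'].isin(new_gid.index)
    out.loc[target, 'Gid'] = out.loc[target, 'Id'].map(new_gid)
    out.loc[target, 'PFlag'] = 'Processed'
    return out


df = pd.DataFrame({
    'Id': ['41', '79', '80', '81', '76', '77', '78', '37', '48', '83', '84', '85', '2', '3', '4', '73'],
    'Gid': ['G5', 'G70', 'G71', 'G72', 'G43', 'G44', 'G69', 'G18',
            'G24', 'G83', 'G84', 'G85', 'G18', 'G2', 'G3', 'G3'],
    'PFlag': ['Processed', '', '', '', '', '', '', '', 'Processed', '',
              'Processed', '', 'Processed', '', '', 'Processed'],
    'Flag_list': [['41', '42', '68'], ['79'], ['80'], ['81', '79', '80'], ['76'], ['77'],
                  ['78'], ['37', '68', '7'], ['48', '41'], ['83'], ['84'], ['85', '83', '84'],
                  ['2', '33'], ['3'], ['4', '73'], ['73']],
})

result = propagate_group_ids_exploded(df)
print(result[['Id', 'Gid', 'PFlag']])
```

On the question's data this produces the same `Gid` and `PFlag` columns as the expected output, and the input frame is left unmodified because the function works on a copy.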




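I have read this several times but still don't quite understand what you need, and apart from the Processed updates I find it hard to see any difference. I think you need to look at your data model.

The Processed update is based on a couple of conditions: the Flag_list should have more than one element, and the list-item ids above or below should not already be Processed.

What is the logic for row `9  83  G85  Processed  [83]  83`? Why is `Processed` added? What is the logic for the new `Processed` value? If `Gid` ... `Id`, then you could explode the lists into rows and use `ne` with `transform` for `Processed`.

Sorry, I don't understand why `Processed` is sometimes added and sometimes not.

Now, please take another look. @jezrael I modified it. @jezrael

@王晓晨 When I run this in my full program, I get the same DataFrame back after this operation. How should the result be assigned to the DataFrame?
-> df.loc[(df['r_id']==id)&(df['PFlag']!='Processed'),'gid']=gid
(Pdb) id
'4'
(Pdb) cid
4
(Pdb) type(id)
(Pdb) type(cid)
(Pdb)
Let me try changing the type and see.

I had to add an `id = int(id)` conversion to get this working in my environment.

Yes, this takes some time to sort out... Thank you...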