Python: add a column based on other columns in the table and fill in missing values
For the following input data, I need to fill in the missing office_number values and create a column that shows whether each office_number is original or was filled in afterwards.
Here is the sample data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['1010084420', '1010084420', '1010084420', '1010084421', '1010084421', '1010084421', '1010084425'],
                   'building_name': ['A', 'A', 'A', 'East Tower', 'East Tower', 'West Tower', 'T1'],
                   'floor': ['1', '1', '2', '10', '10', '11', '11'],
                   'office_number': ['', '', '205', '', '', '', '1101-1105'],
                   'company_name': ['Ariel Resources Ltd.', 'A.O. Tatneft', '', 'Agrium Inc.', 'Creo Products Inc.', 'Cott Corp.', 'Creo Products Inc.']})
print(df)
Output:
           id building_name floor office_number          company_name
0  1010084420             A     1                Ariel Resources Ltd.
1  1010084420             A     1                        A.O. Tatneft
2  1010084420             A     2           205
3  1010084421    East Tower    10                         Agrium Inc.
4  1010084421    East Tower    10                  Creo Products Inc.
5  1010084421    West Tower    11                          Cott Corp.
6  1010084425            T1    11     1101-1105    Creo Products Inc.
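For offices with the same id, building_name and floor, whenever office_number is empty I need to fill it using this rule: the floor value + F + 001, 002, 003, and so on. I also need to create a column office_num_status that contains original when office_number is not empty, and filled otherwise.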
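The numbering rule can be sketched in plain Python before involving pandas. This is only an illustration: the helper name fill_office_numbers and the (id, building_name, floor, office_number) tuple layout are hypothetical. Note that here the counter advances only on rows that get filled; for this sample data that gives exactly the expected result.

```python
from collections import defaultdict


def fill_office_numbers(rows):
    """Hypothetical helper: rows are (id, building_name, floor, office_number) tuples.

    Returns (office_number, office_num_status) pairs, filling each empty
    office number as <floor> + 'F' + a 3-digit counter per (id, building_name, floor).
    """
    counters = defaultdict(int)  # one running counter per (id, building_name, floor)
    result = []
    for id_, building, floor, office in rows:
        if office == '':
            key = (id_, building, floor)
            counters[key] += 1
            result.append((f"{floor}F{counters[key]:03d}", 'filled'))
        else:
            result.append((office, 'original'))
    return result


rows = [
    ('1010084420', 'A', '1', ''),
    ('1010084420', 'A', '1', ''),
    ('1010084420', 'A', '2', '205'),
    ('1010084421', 'East Tower', '10', ''),
    ('1010084421', 'East Tower', '10', ''),
    ('1010084421', 'West Tower', '11', ''),
    ('1010084425', 'T1', '11', '1101-1105'),
]
for pair in fill_office_numbers(rows):
    print(pair)
```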
This is the final expected result:
           id building_name floor office_num_status office_number  \
0  1010084420             A     1            filled         1F001
1  1010084420             A     1            filled         1F002
2  1010084420             A     2          original           205
3  1010084421    East Tower    10            filled        10F001
4  1010084421    East Tower    10            filled        10F002
5  1010084421    West Tower    11            filled        11F001
6  1010084425            T1    11          original     1101-1105

           company_name
0  Ariel Resources Ltd.
1          A.O. Tatneft
2
3           Agrium Inc.
4    Creo Products Inc.
5            Cott Corp.
6    Creo Products Inc.
So far I have created the office_num_status column, but every value comes out as original:
# method 1
df['office_num_status'] = np.where(df['office_number'].isnull(), 'filled', 'original')
# method 2
df['office_num_status'] = ['filled' if x is None else 'original' for x in df['office_number']]
# method 3
df['office_num_status'] = 'filled'
df.loc[df['office_number'] is not None, 'office_num_status'] = 'original'
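Why every attempt yields original can be checked on a small Series. This sketch uses a hypothetical four-value Series s standing in for the office_number column; the key point is that an empty string is an ordinary value, not a missing one.

```python
import numpy as np
import pandas as pd

s = pd.Series(['', '', '205', '1101-1105'])  # stand-in for df['office_number']

# Method 1: isnull() only flags NaN/None, and '' is a regular string
print(s.isnull().tolist())  # [False, False, False, False]

# Method 2: '' is a str object, never the None singleton
print([x is None for x in s])  # [False, False, False, False]

# Method 3: `is not None` tests the Series object itself and yields one bool
print(s is not None)  # True

# Comparing with the empty string gives the element-wise mask that is needed
print(s.eq('').tolist())  # [True, True, False, False]
print(np.where(s.eq(''), 'filled', 'original').tolist())  # ['filled', 'filled', 'original', 'original']
```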
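Can someone help me finish this? Thanks a lot.
Compare against the empty string instead of testing for missing values, then add a per-group counter and use it to fill only the empty values: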
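import numpy as np
# Rows whose office_number is an empty string need to be filled
mask = df['office_number'] == ''
# Insert the status column at 0-based position 3, i.e. right before office_number
df.insert(3, 'office_num_status', np.where(mask, 'filled', 'original'))
# Counter 1, 2, 3, ... within each id/building_name/floor group, zero-padded to 3 digits
s = df.groupby(['id', 'building_name', 'floor']).cumcount().add(1).astype(str).str.zfill(3)
# Build <floor>F<counter> only for the empty rows
df.loc[mask, 'office_number'] = df['floor'].astype(str) + 'F' + s
print(df)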
           id building_name floor office_num_status office_number  \
0  1010084420             A     1            filled         1F001
1  1010084420             A     1            filled         1F002
2  1010084420             A     2          original           205
3  1010084421    East Tower    10            filled        10F001
4  1010084421    East Tower    10            filled        10F002
5  1010084421    West Tower    11            filled        11F001
6  1010084425            T1    11          original     1101-1105

           company_name
0  Ariel Resources Ltd.
1          A.O. Tatneft
2
3           Agrium Inc.
4    Creo Products Inc.
5            Cott Corp.
6    Creo Products Inc.
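One caveat, as an observation rather than part of the original answer: cumcount() numbers every row in a group, including rows that already have an original number, so a group mixing original and empty rows would skip numbers. A variant sketch, assuming the filled rows should be numbered among themselves only and that real NaN values should also count as missing, could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['1010084420', '1010084420', '1010084420', '1010084421',
           '1010084421', '1010084421', '1010084425'],
    'building_name': ['A', 'A', 'A', 'East Tower', 'East Tower', 'West Tower', 'T1'],
    'floor': ['1', '1', '2', '10', '10', '11', '11'],
    'office_number': ['', '', '205', '', '', '', '1101-1105'],
    'company_name': ['Ariel Resources Ltd.', 'A.O. Tatneft', '', 'Agrium Inc.',
                     'Creo Products Inc.', 'Cott Corp.', 'Creo Products Inc.'],
})

# Treat both empty strings and real missing values (NaN/None) as "to fill"
mask = df['office_number'].isna() | df['office_number'].eq('')

df.insert(3, 'office_num_status', np.where(mask, 'filled', 'original'))

# Number only the rows being filled, separately within each id/building/floor group
keys = ['id', 'building_name', 'floor']
counter = df[mask].groupby(keys).cumcount().add(1).astype(str).str.zfill(3)

# counter is indexed like df[mask], so the assignment aligns on the index
df.loc[mask, 'office_number'] = df.loc[mask, 'floor'].astype(str) + 'F' + counter
print(df)
```

Filtering with df[mask] before grouping is the only real change; everything else follows the answer above.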
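In df.insert(3, 'office_num_status', np.where(mask, 'filled', 'original')), may I ask what the 3 stands for here?
@ahbon - It is the insertion position. Positions are 0-based, so 3 places office_num_status as the fourth column, right before office_number.
OK, got it. Another question: of isnull(), is None and == '', why don't the first two work in this case?
@ahbon - It depends on the data. Truly missing values are NaN, and for those you need isnull or isna; if the values are empty strings, you need == ''.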
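Both comment questions can be checked on a tiny frame. This sketch uses made-up two-row data (one NaN, one empty string) purely for illustration; the placeholder value 'tmp' is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b'], 'building_name': ['A', 'B'],
                   'floor': ['1', '2'], 'office_number': [np.nan, '']})

# loc=3 is a 0-based position: the new column is placed before the column
# currently at position 3 ('office_number'), so it becomes the fourth column
df.insert(3, 'office_num_status', 'tmp')
print(list(df.columns))
# ['id', 'building_name', 'floor', 'office_num_status', 'office_number']

# A real missing value (NaN) is caught by isna(); an empty string is not
print(df['office_number'].isna().tolist())  # [True, False]

# An empty string is caught by == ''; NaN is not
print(df['office_number'].eq('').tolist())  # [False, True]
```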