Python: create a DataFrame column based on strings in two other columns
I have a DataFrame that looks like this:
boat_type boat_type_2
Not Known Not Known
Not Known kayak
ship Not Known
Not Known Not Known
ship Not Known
I want to create a third column, boat_type_final, which should look like this:
boat_type boat_type_2 boat_type_final
Not Known Not Known cruise
Not Known kayak kayak
ship Not Known ship
Not Known Not Known cruise
ship Not Known ship
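So basically, if both boat_type and boat_type_2 contain 'Not Known', the value should be 'cruise'. However, if either of the first two columns holds a string other than 'Not Known', boat_type_final should be filled with that string, i.e. 'kayak' or 'ship'.
What is the most elegant way to do this? I have seen a few options, such as where, writing a function, and/or boolean logic, and I would like to know what a real pythonista would do.
Here is my code so far: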
import pandas as pd
import numpy as np
data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'kayak'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)
df['phone_type_final'] = np.where(df.phone_type.str.contains('Not'))...
Use:
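A minimal sketch of the chain that the explanation that follows walks through, step by step. The `replace`, `ffill(axis=1)` and `iloc[:, -1]` steps come from the explanation and the comments; the closing `fillna('cruise')` is an assumption inferred from the desired output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

df['boat_type_final'] = (
    df.replace('Not Known', np.nan)  # treat the placeholder as a missing value
      .ffill(axis=1)                 # carry known values to the right within each row
      .iloc[:, -1]                   # keep the last column, now the right-most known value
      .fillna('cruise')              # assumption: default when both columns were unknown
)
print(df)
```

Note that if both columns hold a known value, this chain keeps the one from boat_type_2, because the forward fill never overwrites an existing value.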
Explanation:
First, replace 'Not Known' with missing values:
print (df.replace('Not Known',np.nan))
boat_type boat_type_2
0 NaN NaN
1 NaN kayak
2 ship NaN
3 NaN NaN
4 ship NaN
Then replace the NaNs by forward filling along each row:
print (df.replace('Not Known',np.nan).ffill(axis=1))
boat_type boat_type_2
0 NaN NaN
1 NaN kayak
2 ship ship
3 NaN NaN
4 ship ship
Select the last column by position:
If needed, add:
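These last two steps can be sketched as follows. The selection uses `iloc[:, -1]`, the fragment quoted in the comments; filling the remaining NaN values with `fillna('cruise')` is an assumption based on the desired output. The names `filled`, `last` and `final` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# Result of the first two steps: placeholder -> NaN, then row-wise forward fill
filled = df.replace('Not Known', np.nan).ffill(axis=1)

# The last column now holds the right-most known value of each row
last = filled.iloc[:, -1]
print(last)

# Rows where both columns were unknown are still NaN; give them the default
final = last.fillna('cruise')
print(final)
```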
If only a few columns are involved, another solution is:
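One possible form of such a solution uses NumPy's `np.select`, which picks the choice belonging to the first condition that is true and falls back to a default otherwise. Whether this is the call originally meant here is an assumption; the mask names `known_1` and `known_2` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# One boolean mask per column: True where the value is known
known_1 = df['boat_type'].ne('Not Known')
known_2 = df['boat_type_2'].ne('Not Known')

# Conditions are checked in order, so boat_type wins when both are known
df['boat_type_final'] = np.select(
    [known_1, known_2],
    [df['boat_type'], df['boat_type_2']],
    default='cruise',
)
print(df)
```

With only two or three columns, listing the conditions explicitly keeps the precedence between columns visible, at the cost of one mask per column.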
Another solution is to define a function with the mapping:
def my_func(row):
    if row['boat_type'] != 'Not Known':
        return row['boat_type']
    elif row['boat_type_2'] != 'Not Known':
        return row['boat_type_2']
    else:
        return 'cruise'
[Note: you did not mention what should happen when neither column is 'Not Known'.]
Then simply apply the function:
df.loc[:,'boat_type_final'] = df.apply(my_func, axis=1)
print(df)
Output:
boat_type boat_type_2 boat_type_final
0 Not Known Not Known cruise
1 Not Known kayak kayak
2 ship Not Known ship
3 Not Known Not Known cruise
4 ship Not Known ship
Can you explain how this works? Especially this part:
.ffill(axis=1).iloc[:,-1]
@bzier - OK, give me a moment.
@bzier - The answer has been edited.