Python: add a new column to a DataFrame based on the values of an existing column
I have a pandas DataFrame like this:
Index Resource
2020-07-15 11:59:02 Monkey
2020-07-16 11:59:02 Helicopter
2020-07-17 11:59:02 Forklift
2020-07-18 11:59:02 Airplane
2020-07-19 11:59:02 Dinosaur
2020-07-20 11:59:02 Drone
2020-07-20 11:59:02 Truck
2020-07-20 11:59:02 Airplane
2020-07-22 11:59:02 Truck
2020-07-22 11:59:02 Transport
2020-07-23 11:59:02 Dozer
2020-07-24 11:59:02 Patrol
2020-07-25 11:59:02 Dinosaur
I want to add a new column named "Category", as shown here:
Index Resource Category
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
...possibly based on whether the value of "Resource" is found in one of the following lists:
aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']
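So if the value of "Resource" is not found in any of the defined lists, the new "Category" column should default to "Other"; otherwise "Category" should be "Aviation", "Equipment", or "Crew" respectively. (Each "Resource" belongs to exactly one "Category".)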
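I'm sure there must be an elegant way to do this in pandas. Can anyone offer advice?

You can create a function that takes a "Resource" value and returns its category: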
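def get_category(resource):
    # Sets give fast membership tests
    aviation_list = set(['Airplane', 'Helicopter', 'Drone', 'Parachute'])
    equipment_list = set(['Truck', 'Dozer', 'Forklift', 'Excavator'])
    crew_list = set(['Transport', 'Patrol', 'Stationary'])
    if resource in aviation_list:
        return 'Aviation'
    elif resource in equipment_list:
        return 'Equipment'
    elif resource in crew_list:
        return 'Crew'
    else:
        return 'Other'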
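Then you can create the new column with the following:
# Load the data
import pandas as pd
df = pd.read_clipboard()  # copied from above
df['Category'] = [get_category(resource) for resource in df['Resource']]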
This produces:
In [9]: df
Out[9]:
Index Resource Category
2020-07-15 11:59:02 Monkey Other
2020-07-16 11:59:02 Helicopter Aviation
2020-07-17 11:59:02 Forklift Equipment
2020-07-18 11:59:02 Airplane Aviation
2020-07-19 11:59:02 Dinosaur Other
2020-07-20 11:59:02 Drone Aviation
2020-07-20 11:59:02 Truck Equipment
2020-07-20 11:59:02 Airplane Aviation
2020-07-22 11:59:02 Truck Equipment
2020-07-22 11:59:02 Transport Crew
2020-07-23 11:59:02 Dozer Equipment
2020-07-24 11:59:02 Patrol Crew
2020-07-25 11:59:02 Dinosaur Other
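For readers who cannot use the clipboard, here is a minimal self-contained sketch of the same list-comprehension approach. It rebuilds only a few rows of the question's table by hand; treating the timestamps as the index is an assumption, and the constant names AVIATION, EQUIPMENT, and CREW are hypothetical, chosen here so the sets are built once rather than on every call.

```python
import pandas as pd

# A few rows rebuilt by hand from the question's table. Treating the
# timestamps as the index is an assumption about the original layout.
df = pd.DataFrame(
    {"Resource": ["Monkey", "Helicopter", "Forklift", "Transport", "Dinosaur"]},
    index=pd.to_datetime([
        "2020-07-15 11:59:02",
        "2020-07-16 11:59:02",
        "2020-07-17 11:59:02",
        "2020-07-22 11:59:02",
        "2020-07-25 11:59:02",
    ]),
)

# Hypothetical module-level names: the sets are created once, not per call.
AVIATION = {"Airplane", "Helicopter", "Drone", "Parachute"}
EQUIPMENT = {"Truck", "Dozer", "Forklift", "Excavator"}
CREW = {"Transport", "Patrol", "Stationary"}


def get_category(resource):
    """Return the category for one resource name, or 'Other' if unlisted."""
    if resource in AVIATION:
        return "Aviation"
    if resource in EQUIPMENT:
        return "Equipment"
    if resource in CREW:
        return "Crew"
    return "Other"


# Build the new column one value at a time from the existing column.
df["Category"] = [get_category(resource) for resource in df["Resource"]]
print(df)
```

Building the frame in code rather than from the clipboard makes the example reproducible on any machine.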
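A quick note: I'm assuming each "Resource" can belong to only one category, so I simply take the first match I find.

Use map to build the category values, and use .fillna to handle anything that is not in any of the lists. First, we need to create the dictionary: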
d = {resource: category
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'],
                              [aviation_list, equipment_list, crew_list])
     for resource in lst}
df['Category'] = df['Resource'].map(d).fillna('Other')
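To see why the .fillna step is needed, the following sketch runs the map approach on a small hand-built frame and keeps the intermediate result. The names lookup and mapped are hypothetical, introduced only for illustration; the three lists are the ones from the question.

```python
import pandas as pd

aviation_list = ["Airplane", "Helicopter", "Drone", "Parachute"]
equipment_list = ["Truck", "Dozer", "Forklift", "Excavator"]
crew_list = ["Transport", "Patrol", "Stationary"]

# Invert the lists into a flat {resource: category} lookup table.
groups = {"Aviation": aviation_list, "Equipment": equipment_list, "Crew": crew_list}
lookup = {resource: category
          for category, members in groups.items()
          for resource in members}

df = pd.DataFrame({"Resource": ["Monkey", "Helicopter", "Truck", "Patrol"]})

# map() returns NaN for any value that is not a key of the dictionary...
mapped = df["Resource"].map(lookup)

# ...so fillna() supplies the default category for those rows.
df["Category"] = mapped.fillna("Other")
print(df)
```

Because each resource appears in only one list, the inverted dictionary has no key collisions; if a resource appeared in two lists, the later list would silently win.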
You can create a dictionary of lists:
d = {}
d['Aviation'] = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
d['Equipment'] = ['Truck', 'Dozer', 'Forklift', 'Excavator']
d['Crew'] = ['Transport', 'Patrol', 'Stationary']
Create a function that takes a value and returns its category:
def final_pop(resource):
    if resource in d['Aviation']:
        return "Aviation"
    elif resource in d['Equipment']:
        return "Equipment"
    elif resource in d['Crew']:
        return "Crew"
    else:
        # Default requested in the question
        return "Other"

# Apply the function row by row to fill the new column
df['Category'] = df.apply(lambda row: final_pop(row['Resource']), axis=1)
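Since the function only reads one column, a row-wise apply is not strictly required. As an alternative that is not part of the answer above, here is a sketch using Series.isin together with numpy.select; the names labels and conditions are hypothetical, and the sample rows are built by hand.

```python
import numpy as np
import pandas as pd

d = {
    "Aviation": ["Airplane", "Helicopter", "Drone", "Parachute"],
    "Equipment": ["Truck", "Dozer", "Forklift", "Excavator"],
    "Crew": ["Transport", "Patrol", "Stationary"],
}

df = pd.DataFrame({"Resource": ["Monkey", "Airplane", "Dozer", "Patrol", "Drone"]})

# One boolean mask per category, in the same order as the labels.
labels = list(d)
conditions = [df["Resource"].isin(d[label]).to_numpy() for label in labels]

# np.select picks the label of the first mask that is True for each row
# and falls back to the default when no mask matches.
df["Category"] = np.select(conditions, labels, default="Other")
print(df)
```

This evaluates each membership test on the whole column at once instead of calling a Python function per row, which tends to matter only on large frames.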
Is each resource guaranteed to belong to only one category? (Maybe it doesn't matter?)
Yes, each "Resource" belongs to only one "Category".