Python: adding a new column to a DataFrame based on the values of an existing column

I have a pandas DataFrame like this:

Index                   Resource
2020-07-15 11:59:02     Monkey
2020-07-16 11:59:02     Helicopter
2020-07-17 11:59:02     Forklift
2020-07-18 11:59:02     Airplane
2020-07-19 11:59:02     Dinosaur
2020-07-20 11:59:02     Drone
2020-07-20 11:59:02     Truck
2020-07-20 11:59:02     Airplane
2020-07-22 11:59:02     Truck
2020-07-22 11:59:02     Transport
2020-07-23 11:59:02     Dozer
2020-07-24 11:59:02     Patrol
2020-07-25 11:59:02     Dinosaur
I want to add a new column called "Category", like this:

Index                   Resource      Category
2020-07-15 11:59:02     Monkey        Other
2020-07-16 11:59:02     Helicopter    Aviation
2020-07-17 11:59:02     Forklift      Equipment
2020-07-18 11:59:02     Airplane      Aviation
2020-07-19 11:59:02     Dinosaur      Other
2020-07-20 11:59:02     Drone         Aviation
2020-07-20 11:59:02     Truck         Equipment
2020-07-20 11:59:02     Airplane      Aviation
2020-07-22 11:59:02     Truck         Equipment
2020-07-22 11:59:02     Transport     Crew
2020-07-23 11:59:02     Dozer         Equipment
2020-07-24 11:59:02     Patrol        Crew
2020-07-25 11:59:02     Dinosaur      Other
...possibly based on whether the value of "Resource" is found in one of the following lists:

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']
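So if the value of "Resource" is not found in any of the defined lists, the new "Category" column defaults to "Other"; otherwise the "Category" is "Aviation", "Equipment", or "Crew" respectively. (Each "Resource" belongs to only one "Category".)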


I'm sure there must be an elegant way to do this in pandas. Can anyone offer advice?

You can create a function that takes a Resource value and returns the Category:

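def get_category(resource):
    # Sets give constant-time membership tests
    aviation_list = {'Airplane', 'Helicopter', 'Drone', 'Parachute'}
    equipment_list = {'Truck', 'Dozer', 'Forklift', 'Excavator'}
    crew_list = {'Transport', 'Patrol', 'Stationary'}
    if resource in aviation_list:
        return 'Aviation'
    elif resource in equipment_list:
        return 'Equipment'
    elif resource in crew_list:
        return 'Crew'
    else:
        return 'Other'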
Then you can create the new column with the following:

# Load the data
import pandas as pd
df = pd.read_clipboard()  # copied from above
df['Category'] = [get_category(resource) for resource in df['Resource']]
This produces:

In [9]: df
Out[9]:
               Index    Resource   Category
2020-07-15  11:59:02      Monkey      Other
2020-07-16  11:59:02  Helicopter   Aviation
2020-07-17  11:59:02    Forklift  Equipment
2020-07-18  11:59:02    Airplane   Aviation
2020-07-19  11:59:02    Dinosaur      Other
2020-07-20  11:59:02       Drone   Aviation
2020-07-20  11:59:02       Truck  Equipment
2020-07-20  11:59:02    Airplane   Aviation
2020-07-22  11:59:02       Truck  Equipment
2020-07-22  11:59:02   Transport       Crew
2020-07-23  11:59:02       Dozer  Equipment
2020-07-24  11:59:02      Patrol       Crew
2020-07-25  11:59:02    Dinosaur      Other

A quick note: I'm assuming each Resource can belong to only one category, so I simply take the first match I find.
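The "first match wins" behavior is only safe if no resource appears in more than one list. A minimal sketch for checking that assumption is below; the helper name find_overlaps and the named_lists dictionary are hypothetical, introduced here for illustration only.

```python
from itertools import combinations

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']


def find_overlaps(named_lists):
    """Return {(name_a, name_b): shared_values} for every pair of lists that overlap."""
    overlaps = {}
    for (name_a, a), (name_b, b) in combinations(named_lists.items(), 2):
        shared = set(a) & set(b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical grouping of the three lists under their category names.
named_lists = {
    'Aviation': aviation_list,
    'Equipment': equipment_list,
    'Crew': crew_list,
}
overlaps = find_overlaps(named_lists)
print(overlaps)  # an empty dict means no resource sits in two lists
```

If the result is not empty, the order of the if/elif branches decides which category an ambiguous resource receives.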

Use map to create the Category values, and .fillna to handle anything that isn't in any of the lists. First, we need to create the dictionary:

d = {resource: category 
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'], [aviation_list, equipment_list, crew_list])
     for resource in lst}

df['Category'] = df['Resource'].map(d).fillna('Other')
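For anyone who wants to try this without the original data, here is a self-contained sketch of the same map/fillna approach. The four-row DataFrame is a hypothetical stand-in for the question's data; everything else follows the lists given in the question.

```python
import pandas as pd

aviation_list = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
equipment_list = ['Truck', 'Dozer', 'Forklift', 'Excavator']
crew_list = ['Transport', 'Patrol', 'Stationary']

# Hypothetical four-row sample standing in for the question's data.
df = pd.DataFrame({'Resource': ['Monkey', 'Helicopter', 'Forklift', 'Transport']})

# Flat lookup: each resource name points at its category.
d = {resource: category
     for category, lst in zip(['Aviation', 'Equipment', 'Crew'],
                              [aviation_list, equipment_list, crew_list])
     for resource in lst}

# map() yields NaN for resources missing from d; fillna() turns those into 'Other'.
df['Category'] = df['Resource'].map(d).fillna('Other')
print(df)
```

The lookup dictionary is built once, so each row costs a single dictionary lookup rather than a scan through every list.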


You can create a dictionary of lists:

d = {}
d['Aviation'] = ['Airplane', 'Helicopter', 'Drone', 'Parachute']
d['Equipment'] = ['Truck', 'Dozer', 'Forklift', 'Excavator']
d['Crew'] = ['Transport', 'Patrol', 'Stationary']
Create a function that takes a value and returns the category:

def final_pop(resource):
    if resource in d['Aviation']:
        return "Aviation"
    elif resource in d['Equipment']:
        return "Equipment"
    elif resource in d['Crew']:
        return "Crew"
    else:
        # The question asks for "Other" as the default value
        return "Other"

df['Category'] = df.apply(lambda row: final_pop(row['Resource']), axis=1)
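As an aside, the same dictionary of lists can also drive a vectorized version that avoids calling a Python function once per row. The sketch below swaps in a different technique, Series.isin combined with numpy.select, and is not part of the answer above; the sample DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

d = {
    'Aviation': ['Airplane', 'Helicopter', 'Drone', 'Parachute'],
    'Equipment': ['Truck', 'Dozer', 'Forklift', 'Excavator'],
    'Crew': ['Transport', 'Patrol', 'Stationary'],
}

# Hypothetical sample standing in for the question's data.
df = pd.DataFrame({'Resource': ['Dinosaur', 'Drone', 'Truck', 'Patrol']})

# One boolean mask per category, in the same order as the category names.
conditions = [df['Resource'].isin(members) for members in d.values()]
choices = list(d.keys())

# For each row, take the first category whose mask is True, else 'Other'.
df['Category'] = np.select(conditions, choices, default='Other')
print(df)
```

Like the if/elif chain, numpy.select takes the first matching condition, so the result is the same as long as each resource appears in only one list.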

Is each resource guaranteed to belong to only one category? (Maybe it doesn't matter?)
Yes, each "Resource" belongs to only one "Category".