Python: check whether an element is in a list and, if the condition is met, write to a new DataFrame column
I'm looking at a pandas DataFrame that contains information on every Olympic athlete from the past 150 years (name, weight, country, sport, and so on). I'm trying to write a for loop that iterates over the rows of df, checks the value stored in the "Sport" column against several lists, and then adds a column to df holding the parent category in the same row. Code so far:
aquatic_sports = ['Swimming','Diving','Synchronized Swimming','Water Polo']
track_sports = ['Athletics','Modern Pentathlon','Triathlon','Biathlon','Cycling']
team_sports = ['Softball','Basketball','Volleyball','Beach Volleyball','Handball','Rugby','Lacrosse']
gymnastic_sports = ['Gymnastics','Rhythmic Gymnastics','Trampolining']
fitness_sports = ['Weightlifting']
combat_sports = ['Boxing','Judo','Wrestling','Taekwondo']
winter_sports = ['Short Track Speed Skating','Ski Jumping','Cross Country Skiing','Luge','Bobsleigh','Alpine Skiing','Curling','Snowboarding','Ice Hockey','Hockey','Speed Skating']
for index, row in df.iterrows():
    if df.iloc[0,11] in aquatic_sports:
        df['Sport Category'] = 'Aquatic Sport'
    elif df.iloc[0,11] in track_sports:
        df['Sport Category'] = 'Track Sport'
    elif df.iloc[0,11] in gymnastic_sports:
        df['Sport Category'] = 'Gymnastic Sport'
    elif df.iloc[0,11] in fitness_sports:
        df['Sport Category'] = 'Fitness Sport'
    elif df.iloc[0,11] in combat_sports:
        df['Sport Category'] = 'Combat Sport'
    elif df.iloc[0,11] in winter_sports:
        df['Sport Category'] = 'Winter Sport'
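No error is thrown, but unfortunately every value in the new column is the same. I'm not sure how to pass the current index so that each iteration returns a unique, correct value.

This is a mapping problem (a job for Series.map), but we first need to build a suitable dictionary. Since you have already created the lists in separate variables, we can store them in a dictionary, using the labels you want as keys: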
d = {
    'Aquatic Sport': ['Swimming', 'Diving', 'Synchronized Swimming', 'Water Polo'],
    'Track Sport': ['Athletics', 'Modern Pentathlon', 'Triathlon', 'Biathlon', 'Cycling'],
    'Team Sport': ['Softball', 'Basketball', 'Volleyball', 'Beach Volleyball',
                   'Handball', 'Rugby', 'Lacrosse'],
    'Gymnastic Sport': ['Gymnastics', 'Rhythmic Gymnastics', 'Trampolining'],
    'Fitness Sport': ['Weightlifting'],
    'Combat Sport': ['Boxing', 'Judo', 'Wrestling', 'Taekwondo'],
    'Winter Sport': ['Short Track Speed Skating', 'Ski Jumping', 'Cross Country Skiing',
                     'Luge', 'Bobsleigh', 'Alpine Skiing', 'Curling', 'Snowboarding',
                     'Ice Hockey', 'Hockey', 'Speed Skating']
}
# unpacks lists so it's {sport: category_label}
d = {sport: cat for cat,l in d.items() for sport in l}
df['Sport Category'] = df['Sport'].map(d)
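To make the behavior of the mapping approach concrete, here is a minimal, self-contained sketch. The small `df` is made-up sample data standing in for the real dataset, the category lists are shortened, and the `'Other'` fallback label is an assumption added for illustration. It shows that any sport missing from the dictionary maps to NaN, which can then be filled if desired.

```python
import pandas as pd

# Hypothetical sample data standing in for the Olympic dataset
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Sport': ['Swimming', 'Judo', 'Curling', 'Archery'],
})

# Shortened category lists, for illustration only
categories = {
    'Aquatic Sport': ['Swimming', 'Diving'],
    'Combat Sport': ['Boxing', 'Judo'],
    'Winter Sport': ['Curling', 'Luge'],
}

# Invert {category: [sports]} into {sport: category}
sport_to_category = {
    sport: category
    for category, sports in categories.items()
    for sport in sports
}

# Look up each row's sport; sports not in the dict become NaN
df['Sport Category'] = df['Sport'].map(sport_to_category)

# Assumed fallback label for sports that were not mapped
df['Sport Category'] = df['Sport Category'].fillna('Other')

print(df)
```

Here 'Archery' is in none of the lists, so it ends up with the fallback label rather than a category.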
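The reason the same value appears throughout the column is that whenever you execute
df['Sport Category'] =
the entire column is set to that value. In your code the column is overwritten on every iteration, and only the last value assigned is kept. Note also that df.iloc[0,11] always reads the first row, so every iteration tests the same sport.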
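The broadcasting behavior described above can be seen in a short sketch. The three-row `df` is invented sample data, and the names `first` and `second` are used only for this illustration.

```python
import pandas as pd

# Hypothetical three-row frame for illustration
df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Curling']})

# Assigning a scalar to a column broadcasts it to every row
df['Sport Category'] = 'Aquatic Sport'
first = df['Sport Category'].tolist()

# A second scalar assignment overwrites the whole column again
df['Sport Category'] = 'Winter Sport'
second = df['Sport Category'].tolist()

print(first)
print(second)
```

After both assignments every row holds the value from the last one, which matches what the loop in the question produces.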
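When setting the value, you could try
df.loc[0,'Sport Category'] =
to see whether the assignment takes effect.

Unfortunately, that didn't work. It returned a column containing only "Winter Sport" values…

This works, thank you very much! It also runs much faster than the for loop.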
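If a loop is still preferred, a row-wise version might look like the sketch below. It is only an illustration under stated assumptions: the `df` and the shortened lists are made-up sample data, and the column is initialized to None up front so that every row has a slot to write into. It reads the sport from the current row and writes to a single cell with `df.at`, instead of assigning to the whole column.

```python
import pandas as pd

# Hypothetical sample data and shortened lists, for illustration only
df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Curling']})

aquatic_sports = ['Swimming', 'Diving']
combat_sports = ['Boxing', 'Judo']
winter_sports = ['Curling', 'Luge']

# Create the column first so each row has a cell to write into
df['Sport Category'] = None

for index, row in df.iterrows():
    sport = row['Sport']  # value from the current row, not always row 0
    if sport in aquatic_sports:
        # df.at sets one cell: this row, this column
        df.at[index, 'Sport Category'] = 'Aquatic Sport'
    elif sport in combat_sports:
        df.at[index, 'Sport Category'] = 'Combat Sport'
    elif sport in winter_sports:
        df.at[index, 'Sport Category'] = 'Winter Sport'

print(df)
```

This visits rows one at a time, so on a large frame it will typically be slower than the dictionary-based `map` approach, consistent with the comment above.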