Python: check whether an element is in a list and, if so, write to a new DataFrame column

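I'm working with a pandas DataFrame that holds information on every Olympic athlete from the past 150 years (name, weight, country, sport, etc.). It is available at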

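I'm trying to write a for loop that iterates over the rows of df, checks the value stored in the "Sport" column against several lists, and then adds a column to df holding the parent category for that row. My code so far: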

aquatic_sports = ['Swimming','Diving','Synchronized Swimming','Water Polo']
track_sports = ['Athletics','Modern Pentathlon','Triathlon','Biathlon','Cycling']
team_sports = ['Softball','Basketball','Volleyball','Beach Volleyball','Handball','Rugby','Lacrosse']
gymnastic_sports = ['Gymnastics','Rhythmic Gymnastics','Trampolining']
fitness_sports = ['Weightlifting']
combat_sports = ['Boxing','Judo','Wrestling','Taekwondo']
winter_sports = ['Short Track Speed Skating','Ski Jumping','Cross Country Skiing','Luge','Bobsleigh','Alpine Skiing','Curling','Snowboarding','Ice Hockey','Hockey','Speed Skating']

for index, row in df.iterrows():
    if df.iloc[0,11] in aquatic_sports:
        df['Sport Category'] = 'Aquatic Sport'
    elif df.iloc[0,11] in track_sports:
        df['Sport Category'] = 'Track Sport'
    elif df.iloc[0,11] in gymnastic_sports:
        df['Sport Category'] = 'Gymnastic Sport'
    elif df.iloc[0,11] in fitness_sports:
        df['Sport Category'] = 'Fitness Sport'
    elif df.iloc[0,11] in combat_sports:
        df['Sport Category'] = 'Combat Sport'
    elif df.iloc[0,11] in winter_sports:
        df['Sport Category'] = 'Winter Sport'

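No error is thrown, but unfortunately every value in the new column is the same. I'm not sure how to pass the current index so that each iteration writes a unique, correct value.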

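This is a job for `map`, but we first need to build the appropriate dictionary. Since you have already created the lists in separate variables, we can store them in a dictionary and use the labels you want as the keys: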

d = {
    'Aquatic Sport': ['Swimming', 'Diving','Synchronized Swimming', 'Water Polo'],
    'Track Sport': ['Athletics','Modern Pentathlon', 'Triathlon', 'Biathlon', 'Cycling'],
    'Team Sport': ['Softball', 'Basketball', 'Volleyball', 'Beach Volleyball',
                   'Handball', 'Rugby', 'Lacrosse'],
    'Gymnastic Sport': ['Gymnastics', 'Rhythmic Gymnastics', 'Trampolining'],
    'Fitness Sport': ['Weightlifting'],
    'Combat Sport': ['Boxing','Judo', 'Wrestling', 'Taekwondo'],
    'Winter Sport': ['Short Track Speed Skating', 'Ski Jumping', 'Cross Country Skiing',
                     'Luge','Bobsleigh', 'Alpine Skiing', 'Curling', 'Snowboarding',
                     'Ice Hockey', 'Hockey', 'Speed Skating']
    }

# unpacks lists so it's {sport: category_label}
d = {sport: cat for cat,l in d.items() for sport in l}
df['Sport Category'] = df['Sport'].map(d)
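
To see the mapping end to end, here is a small self-contained sketch. The four-row `df` and the shortened dictionary are made up for illustration. Sports that are missing from the dictionary come back as NaN from `map`, so the `'Other'` fallback label is an assumption added here, not part of the answer above.

```python
import pandas as pd

# Toy data standing in for the real Olympic DataFrame (hypothetical rows).
df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Curling', 'Chess']})

# Shortened version of the category dictionary, for illustration only.
categories = {
    'Aquatic Sport': ['Swimming', 'Diving'],
    'Combat Sport': ['Boxing', 'Judo'],
    'Winter Sport': ['Curling', 'Luge'],
}

# Invert to {sport: category} so each sport can be looked up directly.
sport_to_category = {
    sport: category
    for category, sports in categories.items()
    for sport in sports
}

# map() looks up each row's sport; sports not in the dictionary become NaN,
# which is replaced here with a placeholder label (an assumption).
df['Sport Category'] = df['Sport'].map(sport_to_category).fillna('Other')
print(df)
```

Whether unmatched sports should stay as NaN or get a fallback label depends on how the column is used later; leaving them as NaN makes gaps in the dictionary easier to spot.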

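The column ends up with the same value everywhere because an assignment of the form `df['Sport Category'] = ...` sets the entire column to that value. In your code the column is overwritten on every iteration, and only the last value that was set is kept. Note also that `df.iloc[0,11]` always reads the first row, so every iteration tests the same sport.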
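
The overwriting behaviour is easy to reproduce on a toy frame. This is only a sketch with made-up data: each pass through the loop assigns a scalar to the whole column, so after the loop every row holds the value from the final iteration.

```python
import pandas as pd

# Hypothetical three-row frame, just to show the effect.
df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Curling']})

for index, row in df.iterrows():
    # Assigning a scalar to a column label fills every row at once,
    # replacing whatever the previous iteration wrote.
    df['Sport Category'] = row['Sport']

values = df['Sport Category'].tolist()
print(values)  # every row shows the sport from the last iteration
```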

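When setting the value, you can try `df.loc[0, 'Sport Category'] = ...` (`.loc` rather than the older `.ix`, which was removed in pandas 1.0) to see whether the assignment works.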
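
For completeness, a loop can produce per-row values if it reads the current row and writes to a single cell. The following is a minimal sketch on made-up data with two shortened lists. It assumes the sport is read by column name (`row['Sport']`) instead of a fixed position, and it pre-creates the target column; neither detail comes from the original posts.

```python
import pandas as pd

# Shortened category lists and toy rows, for illustration only.
aquatic_sports = ['Swimming', 'Diving']
combat_sports = ['Boxing', 'Judo']

df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Diving']})
df['Sport Category'] = None  # create the column before filling it cell by cell

for index, row in df.iterrows():
    # Test the sport of the row currently being visited,
    # then write to that row only via .loc[row_label, column_label].
    if row['Sport'] in aquatic_sports:
        df.loc[index, 'Sport Category'] = 'Aquatic Sport'
    elif row['Sport'] in combat_sports:
        df.loc[index, 'Sport Category'] = 'Combat Sport'

print(df)
```

Row-by-row loops like this tend to be much slower than a single `map` call on large frames, so this form is mainly useful for understanding what went wrong.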

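Unfortunately, that assignment didn't work. It returned a column filled entirely with "Winter Sport" values…

This works, thank you very much! It also runs much faster than the for loop.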