Python Pandas: a more efficient way to update a column in a pandas DataFrame without a for loop


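I have a pandas DataFrame in which I want to update the values of one column based on another column of the same DataFrame. Until now I have used the following code to do it: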

# Row-by-row update: slow on large frames.
# The original used dfMod.ix, which was removed in pandas 1.0; .loc is the replacement.
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7

However, the DataFrame has 300,000 rows and the loop takes a very long time to run. Is there a better way to update the column?

Try the apply method:

daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x])
You need map with a dict:

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)
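One difference from the original loop is worth noting: the loop's else branch assigns 7 to every value that is not MONDAY through SATURDAY, whereas map leaves values missing from the dict as NaN. Below is a minimal sketch, assuming unexpected day strings can occur and should fall back to 7 as in the loop. The sample data, including the "HOLIDAY" value, is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "HOLIDAY" stands in for any unexpected value.
dfMod = pd.DataFrame({"day": ["MONDAY", "SUNDAY", "HOLIDAY"]})

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# map() yields NaN for values missing from d; fillna(7) mimics the loop's
# else branch, and astype(int) turns the float column back into integers.
dfMod["weekIndex"] = dfMod["day"].map(d).fillna(7).astype(int)
print(dfMod)
```

If every value in the column is guaranteed to be a key of the dict, the plain map(d) call is enough and the extra steps can be dropped.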
Sample:

import pandas as pd

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
     "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
Timings on 300k rows: map is about 6 times faster than the apply solution:

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop

In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
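
%timeit is an IPython magic, so the session above will not run in a plain Python script. A rough equivalent with the standard-library timeit module might look like the sketch below. The helper names with_map and with_apply are introduced here for illustration, and no timings are claimed because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

dfMod = pd.DataFrame({"day": ["TUESDAY", "THURSDAY", "FRIDAY",
                              "SATURDAY", "MONDAY", "SUNDAY"]})
# Repeat the six sample rows 50,000 times to get 300k rows.
dfMod = pd.concat([dfMod] * 50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}


def with_map():
    # Dict lookup handled inside pandas.
    return dfMod["day"].map(d)


def with_apply():
    # Python-level lambda called once per row.
    return dfMod["day"].apply(lambda x: d[x])


# Best of 3 repeats, 5 calls per repeat, reported as time per call.
t_map = min(timeit.repeat(with_map, number=5, repeat=3)) / 5
t_apply = min(timeit.repeat(with_apply, number=5, repeat=3)) / 5
print(f"map:   {t_map * 1e3:.1f} ms per loop")
print(f"apply: {t_apply * 1e3:.1f} ms per loop")
```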

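Please use @jezrael's answer, as it is the idiomatic one.
This one is purely for demonstration, to offer useful information about other tools that can be used.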

Setup
Using the example given by @jezrael

Alternative solution

dfMod.join(pd.Series(d, name='weekIndex'), on='day')

        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
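
Note that join returns a new DataFrame rather than modifying dfMod in place, so the result has to be assigned back to be kept. A self-contained sketch of the same idea, where the name lookup is introduced here for illustration:

```python
import pandas as pd

dfMod = pd.DataFrame({"day": ["TUESDAY", "THURSDAY", "FRIDAY",
                              "SATURDAY", "MONDAY", "SUNDAY"]})

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# The Series is indexed by day name and its name becomes the new column.
lookup = pd.Series(d, name="weekIndex")

# join(on="day") matches each value of the "day" column against the index
# of lookup (a left join by default) and returns a new DataFrame.
dfMod = dfMod.join(lookup, on="day")
print(dfMod)
```

As with map, a day value that is absent from the Series index would end up as NaN in weekIndex.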

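Comments:

Take a look at the Series map method. I asked about this recently as well, and my own question on it may be useful to you.

Thank you, this works very well and, as you said, it is much faster than apply.

I tested your original solution on 300k rows: 1 loop, best of 3: 21min 47s per loop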