Python Pandas: a more efficient way to update a column in a Pandas DataFrame without a for loop
I have a pandas DataFrame in which I want to update the values of one column based on another column. I have been using the following code to do it:
# Row-by-row, label-based assignment (slow on large frames)
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7
However, the DataFrame has 300,000 rows and this takes a very long time to run. Is there a better way to update the column?
Try the apply method:
daysOfWeek = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].apply(lambda x: daysOfWeek[x])
You need map with a dict:
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
Sample:
import pandas as pd
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3,
"THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
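One difference from the original loop is worth noting: the loop's else branch assigns 7 to every value that is not MONDAY through SATURDAY, while map leaves NaN for any value missing from the dict. The following is a minimal sketch of one way to reproduce the fallback, assuming that behavior is wanted. The 'HOLIDAY' value and the shortened dict are made up for illustration.

```python
import pandas as pd

# 'HOLIDAY' is a hypothetical value that is not a key in the dict
dfMod = pd.DataFrame({'day': ['MONDAY', 'SUNDAY', 'HOLIDAY']})

# SUNDAY is left out on purpose so that it falls through to the default,
# just like the else branch in the original loop
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6}

# map yields NaN for unmapped values; fillna supplies the default,
# and astype(int) restores an integer column
dfMod["weekIndex"] = dfMod["day"].map(d).fillna(7).astype(int)
print(dfMod)
```

If unexpected values should be flagged rather than silently set to 7, skipping the fillna step and checking for NaN afterwards may be the safer choice.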
Timings on 300k rows: map is about 6 times faster than the apply solution:
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop
In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
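The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The helper names with_map and with_apply are made up here, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

days = ['TUESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'MONDAY', 'SUNDAY']
# 6 rows repeated 50000 times gives 300k rows
dfMod = pd.concat([pd.DataFrame({'day': days})] * 50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}


def with_map():
    # dictionary lookup handled by Series.map
    return dfMod["day"].map(d)


def with_apply():
    # Python-level lambda called once per element
    return dfMod["day"].apply(lambda x: d[x])


runs = 3
# best of 3 repeats, averaged over `runs` calls each
t_map = min(timeit.repeat(with_map, number=runs, repeat=3)) / runs
t_apply = min(timeit.repeat(with_apply, number=runs, repeat=3)) / runs
print(f"map:   {t_map * 1e3:.1f} ms per call")
print(f"apply: {t_apply * 1e3:.1f} ms per call")
```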
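Please use @jezrael's answer, since it is the idiomatic one.
This is purely for demonstration, to share useful information about other tools that can be used.
Setup
Using the example given by @jezrael.
Alternative solution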
dfMod.join(pd.Series(d, name='weekIndex'), on='day')
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
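Note that join returns a new DataFrame and leaves dfMod itself untouched, so the result has to be assigned to a name to be kept. The sketch below illustrates this and checks that the joined column agrees with the map result. The variable names joined, mapped and same are only for illustration.

```python
import pandas as pd

dfMod = pd.DataFrame({'day': ['TUESDAY', 'THURSDAY', 'FRIDAY',
                              'SATURDAY', 'MONDAY', 'SUNDAY']})
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# The Series index (day names) is matched against the 'day' column;
# the result is a new DataFrame with an extra 'weekIndex' column
joined = dfMod.join(pd.Series(d, name='weekIndex'), on='day')

# Same lookup done with map, for comparison
mapped = dfMod['day'].map(d)

same = joined['weekIndex'].tolist() == mapped.tolist()
print(same)
print('weekIndex' in dfMod.columns)  # dfMod was not modified in place
```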
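Take a look at the Series map method. I asked about this recently too. My question may be useful to you:
Thanks, this works very well and, as you said, it is much faster than apply.
I tested your original solution on 300k rows: 1 loop, best of 3: 21min 47s per loop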