Is Python's loc faster than looping backwards to find the first match in a DataFrame?
Suppose I have a large DataFrame with columns like this:
Date | Person 1 | Person 2 | Value 1 | Value 2
+----------------------------------------------+
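Assume the DataFrame is sorted from oldest to newest.
Now I want to iterate over this DataFrame:
- For each row, I first look at Person 1 and take that person's number.
- For Person 1, I want to get the most recent Value 1 from an earlier row and run a complex calculation on it.
The way I currently get the most recent Value 1 (v1) is the following algorithm:
- Iterate over every row. Call the row being processed the "current row". All calculations update Value 1 of this current row.
- Get the Id of Person 1 (the number in the "Person 1" column). Take person 19 as an example.
- Find the most recent previous row in which person 19 appears in either the "Person 1" or the "Person 2" column. Call it the "most recent previous row".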
    flag = 0
    if person in 'Person 1' then flag = 1
    new_value = most_recent_prev_row['Value 1'] + flag * 0.5 * most_recent_prev_row['Value 2']
    current_row['Value 1'] = new_value
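The steps above can be sketched in pandas roughly as follows. This is only an illustration of the lookup described in the question, not a faster replacement: the helper names `most_recent_prev_row` and `new_value_for` are made up for the example, the column names follow the table header ("Person 1", "Person 2", "Value 1", "Value 2"), and the flag is 1 when the person was in "Person 1" of the earlier row and 0 otherwise, as in the pseudocode.

```python
import pandas as pd

def most_recent_prev_row(df, pos, person):
    """Return the latest row before position `pos` in which `person` appears, or None."""
    earlier = df.iloc[:pos]  # rows strictly before the current row
    mask = (earlier["Person 1"] == person) | (earlier["Person 2"] == person)
    matches = earlier.loc[mask]
    return None if matches.empty else matches.iloc[-1]

def new_value_for(df, pos):
    """Compute the new Value 1 for the row at position `pos` (None if no earlier row)."""
    person = df["Person 1"].iloc[pos]
    prev = most_recent_prev_row(df, pos, person)
    if prev is None:
        return None
    flag = 1 if prev["Person 1"] == person else 0
    return prev["Value 1"] + flag * 0.5 * prev["Value 2"]

# Two rows for person 19, both times in the "Person 1" column
df = pd.DataFrame({
    "Date": pd.to_datetime(["13/08/2019", "16/08/2019"], dayfirst=True),
    "Person 1": [19, 19],
    "Person 2": [71, 68],
    "Value 1": [1000.0, 1000.0],
    "Value 2": [1000.0, 1000.0],
})

result = new_value_for(df, 1)
print(result)  # 1000 + 1 * 0.5 * 1000
```

Each call builds a boolean mask over all earlier rows, so calling it once per row costs time roughly quadratic in the number of rows.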
For example, updating the second row of the table below for person 19:
DATE | Person 1 | Person 2 | value 1 | value 2
13/08/2019 | 19 | 71 | 1000 | 1000
16/08/2019 | 19 | 68 | 1000+0.5*1000=1500 | 1000
If the first row were instead:
DATE | Person 1 | Person 2 | value 1 | value 2
13/08/2019 | 71 | 19 | 1000 | 1000
16/08/2019 | 19 | 68 | 1000-0.5*1000=500 | 1000
Finally, my calculation code is below. It is applied row by row and is very slow:
# helper function to calculate new value
def calculate(value1, value2, flag):
    new_value = value1 + flag * 0.5 * value2
    return new_value

# function to update value (reads the global DataFrame df)
def updateValue(playerId, date):
    # default value if player has no wins or losses
    score = 1000
    # get wins and losses for the player. Players in 'Person 1' won, players in 'Person 2' lost.
    wins = df.loc[(df['Person 1'] == playerId) & (df.DATE < date)]
    losses = df.loc[(df['Person 2'] == playerId) & (df.DATE < date)]
    # player only has wins
    if not wins.empty and losses.empty:
        result_row = wins.iloc[-1]
        score = calculate(result_row.value1, result_row.value2, 1)
    # player only has losses
    elif wins.empty and not losses.empty:
        result_row = losses.iloc[-1]
        score = calculate(result_row.value1, result_row.value2, 0)
    # player has wins and losses: use whichever row is more recent
    elif not wins.empty and not losses.empty:
        p1_win_row = wins.iloc[-1]
        p1_lost_row = losses.iloc[-1]
        if p1_win_row.DATE < p1_lost_row.DATE:
            result_row = p1_lost_row
            score = calculate(result_row.value1, result_row.value2, 0)
        else:
            result_row = p1_win_row
            score = calculate(result_row.value1, result_row.value2, 1)
    return score
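One possible way to avoid rescanning the DataFrame for every row is to walk it once and remember, per player, the values of the last row that player appeared in. The sketch below is an assumption about how that could look, not a tested drop-in for the code above: `update_values` and `last_seen` are hypothetical names, the columns are assumed to be called `value1` and `value2` as in the posted code, the sample rows (including the third one) are invented for illustration, and rows are assumed to be sorted with strictly increasing dates so that "earlier in the frame" means `DATE < date`.

```python
import pandas as pd

def update_values(df, default=1000.0):
    """Single pass: recompute value1 per row from each Person 1's latest earlier row."""
    last_seen = {}   # player id -> (value1, value2, flag) of the last row they appeared in
    new_values = []
    for p1, p2, v2 in zip(df["Person 1"], df["Person 2"], df["value2"]):
        if p1 in last_seen:
            prev_v1, prev_v2, flag = last_seen[p1]
            v1 = prev_v1 + flag * 0.5 * prev_v2
        else:
            v1 = default  # player has no earlier wins or losses
        new_values.append(v1)
        # Remember this row for both players: flag 1 for the winner, 0 for the loser
        last_seen[p1] = (v1, v2, 1)
        last_seen[p2] = (v1, v2, 0)
    out = df.copy()
    out["value1"] = new_values
    return out

# Invented sample data, sorted oldest to newest
df = pd.DataFrame({
    "DATE": pd.to_datetime(["13/08/2019", "16/08/2019", "20/08/2019"], dayfirst=True),
    "Person 1": [19, 19, 71],
    "Person 2": [71, 68, 19],
    "value1": [1000.0, 1000.0, 1000.0],
    "value2": [1000.0, 1000.0, 1000.0],
})

updated = update_values(df)
print(updated["value1"].tolist())
```

Dictionary lookups take roughly constant time, so the whole pass is linear in the number of rows, and because each row's new `value1` is stored before later rows read it, the recursive dependency is handled in order.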
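Comments:
I'd guess where is faster? Or run a benchmark yourself?
Can you add some sample data and the expected output?
Hi, I've added more details and an example. I'm basically trying to run a recursive calculation (that's why I call it complex), but I don't know how to do it with optimized functions. The only way I've managed to apply this slow function is with itertuples.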
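A benchmark along the lines suggested in the comments could be set up like this. It is a rough sketch on random data, with made-up function names (`last_match_loc`, `last_match_backward`) and an arbitrary frame size. The outcome depends heavily on how far back the match sits, so the numbers it prints should not be read as a general answer.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "Person 1": rng.integers(0, 500, n),
    "Person 2": rng.integers(0, 500, n),
})

def last_match_loc(df, pos, person):
    """Boolean mask plus .loc over all earlier rows; returns the last matching label."""
    earlier = df.iloc[:pos]
    mask = (earlier["Person 1"] == person) | (earlier["Person 2"] == person)
    hits = earlier.loc[mask].index
    return hits[-1] if len(hits) else None

def last_match_backward(p1, p2, pos, person):
    """Plain Python loop from the current position backwards; stops at the first match."""
    for i in range(pos - 1, -1, -1):
        if p1[i] == person or p2[i] == person:
            return i
    return None

# NumPy arrays for the loop version, so it does not pay pandas indexing costs per element
p1 = df["Person 1"].to_numpy()
p2 = df["Person 2"].to_numpy()
pos, person = n - 1, int(p1[-1])

t_loc = timeit.timeit(lambda: last_match_loc(df, pos, person), number=100)
t_back = timeit.timeit(lambda: last_match_backward(p1, p2, pos, person), number=100)
print(f"loc mask:      {t_loc:.4f} s for 100 calls")
print(f"backward loop: {t_back:.4f} s for 100 calls")
```

The mask version always scans every earlier row, while the backward loop stops at the first hit, so the comparison shifts with how frequently each person appears.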