Is Python's loc faster than looping backwards to find the first match in a DataFrame?

Tags: python, pandas, dataframe

Suppose I have a large DataFrame with columns like these:

Date | Person 1 | Person 2 | Value 1 |  Value 2
+----------------------------------------------+
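Assume the DataFrame is sorted from oldest to newest.

Now I want to iterate over this DataFrame.

  • For each row, I first look at Person 1 and take that person's ID.

  • For Person 1, I want to get the most recent Value 1 from an earlier row and perform a complex calculation.

The way I currently get the most recent Value 1 (v1) is:

Algorithm:

  • Iterate over each row. Call the row being processed the "current row". All calculations update Value 1 of this "current row".
  • Get the ID of Person 1 (the number in the "Person 1" column). Take person 19 as an example.
  • Find the most recent previous row in which person 19 appears in either the "Person 1" or the "Person 2" column. Call it the "most recent previous row".
Perform the following calculation: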

flag = 0
if person in 'Person 1' then flag = 1
new_value = most_recent_prev_row['value 1'] + flag * 0.5 * most_recent_prev_row['value 2']
current_row['Value 1'] = new_value
For example, updating the second row of the table above for person 19:

DATE        Person 1    Person 2    value 1               value 2
13/08/2019  19          71          1000                  1000
16/08/2019  19          68          1000+0.5*1000=1500    1000
If the first row were instead:

DATE        Person 1    Person 2    value 1               value 2
13/08/2019  71          19          1000                  1000
16/08/2019  19          68          1000-0.5*1000=500     1000
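To make the arithmetic in the two tables easy to check, here is a minimal sketch of the formula from the pseudocode. The name `new_value` is taken from the pseudocode, but wrapping it in a function is an assumption made for illustration. The second table writes a subtraction, which this formula only produces with flag = -1; that value is an assumption here, since the pseudocode only sets flag to 0 or 1.

```python
def new_value(value1, value2, flag):
    # Formula from the pseudocode: value1 + flag * 0.5 * value2
    return value1 + flag * 0.5 * value2


# Person 19 was in 'Person 1' in the previous row (flag = 1).
print(new_value(1000, 1000, 1))   # 1500.0

# Person 19 was in 'Person 2' in the previous row (flag = 0, as in the pseudocode).
print(new_value(1000, 1000, 0))   # 1000.0

# Assumption: a flag of -1 would give the subtraction written in the second table.
print(new_value(1000, 1000, -1))  # 500.0
```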
Finally, my calculation code is below. It is applied row by row and is very slow:

# helper function to calculate the new value
def calculate(value1, value2, flag):
    # the result must be returned, otherwise every score becomes None
    return value1 + flag * 0.5 * value2

# function to update the value; reads the module-level DataFrame `df`
def updateValue(playerId, date):
    # default value if player has no wins or losses
    score = 1000

    # get wins and losses for the player. Players in 'Person 1' won, players in 'Person 2' lost.
    wins = df.loc[(df['Person 1'] == playerId) & (df.DATE < date)]
    losses = df.loc[(df['Person 2'] == playerId) & (df.DATE < date)]

    # Column names follow the example tables ('value 1', 'value 2'); names that
    # contain a space cannot be read with attribute access, so use brackets.

    # player only has wins
    if not wins.empty and losses.empty:
        result_row = wins.iloc[-1]
        score = calculate(result_row['value 1'], result_row['value 2'], 1)

    # player only has losses
    if wins.empty and not losses.empty:
        result_row = losses.iloc[-1]
        score = calculate(result_row['value 1'], result_row['value 2'], 0)

    # player has wins and losses: use whichever row is more recent
    if not wins.empty and not losses.empty:
        p1_win_row = wins.iloc[-1]
        p1_lost_row = losses.iloc[-1]

        if p1_win_row.DATE < p1_lost_row.DATE:
            result_row = p1_lost_row
            score = calculate(result_row['value 1'], result_row['value 2'], 0)
        else:
            result_row = p1_win_row
            score = calculate(result_row['value 1'], result_row['value 2'], 1)

    return score
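One way to avoid filtering the whole DataFrame for every row is to make a single pass and remember each person's most recent row in a dictionary. The sketch below is an illustration under stated assumptions, not a drop-in replacement: the names `update_values`, `last_seen` and `default` are hypothetical, the column names are taken from the example tables, the rows are assumed to be sorted from oldest to newest, and "previous row" is taken to mean an earlier position in the frame rather than a strictly earlier DATE.

```python
import pandas as pd


def update_values(df, default=1000.0):
    """Recompute 'value 1' in one pass over rows sorted oldest to newest."""
    # person id -> (value 1, value 2, flag) from the most recent row they appeared in
    last_seen = {}
    new_values = []

    for p1, p2, v2 in zip(df["Person 1"], df["Person 2"], df["value 2"]):
        if p1 in last_seen:
            prev_v1, prev_v2, flag = last_seen[p1]
            v1 = prev_v1 + flag * 0.5 * prev_v2
        else:
            # No earlier row for this person: fall back to the default score.
            v1 = default
        new_values.append(v1)

        # Record this row as the most recent one for both people.
        # flag is 1 for the person in 'Person 1' and 0 for the one in 'Person 2'.
        last_seen[p1] = (v1, v2, 1)
        last_seen[p2] = (v1, v2, 0)

    out = df.copy()
    out["value 1"] = new_values
    return out


df = pd.DataFrame({
    "DATE": ["13/08/2019", "16/08/2019"],
    "Person 1": [19, 19],
    "Person 2": [71, 68],
    "value 1": [1000, 1000],
    "value 2": [1000, 1000],
})
print(update_values(df)["value 1"].tolist())  # [1000.0, 1500.0]
```

Because the dictionary stores the updated value for each row, later rows see the recomputed values, which is what makes the calculation recursive. Each row costs a dictionary lookup instead of two boolean masks over the full frame.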
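I think .where is faster? Or run a benchmark yourself?
Can you add some sample data and the expected output?
Hi, I have added more details and an example. I'm basically trying to run a recursive calculation (which is why I called it complex), but I don't know how to do that with optimized functions. The only way I have managed to apply this slow function is with itertuples.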
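The benchmark suggested in the comments could be sketched roughly as follows. Everything here is made up for illustration: the random data, the frame size, and the helper names `last_match_loc` and `last_match_backward` are assumptions, and the timings will depend heavily on how far back the match lies, so no result is claimed.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 rows, person ids drawn from 0..999 (an assumption).
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "Person 1": rng.integers(0, 1000, n),
    "Person 2": rng.integers(0, 1000, n),
})


def last_match_loc(df, person, before):
    """Boolean-mask lookup with .loc over all rows before position `before`."""
    head = df.iloc[:before]
    hits = head.loc[(head["Person 1"] == person) | (head["Person 2"] == person)]
    return hits.index[-1] if not hits.empty else None


def last_match_backward(p1, p2, person, before):
    """Plain Python loop walking backwards until the first match."""
    for i in range(before - 1, -1, -1):
        if p1[i] == person or p2[i] == person:
            return i
    return None


p1 = df["Person 1"].to_numpy()
p2 = df["Person 2"].to_numpy()
person, before = 19, n

# Both approaches should agree before their speed is compared.
assert last_match_loc(df, person, before) == last_match_backward(p1, p2, person, before)

print("loc:     ", timeit.timeit(lambda: last_match_loc(df, person, before), number=20))
print("backward:", timeit.timeit(lambda: last_match_backward(p1, p2, person, before), number=20))
```

The mask always scans every earlier row, while the backward loop stops at the first hit, so it is worth varying `person` and `before` to cover both frequent and rare people.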