Python: how to find the first match for each id based on a datetime column?

I've seen many similar questions but haven't found an answer to mine. Suppose I have a df:

    sample_id     tested_at   test_value
            1    2020-07-21            5
            1    2020-07-22            4
            1    2020-07-23            6
            2    2020-07-26            6
            2    2020-07-28            5
            3    2020-07-22            4
            3    2020-07-27            4
            3    2020-07-30            6

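The df is already sorted in ascending order by the tested_at column. Now I need to add another column, first_test, that shows on every row the first test value for that row's sample_id, regardless of whether it is the highest. The output should be:
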
    sample_id     tested_at   test_value   first_test
            1    2020-07-21            5            5
            1    2020-07-22            4            5
            1    2020-07-23            6            5
            2    2020-07-26            6            6
            2    2020-07-28            5            6
            3    2020-07-22            4            4
            3    2020-07-27            4            4
            3    2020-07-30            6            4

The df is also fairly large, so a faster approach would be much appreciated.

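You can use pandas' groupby to group by sample_id, then use the transform method to get the first value for each sample_id. Note that this takes the first value by row position, not the earliest date, so make sure the rows are sorted by date.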

import pandas as pd

df = pd.DataFrame(
    [
        [1, "2020-07-21", 5],
        [1, "2020-07-22", 4],
        [1, "2020-07-23", 6],
        [2, "2020-07-26", 6],
        [2, "2020-07-28", 5],
        [3, "2020-07-22", 4],
        [3, "2020-07-27", 4],
        [3, "2020-07-30", 6],
    ],
    columns=["sample_id", "tested_at", "test_value"],
)

# Broadcast the first test_value of each sample_id group back onto every row of that group
df["first_test"] = df.groupby("sample_id")["test_value"].transform("first")

The result is:

   sample_id   tested_at  test_value  first_test
0          1  2020-07-21           5           5
1          1  2020-07-22           4           5
2          1  2020-07-23           6           5
3          2  2020-07-26           6           6
4          2  2020-07-28           5           6
5          3  2020-07-22           4           4
6          3  2020-07-27           4           4
7          3  2020-07-30           6           4
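If the rows are not guaranteed to be in date order, one option is to sort them before applying the same transform("first"). The sketch below is an illustration under assumptions: it reuses the question's values but in a deliberately shuffled (hypothetical) row order, parses tested_at with pd.to_datetime so the column has a real datetime dtype, and sorts by sample_id and tested_at. Note that it reorders df and resets the index.

```python
import pandas as pd

# The question's data, shuffled on purpose (hypothetical) so rows are NOT in date order
df = pd.DataFrame(
    {
        "sample_id": [1, 3, 2, 1, 3, 2, 1, 3],
        "tested_at": ["2020-07-23", "2020-07-30", "2020-07-28", "2020-07-21",
                      "2020-07-22", "2020-07-26", "2020-07-22", "2020-07-27"],
        "test_value": [6, 6, 5, 5, 4, 6, 4, 4],
    }
)

# Give tested_at a datetime dtype instead of plain strings
df["tested_at"] = pd.to_datetime(df["tested_at"])

# Sort by id, then by date, so the earliest test comes first in each group
df = df.sort_values(["sample_id", "tested_at"]).reset_index(drop=True)

# Now "first by row position" is the same as "first by date"
df["first_test"] = df.groupby("sample_id")["test_value"].transform("first")
print(df)
```

Sorting costs O(n log n), so if the data already arrives in date order, as in the question, this step can be skipped.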

Use:

df["first_test"] = df.groupby("sample_id")["test_value"].transform("first")
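Alternatively, here is a sketch that does not depend on row order at all. It is a swapped-in technique, not what the answers above use: it finds the row label of the earliest tested_at for each sample_id with idxmin, looks up the test_value on those rows, and maps it back onto every row by sample_id. The shuffled data is again hypothetical.

```python
import pandas as pd

# Hypothetical shuffled version of the question's data
df = pd.DataFrame(
    {
        "sample_id": [1, 3, 2, 1, 3, 2, 1, 3],
        "tested_at": pd.to_datetime(["2020-07-23", "2020-07-30", "2020-07-28", "2020-07-21",
                                     "2020-07-22", "2020-07-26", "2020-07-22", "2020-07-27"]),
        "test_value": [6, 6, 5, 5, 4, 6, 4, 4],
    }
)

# For each sample_id, the row label holding the earliest tested_at
first_idx = df.groupby("sample_id")["tested_at"].idxmin()

# sample_id -> test_value recorded on that earliest date
first_value = df.loc[first_idx].set_index("sample_id")["test_value"]

# Broadcast back to every row; df keeps its original row order
df["first_test"] = df["sample_id"].map(first_value)
print(df)
```

One behavioural difference: transform("first") skips NaN by default and returns the first non-null value, whereas this sketch takes the value on the earliest date even if that value is NaN.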