Python: Re-indexing a time series to start from 0 instead of timestamps, for comparison

python pandas

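I am working on a time-series problem where I want to match series by similarity. Because the time series I am working with have many gaps, I need to re-index them to run from t0 to tn instead of using the actual timestamps, so that they are useful for comparison.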

Example:
P1: (22-12-2019, 5, 0), (24-12-2019, 3, 1), (01-01-2020, 2, 0)
P2: (05-01-2020, 5, 0), (15-02-2020, 4, 1), (03-03-2020, 3, 0), (03-05-2020, 5, 1), (05-06-2020, 2, 0)

Aligned to start from 0:
P1: (t0, 5, 0), (t1, 3, 1), (t2, 2, 0)
P2: (t0, 5, 0), (t1, 4, 1), (t2, 3, 0), (t3, 5, 1), (t4, 2, 0)

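After aligning from t0 to tn, the series look somewhat similar. Note that they are multivariate and of unequal length.

Currently, I do the following for each group: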

import pandas as pd

# Parse the timestamps; the sample data is day-first (dd-mm-yyyy)
first_group_df["timestamp"] = pd.to_datetime(first_group_df["timestamp"], dayfirst=True)
first_group_df = first_group_df.sort_values(by="timestamp")  # Sort in order of arrival
first_group_df["time_index"] = range(len(first_group_df))  # Integer index 0..n-1
first_group_df = first_group_df.set_index("time_index")  # Use it as the index

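Is there a better way to map the timestamps to an integer index? I also thought that a simple reset_index() after sorting might work. I am looking for a better approach.

Here is a sample dataframe for one of the IDs, for reference: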
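The reset_index() idea can be sketched as follows, together with a variant that numbers every group in one pass using groupby(...).cumcount(). This is a minimal sketch, not a definitive solution: it uses the P1 and P2 values from the example, assumes the column names pid, timestamp, val and outcome, assumes day-first dates, and deliberately lists the first two P1 rows out of order to show that the sort matters.

```python
import pandas as pd

# P1 and P2 from the example; column names are assumptions for illustration.
# The first two P1 rows are swapped on purpose so the sort has work to do.
df = pd.DataFrame({
    "pid": ["P1", "P1", "P1", "P2", "P2", "P2", "P2", "P2"],
    "timestamp": ["24-12-2019", "22-12-2019", "01-01-2020",
                  "05-01-2020", "15-02-2020", "03-03-2020",
                  "03-05-2020", "05-06-2020"],
    "val": [3, 5, 2, 5, 4, 3, 5, 2],
    "outcome": [1, 0, 0, 0, 1, 0, 1, 0],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y")

# Single group: sort by time, then reset_index(drop=True) yields 0..n-1.
p1 = df[df["pid"] == "P1"].sort_values("timestamp").reset_index(drop=True)

# All groups at once: sort, then number the rows within each pid from 0.
df = df.sort_values(["pid", "timestamp"])
df["t"] = df.groupby("pid").cumcount()

# Drop the real timestamps and index by (pid, t) for comparison.
aligned = df.set_index(["pid", "t"])[["val", "outcome"]]
```

With this layout, aligned.loc["P1"] and aligned.loc["P2"] are each indexed by t0..tn, and rows sharing a timestamp keep whatever order the sort leaves them in.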

    pid  val  outcome            timestamp
0   112    5        1  22-12-2019 10:00:00
5   112    4        0  27-01-2020 11:00:00
10  112    2        1  29-01-2020 11:00:00
15  112    1        1  01-02-2020 10:00:00
20  112    5        1  01-03-2020 10:00:00

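You can use rank:

df['t'] = df['timestamp'].rank()

What you did not specify is how rows with the same timestamp should be ordered. The rank function accepts a method parameter to handle ties; see the documentation for details.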
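To make the rank approach concrete, here is a hedged sketch on the sample dataframe from the question. It assumes the timestamps are day-first strings that need parsing first (ranking the raw strings would order them lexicographically, not chronologically), and it picks method="first" as one possible tie rule. Because rank() is 1-based and returns floats, the sketch subtracts 1 and casts to int to get an index starting at 0.

```python
import pandas as pd

# Sample frame from the question (one pid, original row labels kept).
df = pd.DataFrame({
    "pid": [112] * 5,
    "val": [5, 4, 2, 1, 5],
    "outcome": [1, 0, 1, 1, 1],
    "timestamp": ["22-12-2019 10:00:00", "27-01-2020 11:00:00",
                  "29-01-2020 11:00:00", "01-02-2020 10:00:00",
                  "01-03-2020 10:00:00"],
}, index=[0, 5, 10, 15, 20])

# Rank real datetimes rather than day-first strings.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")

# Rank within each pid; method="first" breaks ties by order of appearance.
# rank() starts at 1 and returns floats, so shift to 0 and cast to int.
df["t"] = (df.groupby("pid")["timestamp"].rank(method="first") - 1).astype(int)
```

Choosing method="dense" instead would give rows with identical timestamps the same t, which may or may not be what the comparison needs.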