Python: regression along time in a three-dimensional DataFrame

Tags: python, pandas, regression, statsmodels, linearmodels

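I have a dataset consisting of five signals measured over time. Within a given data file, each timestamp corresponds to a unique measurement position. The positions repeat from file to file, but the time intervals between them are irregular. For each position, I want to compute a linear regression of each signal against time.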

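Right now I import each data file as a pandas DataFrame and then assemble them into a three-dimensional (MultiIndexed) DataFrame, like this: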

import pandas as pd

# skiprows=[0, 2] drops the two non-data lines around the header row;
# index_col=False keeps the first column (IDTag) as data even though every row ends with a trailing comma.
peak1 = pd.read_csv('peak/scan1.txt', skiprows=[0, 2], index_col=False)
peak2 = pd.read_csv('peak/scan2.txt', skiprows=[0, 2], index_col=False)
peak3 = pd.read_csv('peak/scan3.txt', skiprows=[0, 2], index_col=False)
peak4 = pd.read_csv('peak/scan4.txt', skiprows=[0, 2], index_col=False)
peak5 = pd.read_csv('peak/scan5.txt', skiprows=[0, 2], index_col=False)
peak6 = pd.read_csv('peak/scan6.txt', skiprows=[0, 2], index_col=False)

peaks = pd.concat([peak1, peak2, peak3, peak4, peak5, peak6], keys=('Scan1', 'Scan2', 'Scan3', 'Scan4', 'Scan5', 'Scan6'))
# Only a time of day is given, so pandas fills in the default date 1900-01-01.
peaks['Start'] = pd.to_datetime(peaks['Start'], format='%H:%M:%S.%f')
peaks['End'] = pd.to_datetime(peaks['End'], format='%H:%M:%S.%f')
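Is there a simple way to create an array containing, for each measurement position, the regression slope of each signal against start time? I could extract each position from each file, build a set of two-dimensional signal-versus-time arrays for each position, compute the regressions, reassemble the results into a new DataFrame, and attach it to the original DataFrame, but it seems there should be a more efficient way.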

Edit: sample data

Scan1
IDTag,Index,Position,LastInteg,Start,End,Sig1,Sig2,Sig3,Sig4,Sig5,Sig2B,Sig3B,Sig4B,Sig5B

1,1,37.8450,False,02:13:59.893,02:14:00.106, 5, -0.0000183, 0.0000225, -0.0000168, 0.0000605, -0.0000183, 0.0000225, -0.0000168, 0.0000605, 
2,1,37.8448,False,02:14:00.174,02:14:00.387, 0, -0.0000124, 0.0000081, 0.0000095, -0.0000370, -0.0000124, 0.0000081, 0.0000095, -0.0000370, 
3,1,37.8446,False,02:14:00.439,02:14:00.652, 0, -0.0000079, 0.0000163, 0.0000214, -0.0000670, -0.0000079, 0.0000163, 0.0000214, -0.0000670, 
4,1,37.8444,False,02:14:00.704,02:14:00.918, 0, -0.0000313, -0.0000238, 0.0000211, 0.0000086, -0.0000313, -0.0000238, 0.0000211, 0.0000086, 
5,1,37.8442,False,02:14:00.969,02:14:01.182, 0, 0.0000376, -0.0000149, -0.0000246, -0.0000273, 0.0000376, -0.0000149, -0.0000246, -0.0000273, 
6,1,37.8440,False,02:14:01.234,02:14:01.448, 0, -0.0000171, 0.0000318, -0.0000517, -0.0000144, -0.0000171, 0.0000318, -0.0000517, -0.0000144, 
7,1,37.8438,False,02:14:01.500,02:14:01.713, 0, 0.0000494, -0.0000132, 0.0000169, 0.0000398, 0.0000494, -0.0000132, 0.0000169, 0.0000398, 
8,1,37.8436,False,02:14:01.765,02:14:01.978, 0, -0.0000162, 0.0000721, 0.0000450, -0.0000324, -0.0000162, 0.0000721, 0.0000450, -0.0000324, 
9,1,37.8434,False,02:14:02.030,02:14:02.242, 0, 0.0000210, 0.0000141, -0.0000450, -0.0000436, 0.0000210, 0.0000141, -0.0000450, -0.0000436, 
10,1,37.8432,False,02:14:02.295,02:14:02.508, 0, -0.0000420, -0.0000070, -0.0000197, -0.0000195, -0.0000420, -0.0000070, -0.0000197, -0.0000195, 

Scan2
IDTag,Index,Position,LastInteg,Start,End,Sig1,Sig2,Sig3,Sig4,Sig5,Sig2B,Sig3B,Sig4B,Sig5B

1,1,37.6950,False,02:19:25.980,02:19:26.192, 0, -0.0000127, 0.0000533, -0.0000101, -0.0000177, -0.0000127, 0.0000533, -0.0000101, -0.0000177, 
2,1,37.6952,False,02:19:26.245,02:19:26.460, 0, -0.0000500, -0.0000029, 0.0000109, -0.0000493, -0.0000500, -0.0000029, 0.0000109, -0.0000493, 
3,1,37.6954,False,02:19:26.511,02:19:26.723, 0, -0.0000545, -0.0000235, -0.0000488, 0.0000353, -0.0000545, -0.0000235, -0.0000488, 0.0000353, 
4,1,37.6956,False,02:19:26.776,02:19:26.989, 0, 0.0000221, -0.0000147, 0.0000139, 0.0000607, 0.0000221, -0.0000147, 0.0000139, 0.0000607, 
5,1,37.6958,False,02:19:27.041,02:19:27.254, 5, 0.0000016, -0.0000153, -0.0000305, 0.0000076, 0.0000016, -0.0000153, -0.0000305, 0.0000076, 
6,1,37.6960,False,02:19:27.306,02:19:27.518, 0, 0.0000076, 0.0000069, 0.0000244, 0.0000302, 0.0000076, 0.0000069, 0.0000244, 0.0000302, 
7,1,37.6962,False,02:19:27.571,02:19:27.784, 5, 0.0000141, 0.0000519, 0.0000095, -0.0000292, 0.0000141, 0.0000519, 0.0000095, -0.0000292, 
8,1,37.6964,False,02:19:27.837,02:19:28.051, 0, -0.0000167, -0.0000878, -0.0000292, 0.0000934, -0.0000167, -0.0000878, -0.0000292, 0.0000934, 
9,1,37.6966,False,02:19:28.102,02:19:28.316, 0, 0.0000353, 0.0000206, 0.0000289, -0.0000510, 0.0000353, 0.0000206, 0.0000289, -0.0000510, 
10,1,37.6968,False,02:19:28.367,02:19:28.581, 5, 0.0000103, 0.0000374, -0.0000351, -0.0000124, 0.0000103, 0.0000374, -0.0000351, -0.0000124, 

Scan3
IDTag,Index,Position,LastInteg,Start,End,Sig1,Sig2,Sig3,Sig4,Sig5,Sig2B,Sig3B,Sig4B,Sig5B

1,1,37.8450,False,02:23:06.767,02:23:06.979, 5, -0.0000075, 0.0000574, -0.0000014, 0.0000523, -0.0000075, 0.0000574, -0.0000014, 0.0000523, 
2,1,37.8448,False,02:23:07.048,02:23:07.261, 0, -0.0000019, 0.0000010, -0.0000090, -0.0000107, -0.0000019, 0.0000010, -0.0000090, -0.0000107, 
3,1,37.8446,False,02:23:07.313,02:23:07.526, 5, 0.0000316, 0.0000154, 0.0000086, -0.0000582, 0.0000316, 0.0000154, 0.0000086, -0.0000582, 
4,1,37.8444,False,02:23:07.579,02:23:07.791, 5, -0.0000320, 0.0000014, -0.0000194, 0.0000081, -0.0000320, 0.0000014, -0.0000194, 0.0000081, 
5,1,37.8442,False,02:23:07.844,02:23:08.057, 0, 0.0000227, -0.0000326, 0.0000124, -0.0000078, 0.0000227, -0.0000326, 0.0000124, -0.0000078, 
6,1,37.8440,False,02:23:08.109,02:23:08.321, 0, -0.0000037, -0.0000201, -0.0000247, -0.0000361, -0.0000037, -0.0000201, -0.0000247, -0.0000361, 
7,1,37.8438,False,02:23:08.374,02:23:08.587, 10, 0.0000048, -0.0000790, -0.0000260, 0.0000352, 0.0000048, -0.0000790, -0.0000260, 0.0000352, 
8,1,37.8436,False,02:23:08.639,02:23:08.853, 0, 0.0000499, 0.0000047, -0.0000064, -0.0000554, 0.0000499, 0.0000047, -0.0000064, -0.0000554, 
9,1,37.8434,False,02:23:08.905,02:23:09.117, 0, -0.0000475, -0.0000130, -0.0000116, 0.0000996, -0.0000475, -0.0000130, -0.0000116, 0.0000996, 
10,1,37.8432,False,02:23:09.170,02:23:09.384, 0, 0.0000206, -0.0000171, 0.0000280, 0.0000349, 0.0000206, -0.0000171, 0.0000280, 0.0000349, 
Edit 2: The code below does what I want, but it seems I should be able to take advantage of the DataFrame structure to do this more easily.

from sklearn.linear_model import LinearRegression

unique_positions = peaks['Position'].unique()
signal_list = ['Sig1', 'Sig2', 'Sig3', 'Sig4', 'Sig5', 'Sig2B', 'Sig3B', 'Sig4B', 'Sig5B']
regs = pd.DataFrame(columns=['Position'] + signal_list)
regs.set_index('Position', inplace=True)

for pos in unique_positions:
    time_series_at_pos = peaks[peaks['Position'] == pos]
    # Regress on elapsed seconds since the first timestamp, so slopes are in signal units per second
    elapsed = (time_series_at_pos['Start'] - peaks['Start'].min()).dt.total_seconds()
    for sig in signal_list:
        linear_regressor = LinearRegression()
        linear_regressor.fit(elapsed.values.reshape(-1, 1), time_series_at_pos[sig].values.reshape(-1, 1))
        # Store the slope for this position and signal
        regs.loc[pos, sig] = linear_regressor.coef_[0][0]
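
One way the DataFrame structure might be used for this is a groupby on Position, fitting all signals in one call per group. The following is only a sketch under assumptions: the helper name slopes_by_position and the small demo frame are hypothetical, np.polyfit stands in for LinearRegression, and time is expressed as elapsed seconds since the earliest Start. Positions that appear in fewer than two scans get NaN, since a slope is not defined for a single point.

```python
import numpy as np
import pandas as pd


def slopes_by_position(peaks, signal_list, time_col='Start', pos_col='Position'):
    """Least-squares slope of each signal against elapsed seconds, per position."""
    df = peaks.copy()
    # Elapsed seconds since the earliest timestamp gives a plain numeric regressor.
    df['t'] = (df[time_col] - df[time_col].min()).dt.total_seconds()

    def fit(group):
        if len(group) < 2:
            # A single measurement cannot define a slope.
            return pd.Series(np.nan, index=signal_list)
        # With a 2-D y, np.polyfit fits every column at once; row 0 holds the slopes.
        coefs = np.polyfit(group['t'].to_numpy(),
                           group[signal_list].to_numpy(dtype=float), 1)
        return pd.Series(coefs[0], index=signal_list)

    return df.groupby(pos_col)[['t'] + signal_list].apply(fit)


# Hypothetical demo data: two positions, each measured in three scans.
demo = pd.DataFrame({
    'Position': [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
    'Start': pd.to_datetime(
        ['02:00:00.000', '02:00:00.500', '02:00:10.000',
         '02:00:10.500', '02:00:20.000', '02:00:20.500'],
        format='%H:%M:%S.%f'),
    'Sig1': [0.0, 5.0, 10.0, 5.0, 20.0, 5.0],
    'Sig2': [1.0, 0.0, 0.0, -1.0, -1.0, -2.0],
})
demo_slopes = slopes_by_position(demo, ['Sig1', 'Sig2'])
print(demo_slopes)
```

The result is indexed by Position with one column per signal, so it could be attached to the original frame with something like peaks.join(slopes.add_suffix('_slope'), on='Position').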

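Comments:

Please show sample data and your long-way regression attempt. It is not clear what the data looks like or which variables need to go into the linear model.

I haven't done it the long way yet. I was hoping to avoid that.

What is your regression model (i.e., the dependent and independent variables)?

Specifically, a regression against start time at each measurement position.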