Sorting two arrays of log data by time in Python
I want to create a new matrix based on two data frames. The first array, df1, collects data every second; the second array, df2, collects data every 30 minutes. Ideally, the data from df2 would be added to df1 so that it represents the correct time series. In practice, the data are completely irregular: readings appear at random whenever certain sensors are activated. For example:
df1 = [['10-11', '14:21:01', '65'],
['10-11', '14:21:02', '55'],
['10-11', '14:21:03', '26'],
['12-11', '17:29:58', '89'],
['12-11', '17:29:59', '12'],
['12-11', '17:30:00', '65'],
['12-11', '17:30:01', '3'],
['12-11', '17:30:02', '66'],
['12-11', '17:30:03', '971']]
df2 = [['10-11', '14:30', '9.9','112'],
['10-11', '15:00', '7.8','165'],
['12-11', '17:00', '6.1','154'],
['12-11', '17:30', '6.2','165'],
['12-11', '18:00', '6.5','170']]
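I would like to sort the data so that, for example, every row of df1 with a time between 14:00:00 and 14:29:59 has the values '9.9' and '112' appended to it, taken from the corresponding row of df2. The idea is that the resulting data frame would look like this array: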
finaldf = [['10-11', '14:21:01', '65', '9.9','112'],
['10-11', '14:21:02', '55', '9.9','112'],
['10-11', '14:21:03', '26', '9.9','112'],
['12-11', '17:29:58', '89', '6.2','165'],
['12-11', '17:29:59', '12', '6.2','165'],
['12-11', '17:30:00', '65', '6.5','170'],
['12-11', '17:30:01', '3', '6.5','170'],
['12-11', '17:30:02', '66', '6.5','170'],
['12-11', '17:30:03', '971', '6.5','170']]
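I apologize if this seems complicated; I would be grateful if you could help me solve this or point me in the right direction.

You can create new columns in df1 and fill them by iterating over the rows of df2 (which can be very slow for large data frames), using datetime to filter the times. From your example: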
import pandas as pd
import datetime as dt
df1 = [['10-11', '14:21:01', '65'],
['10-11', '14:21:02', '55'],
['10-11', '14:21:03', '26'],
['12-11', '17:29:58', '89'],
['12-11', '17:29:59', '12'],
['12-11', '17:30:00', '65'],
['12-11', '17:30:01', '3'],
['12-11', '17:30:02', '66'],
['12-11', '17:30:03', '971']]
df2 = [['10-11', '14:30', '9.9','112'],
['10-11', '15:00', '7.8','165'],
['12-11', '17:00', '6.1','154'],
['12-11', '17:30', '6.2','165'],
['12-11', '18:00', '6.5','170']]
# convert to pandas DataFrame and name columns
df1 = pd.DataFrame(df1, columns=['date', 'time', 'val1'])
df2 = pd.DataFrame(df2, columns=['date', 'time', 'val2', 'val3'])
finaldf = df1.copy()  # work on a copy so df1 itself is left unchanged
finaldf['val2'] = None  # placeholder for rows that fall in no window
finaldf['val3'] = None  # placeholder for rows that fall in no window
for i, d, t, v2, v3 in df2.itertuples():
    # get the starting time by subtracting 30 minutes
    tmin = (dt.datetime.strptime(t, '%H:%M') + dt.timedelta(minutes=-30)).time().strftime("%H:%M:%S")
    tmax = t + ":00"  # add seconds to end of string
    # filter df1 by matching date and time range [tmin, tmax)
    index = (finaldf['date'] == d) & (finaldf['time'] >= tmin) & (finaldf['time'] < tmax)
    finaldf.loc[index, 'val2'] = v2
    finaldf.loc[index, 'val3'] = v3
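Because the loop compares time-of-day strings, a window that crosses midnight (a df2 row at 00:00) would match nothing. As a rough sketch of the same 30-minute windows computed on full timestamps, without a Python-level loop, the following may help. It assumes the dates are MM-DD and that a placeholder year can be prepended; the names rows1, rows2, left, right, YEAR and window_end are made up for this illustration, and the rows are a subset of the example data.

```python
import pandas as pd

# Subset of the example data from the question.
rows1 = [['10-11', '14:21:01', '65'],
         ['12-11', '17:29:59', '12'],
         ['12-11', '17:30:00', '65']]
rows2 = [['10-11', '14:30', '9.9', '112'],
         ['12-11', '17:30', '6.2', '165'],
         ['12-11', '18:00', '6.5', '170']]

left = pd.DataFrame(rows1, columns=['date', 'time', 'val1'])
right = pd.DataFrame(rows2, columns=['date', 'time', 'val2', 'val3'])

YEAR = '2000'  # assumption: the logs carry no year, so a placeholder is used

# Build full timestamps (assumes MM-DD dates).
ts = pd.to_datetime(YEAR + '-' + left['date'] + ' ' + left['time'],
                    format='%Y-%m-%d %H:%M:%S')
# A reading at time t belongs to the window [end - 30 min, end),
# so its window end is floor(t, 30 min) + 30 min.
left['window_end'] = ts.dt.floor('30min') + pd.Timedelta(minutes=30)
right['window_end'] = pd.to_datetime(YEAR + '-' + right['date'] + ' ' + right['time'],
                                     format='%Y-%m-%d %H:%M')

# A left join keeps every row of the per-second data, in its original order.
result = (left.merge(right[['window_end', 'val2', 'val3']],
                     on='window_end', how='left')
              .drop(columns='window_end'))
print(result.values.tolist())
```

Rows whose window has no entry in the 30-minute data end up with missing values in val2 and val3 rather than being dropped.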
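Note that in the loop above I convert the time string to a datetime and call time() to get the time. A better approach might be to convert the whole date and time to datetime.datetime and apply the timedelta to the whole thing. (I cannot tell from your data whether the dates are MM-DD or DD-MM.)

You can use pd.merge_asof after creating a datetime index: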
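import pandas as pd

# df1 and df2 are the raw nested lists from the question, so the columns are 0, 1, 2, ...
df_1 = pd.DataFrame(df1)
df_2 = pd.DataFrame(df2)
# index both frames by a full timestamp (dates read as MM-DD)
df_1 = df_1.set_index(pd.to_datetime(df_1[0] + ' ' + df_1[1], format='%m-%d %H:%M:%S'))
df_2 = df_2.set_index(pd.to_datetime(df_2[0] + ' ' + df_2[1], format='%m-%d %H:%M'))
# for each df_1 row, take the first df_2 row at or after its timestamp
arr_out = pd.merge_asof(df_1, df_2,
                        right_index=True, left_index=True,
                        direction='forward', suffixes=('', '_r'))\
            .drop(columns=['0_r', '1_r']).values.tolist()
arr_out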
Output:
[['10-11', '14:21:01', '65', '9.9', '112'],
['10-11', '14:21:02', '55', '9.9', '112'],
['10-11', '14:21:03', '26', '9.9', '112'],
['12-11', '17:29:58', '89', '6.2', '165'],
['12-11', '17:29:59', '12', '6.2', '165'],
['12-11', '17:30:00', '65', '6.2', '165'],
['12-11', '17:30:01', '3', '6.5', '170'],
['12-11', '17:30:02', '66', '6.5', '170'],
['12-11', '17:30:03', '971', '6.5', '170']]
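In the output above, the 17:30:00 row receives '6.2' and '165', whereas the expected result in the question gives it '6.5' and '170'. This is because merge_asof matches equal keys by default. A possible variant, sketched below on a subset of the example data, passes allow_exact_matches=False so that a reading exactly on a 30-minute boundary goes to the next window. The names rows1, rows2, left, right, ts and YEAR are made up for this illustration; the MM-DD reading of the dates and the placeholder year are assumptions.

```python
import pandas as pd

# Subset of the example data from the question.
rows1 = [['10-11', '14:21:01', '65'],
         ['12-11', '17:29:59', '12'],
         ['12-11', '17:30:00', '65'],
         ['12-11', '17:30:01', '3']]
rows2 = [['10-11', '14:30', '9.9', '112'],
         ['12-11', '17:30', '6.2', '165'],
         ['12-11', '18:00', '6.5', '170']]

left = pd.DataFrame(rows1, columns=['date', 'time', 'val1'])
right = pd.DataFrame(rows2, columns=['date', 'time', 'val2', 'val3'])

YEAR = '2000'  # assumption: the logs carry no year, so a placeholder is used

# Full timestamps as the merge key (assumes MM-DD dates); both frames are
# already sorted by time, which merge_asof requires.
left['ts'] = pd.to_datetime(YEAR + '-' + left['date'] + ' ' + left['time'],
                            format='%Y-%m-%d %H:%M:%S')
right['ts'] = pd.to_datetime(YEAR + '-' + right['date'] + ' ' + right['time'],
                             format='%Y-%m-%d %H:%M')

# direction='forward' with allow_exact_matches=False picks the first row of
# `right` whose timestamp is strictly later than the reading.
merged = pd.merge_asof(left, right[['ts', 'val2', 'val3']], on='ts',
                       direction='forward', allow_exact_matches=False)
strict = merged.drop(columns='ts').values.tolist()
print(strict)
```

With this setting the 17:30:00 reading is paired with the 18:00 row, which is what the expected result in the question shows.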