Python: calculate the time difference; if it is greater than one hour, mark it as 'missing' and draw a gap in the line graph in that region

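I have a basic pandas dataframe in Python that takes in data and plots a line graph. Each data point includes a time. When the data file is working properly, consecutive timestamps are ideally about 30 minutes apart. In some cases, no data comes through for more than an hour. For those periods I want to mark the time span as "missing" and draw a discontinuous line graph that clearly shows where data is missing.

I'm having trouble figuring out how to do this, or even how to search for a solution, because the problem is so specific. The data is "live" and constantly updating, so I can't just identify a region and edit it by hand as a workaround.

Something that looks like this: [example image not available]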

Code used to create the datetime column:

import pandas as pd

# combine the separate date/time component columns into one datetime column
df['datetime'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute', 'second']])
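A minimal, self-contained sketch of that conversion, assuming hypothetical raw rows with one column per date/time component (the question does not show its raw columns). pd.to_datetime accepts a DataFrame whose columns are named year, month, day, hour, minute and second, and builds one timestamp per row.

```python
import pandas as pd

# Hypothetical raw rows: one column per date/time component plus a reading.
raw = pd.DataFrame({
    "year": [2019, 2019, 2019],
    "month": [2, 2, 2],
    "day": [3, 3, 3],
    "hour": [1, 2, 4],
    "minute": [52, 29, 48],
    "second": [16, 26, 52],
    "l1": [0.1, 0.1, 0.3],
})

# Assemble one datetime64 value per row from the component columns.
parts = ["year", "month", "day", "hour", "minute", "second"]
raw["datetime"] = pd.to_datetime(raw[parts])

# Keep only the readings and the combined timestamp.
df = raw.drop(columns=parts)
print(df)
```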

I have already figured out how to calculate the time difference, which involves creating a new column. Here is the code, just in case:

df['timediff'] = (df['datetime']-df['datetime'].shift().fillna(pd.to_datetime("00:00:00", format="%H:%M:%S")))
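A sketch of a simpler way to build the same column, assuming the sample rows from the table below. Series.diff() subtracts the previous timestamp, so the first row gets NaT instead of a difference measured from 1900-01-01 (the date that pd.to_datetime("00:00:00", format="%H:%M:%S") parses to). The status column and its "ok"/"missing" labels are hypothetical names for the marking step described in the title; note that the flagged row is the first sample after each outage.

```python
from datetime import timedelta

import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-02-03 01:52:16", "2019-02-03 02:29:26", "2019-02-03 02:48:03",
        "2019-02-03 04:48:52", "2019-02-03 05:25:59", "2019-02-03 05:44:34",
    ]),
    "l1": [0.1, 0.1, 0.1, 0.3, 0.4, 0.4],
})

# Time since the previous row; the first row has no predecessor and gets NaT.
df["timediff"] = df["datetime"].diff()

# Hypothetical status column: label rows that arrive more than an hour after
# the previous one (NaT > 1 hour evaluates to False, so row 0 stays "ok").
df["status"] = "ok"
df.loc[df["timediff"] > timedelta(hours=1), "status"] = "missing"
print(df[["datetime", "timediff", "status"]])
```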

What the dataframe basically looks like:

datetime               l1    l2    l3
2019-02-03 01:52:16   0.1   0.2   0.4
2019-02-03 02:29:26   0.1   0.3   0.6
2019-02-03 02:48:03   0.1   0.3   0.6
2019-02-03 04:48:52   0.3   0.8   1.4
2019-02-03 05:25:59   0.4   1.1   1.7
2019-02-03 05:44:34   0.4   1.3   2.2

I just don't know how to go about creating a discontinuous "live" plot based on the time differences.

Thanks in advance.

This isn't exactly what you want, but a quick and elegant solution is to resample your data:

df = df.set_index('datetime')
df

                      l1   l2   l3
datetime
2019-02-03 01:52:16  0.1  0.2  0.4
2019-02-03 02:29:26  0.1  0.3  0.6
2019-02-03 02:48:03  0.1  0.3  0.6
2019-02-03 04:48:52  0.3  0.8  1.4
2019-02-03 05:25:59  0.4  1.1  1.7
2019-02-03 05:44:34  0.4  1.3  2.2
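A minimal sketch of the resampling idea, assuming 30-minute bins to match the expected sampling interval. resample('30min').first() puts the series on a regular grid, bins that contain no sample become NaN, and matplotlib breaks the line at every NaN. The trade-off is that each value is drawn at the start of its bin rather than at its exact timestamp.

```python
import pandas as pd

# Sample data from the question, indexed by timestamp.
index = pd.DatetimeIndex(
    pd.to_datetime([
        "2019-02-03 01:52:16", "2019-02-03 02:29:26", "2019-02-03 02:48:03",
        "2019-02-03 04:48:52", "2019-02-03 05:25:59", "2019-02-03 05:44:34",
    ]),
    name="datetime",
)
df = pd.DataFrame({"l1": [0.1, 0.1, 0.1, 0.3, 0.4, 0.4]}, index=index)

# One value per 30-minute bin; the empty bins between 03:00 and 04:30 become NaN.
binned = df["l1"].resample("30min").first()
print(binned)
```

Plotting binned with binned.plot(marker='*') then shows a break where the NaN bins are.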

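If you absolutely need to plot every sample at its exact time, you can split the data wherever the difference between consecutive timestamps exceeds some threshold, and plot each chunk separately: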

from datetime import timedelta

import matplotlib.pyplot as plt
import numpy as np

# get difference between consecutive timestamps
dt = df.index.to_series()
td = dt - dt.shift()

# generate a new group index every time the time difference exceeds
# an hour
gp = np.cumsum(td > timedelta(hours=1))

# get current axes, plot all groups on the same axes
ax = plt.gca()
for _, chunk in df.groupby(gp):
    chunk['l1'].plot(marker='*', ax=ax)
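The grouping step on its own, as a self-contained sketch on the sample rows. Comparing each time difference with one hour gives a boolean series, and its cumulative sum increases by one after every gap, so consecutive samples with no gap between them share a group id.

```python
from datetime import timedelta

import pandas as pd

index = pd.DatetimeIndex(
    pd.to_datetime([
        "2019-02-03 01:52:16", "2019-02-03 02:29:26", "2019-02-03 02:48:03",
        "2019-02-03 04:48:52", "2019-02-03 05:25:59", "2019-02-03 05:44:34",
    ]),
    name="datetime",
)
df = pd.DataFrame({"l1": [0.1, 0.1, 0.1, 0.3, 0.4, 0.4]}, index=index)

# Time since the previous sample; NaT for the first row (NaT > 1 hour is False).
td = df.index.to_series().diff()

# True right after each gap; the running total is the chunk id.
gp = (td > timedelta(hours=1)).cumsum()

# Split into contiguous chunks that can be plotted one by one.
chunks = [chunk for _, chunk in df.groupby(gp)]
for chunk in chunks:
    print(chunk.index[0], "->", chunk.index[-1], len(chunk), "rows")
```

When plotting the chunks, passing a fixed color (for example chunk['l1'].plot(ax=ax, marker='*', color='C0')) keeps them visually one series; otherwise each plot call takes the next color in matplotlib's color cycle.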

Alternatively, you can inject "holes" into the data:

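from datetime import timedelta

import numpy as np
import pandas as pd

# find samples which occurred more than an hour after the previous
# sample (td is the time difference computed in the previous snippet)
holes = df.loc[td > timedelta(hours=1)]

# "holes" occur just before these samples
holes.index = holes.index - timedelta(microseconds=1)

# append holes to the data, set their values to NaN and restore time order
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df, holes])
df.loc[holes.index] = np.nan
df = df.sort_index()

# plot series
df['l1'].plot(marker='*')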
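The same hole-injection technique wrapped in a self-contained function, as a sketch. insert_gaps is a hypothetical helper name, and it assumes the frame has a sorted DatetimeIndex. Sorting after the concat places each NaN row between the two samples it separates, which is what makes the plotted line break there.

```python
from datetime import timedelta

import numpy as np
import pandas as pd


def insert_gaps(frame, max_gap=timedelta(hours=1)):
    """Return a copy of `frame` with an all-NaN row just before each gap."""
    td = frame.index.to_series().diff()
    is_gap = (td > max_gap).to_numpy()
    # One NaN row 1 microsecond before every sample that ends a gap.
    hole_index = frame.index[is_gap] - pd.Timedelta(microseconds=1)
    holes = pd.DataFrame(np.nan, index=hole_index, columns=frame.columns)
    return pd.concat([frame, holes]).sort_index()


index = pd.DatetimeIndex(
    pd.to_datetime([
        "2019-02-03 01:52:16", "2019-02-03 02:29:26", "2019-02-03 02:48:03",
        "2019-02-03 04:48:52", "2019-02-03 05:25:59", "2019-02-03 05:44:34",
    ]),
    name="datetime",
)
df = pd.DataFrame(
    {"l1": [0.1, 0.1, 0.1, 0.3, 0.4, 0.4], "l3": [0.4, 0.6, 0.6, 1.4, 1.7, 2.2]},
    index=index,
)

gapped = insert_gaps(df)
print(gapped)
```

Unlike blanking an existing row, this keeps every real sample; only the added rows are NaN.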

Edit: @Igor Raush gave a better answer, but I'll leave mine up because the visual result is a bit different.

See if this helps:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Seconds elapsed since the first sample
# I used total_seconds() rather than .dt.seconds, because .dt.seconds only counts the seconds within a single day
df['timediff'] = (df['datetime'] - df['datetime'].shift(1)).dt.total_seconds().cumsum().fillna(0)
# Create a dataframe with one row per second, from the first to the last sample (inclusive)
all_times_df = pd.DataFrame(np.arange(df['timediff'].min(), df['timediff'].max() + 1), columns=['timediff']).set_index('timediff')
# Join the dataframes and forward-fill nulls, so the values change only where data has been received
live_df = all_times_df.join(df.set_index('timediff')).ffill()
# Plot only your desired columns
live_df[['l1', 'l3']].plot()
plt.show()
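A sketch of the same forward-fill idea that keeps real timestamps on the x axis instead of seconds since the first sample, assuming one-second resolution as in the answer. It reindexes the frame onto a regular 1-second DatetimeIndex and forward-fills. Like the original, this draws a flat, step-like line through an outage rather than a visible gap.

```python
import pandas as pd

index = pd.DatetimeIndex(
    pd.to_datetime([
        "2019-02-03 01:52:16", "2019-02-03 02:29:26", "2019-02-03 02:48:03",
        "2019-02-03 04:48:52", "2019-02-03 05:25:59", "2019-02-03 05:44:34",
    ]),
    name="datetime",
)
df = pd.DataFrame(
    {"l1": [0.1, 0.1, 0.1, 0.3, 0.4, 0.4], "l3": [0.4, 0.6, 0.6, 1.4, 1.7, 2.2]},
    index=index,
)

# Every second between the first and last sample. The sample timestamps are
# whole seconds, so each of them falls exactly on this grid.
grid = pd.date_range(df.index[0], df.index[-1], freq="1s")

# Seconds without a sample are NaN after reindexing; ffill carries the last value forward.
live_df = df[["l1", "l3"]].reindex(grid).ffill()
print(live_df.loc[pd.Timestamp("2019-02-03 03:00:00")])
```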

Solved using my new timediff column and df.loc:

# diff() leaves NaT in the first row instead of a multi-decade difference from 1900-01-01
df['timediff'] = df['datetime'].diff()

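With this I get the time difference for each row.

Then, using df.loc, I find the values in columns l1 and l2 where timediff is greater than one hour and set them to NaN. As a result, the plot has no line at that point in time, just as I wanted.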

from datetime import timedelta

import numpy as np

# one .loc call with a row mask and a column list (no chained assignment)
df.loc[df['timediff'] > timedelta(hours=1), ['l1', 'l2']] = np.nan
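A self-contained sketch of this masking step on the sample rows, written as a single .loc assignment. The chained form df['l1'].loc[...] = ... is unreliable and never updates df when pandas Copy-on-Write is enabled. It assumes timediff comes from diff(), so the first row is NaT and is not masked. One trade-off: the masked row is the first sample after each outage, so that reading disappears from the plot.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-02-03 01:52:16", "2019-02-03 02:29:26", "2019-02-03 02:48:03",
        "2019-02-03 04:48:52", "2019-02-03 05:25:59", "2019-02-03 05:44:34",
    ]),
    "l1": [0.1, 0.1, 0.1, 0.3, 0.4, 0.4],
    "l2": [0.2, 0.3, 0.3, 0.8, 1.1, 1.3],
    "l3": [0.4, 0.6, 0.6, 1.4, 1.7, 2.2],
})

# Time since the previous row; NaT for the first row, so it is never masked.
df["timediff"] = df["datetime"].diff()

# Blank l1 and l2 on every row that arrives more than an hour after the
# previous one; matplotlib breaks the line at those NaN values.
late = df["timediff"] > timedelta(hours=1)
df.loc[late, ["l1", "l2"]] = np.nan
print(df)
```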

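First, you have a bug. The first value in timediff is 43497 days 01:52:16. Second, what does the y axis represent in the graph? Obviously, x represents points in time.

Hmm, I didn't notice that at first. I'm not sure whether it matters, since the rest is correct and I'm only focusing on the time? The y axis is just l1 and l3, which are percentages.

That means I'm not sure what you want the target graph to show. What do you want the y axis in the desired graph to represent? What does it measure? Is it the sum l1+l2+l3?

I'm plotting two lines, l1 and l3; the numbers are the y values for each x value (time).

I think what I want is the holes approach, but I get a KeyError on df.set_index('datetime') (KeyError: 'datetime'). I do use set_index with datetime earlier in the code for resampling, to extract some averages and maxima, so that may be the problem, but I'm not sure.

@user279955, if your dataframe is already indexed by timestamp, you can skip that step.

When I print each step the index looks correct, but doing this still raises the KeyError. When I don't get that error, I get: TypeError: '>' not supported between instances of 'float' and 'datetime.timedelta'. That is, I can see that the new index is set and that every time difference is accurate. The error pops up at the next instance of df['.'], for example when I go to plot.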
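A likely explanation for both errors, sketched under that assumption: set_index('datetime') raises KeyError: 'datetime' when it runs a second time, because after the first call datetime is the index rather than a column; and a time difference that has been converted to float seconds (for example with total_seconds()) must be compared with a number of seconds, not with a timedelta. ensure_datetime_index is a hypothetical helper name.

```python
from datetime import timedelta

import pandas as pd


def ensure_datetime_index(frame):
    """Index by the 'datetime' column, unless that was already done."""
    if "datetime" in frame.columns:
        return frame.set_index("datetime")
    return frame


df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-02-03 01:52:16", "2019-02-03 02:29:26", "2019-02-03 02:48:03",
        "2019-02-03 04:48:52", "2019-02-03 05:25:59", "2019-02-03 05:44:34",
    ]),
    "l1": [0.1, 0.1, 0.1, 0.3, 0.4, 0.4],
})

df = ensure_datetime_index(df)
df = ensure_datetime_index(df)  # safe to repeat: no KeyError the second time

# Gap in float seconds: compare with 3600.0, not with timedelta(hours=1).
gap_seconds = df.index.to_series().diff().dt.total_seconds()
late = gap_seconds > timedelta(hours=1).total_seconds()
print(late.tolist())
```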