Python: plotting scipy.signal.find_peaks results with datetime data

I want to use scipy.signal.find_peaks to find the peaks in the values of df, shown in the reproducible example below:
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'index': {0: 36,
  1: 47,
  2: 67,
  3: 129,
  4: 176,
  5: 246,
  6: 281,
  7: 335,
  8: 370,
  9: 375,
  10: 384,
  11: 408,
  12: 428,
  13: 437,
  14: 482,
  15: 500,
  16: 528,
  17: 585,
  18: 641,
  19: 647},
 'Timestamp': {0: Timestamp('2020-11-08 23:30:40.370000'),
  1: Timestamp('2020-11-13 04:52:29.410000'),
  2: Timestamp('2020-12-01 22:17:50.300000'),
  3: Timestamp('2020-11-24 00:57:11.950000'),
  4: Timestamp('2020-12-03 01:40:16.250000'),
  5: Timestamp('2020-11-12 07:32:54'),
  6: Timestamp('2020-11-30 21:13:07.630000'),
  7: Timestamp('2020-11-30 20:43:11.050000'),
  8: Timestamp('2020-11-09 06:04:19.630000'),
  9: Timestamp('2020-11-22 21:21:33.150000'),
  10: Timestamp('2020-11-23 22:04:44.580000'),
  11: Timestamp('2020-11-16 03:26:10.150000'),
  12: Timestamp('2020-11-07 02:04:42.890000'),
  13: Timestamp('2020-11-26 00:10:34.660000'),
  14: Timestamp('2020-11-26 04:14:23.180000'),
  15: Timestamp('2020-12-06 19:40:30.580000'),
  16: Timestamp('2020-12-26 02:17:27.110000'),
  17: Timestamp('2020-11-25 18:13:17.450000'),
  18: Timestamp('2020-11-26 20:02:13.170000'),
  19: Timestamp('2020-11-11 21:36:09.530000')},
 'Value': {0: 45.5,
  1: 44.5,
  2: 42.5,
  3: 43.0,
  4: 42.0,
  5: 43.5,
  6: 45.5,
  7: 43.5,
  8: 45.0,
  9: 44.0,
  10: 40.5,
  11: 46.0,
  12: 46.5,
  13: 47.0,
  14: 46.0,
  15: 46.0,
  16: 47.5,
  17: 43.0,
  18: 46.0,
  19: 41.0},
 'Id': {0: 15,
  1: 15,
  2: 20,
  3: 103,
  4: 87,
  5: 103,
  6: 15,
  7: 15,
  8: 15,
  9: 115,
  10: 20,
  11: 15,
  12: 15,
  13: 15,
  14: 15,
  15: 15,
  16: 15,
  17: 15,
  18: 15,
  19: 112}})

Using the following code:
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

x = df['Value'].values
peaks, properties = find_peaks(x, prominence=0.1, width=1)
properties["prominences"], properties["widths"]

plt.figure(figsize=(15,12))
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.vlines(x=peaks, ymin=x[peaks] - properties["prominences"],
           ymax = x[peaks], color = "C1")
plt.hlines(y=properties["width_heights"], xmin=properties["left_ips"],
           xmax=properties["right_ips"], color = "C1")
plt.show()

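The output is shown below; it only takes the Value column into account.

How can I make Timestamp the horizontal axis?

Edit:
I tried setting Timestamp as the index and changed the x and y axes accordingly: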

import matplotlib.pyplot as plt
from scipy.signal import find_peaks

z = df
z.set_index('Timestamp', inplace=True)
z.index.to_pydatetime()
peaks, properties = find_peaks(z.Value, prominence=0.1, width=1)
properties["prominences"], properties["widths"]

plt.figure(figsize=(15,12))
plt.plot_date(z.index, z.Value)
plt.plot_date(z.index[peaks], z.Value[peaks], "x")
plt.vlines(x=z.index[peaks], ymin=z.Value[peaks] - properties["prominences"],
           ymax = z.index[peaks], color = "C1")
plt.hlines(y=properties["width_heights"], xmin=properties["left_ips"],
           xmax=properties["right_ips"], color = "C1")
plt.show()

It returns:

What could be going wrong?

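Edit 2:
Using @Asmus's solution on a larger dataset, I noticed that the plot changes completely when I change prominence and width. For example, in the plot below I used prominence == 5 and width == 0.0001157 for Value > 30, because I am interested in peaks with a Value above 30, a prominence of about 5, and a width of 0.0001157, which is a fraction of a day, i.e. 10 seconds.

Then, if I change prominence to 10, it looks like this:

Both look very different from the original data, shown below: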

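Why does this happen?

About find_peaks() and indices:
If we look at the documentation of find_peaks(), we see that it
takes a 1-D array and finds all local maxima by simple comparison of neighboring values
and returns
indices of peaks in x that satisfy all given conditions.
For example, running: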
import numpy as np
from scipy.signal import find_peaks
x = np.array([4,5,6,7,6,5,5])
idx, properties = find_peaks(x)
print(idx, x[idx])

yields [3] (the index) and [7] as the value.
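
Because find_peaks() only receives the values, the indices it returns are plain positions in that array, and any other array of the same length can be indexed with them. The following is a minimal sketch of that idea; the arrays values and stamps are made-up illustration data, not taken from the question.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical data: values and irregularly spaced timestamps of equal length.
values = np.array([4, 5, 6, 7, 6, 5, 5])
stamps = np.array(
    ["2020-11-01", "2020-11-02", "2020-11-05", "2020-11-09",
     "2020-11-10", "2020-11-20", "2020-11-21"],
    dtype="datetime64[D]",
)

# find_peaks only ever sees `values`; the timestamps play no role here.
idx, _ = find_peaks(values)

# The indices are positions, so they select from both arrays alike.
peak_values = values[idx]
peak_stamps = stamps[idx]
print(idx, peak_values, peak_stamps)
```

The uneven spacing of the timestamps does not influence which peaks are found; it only matters later, when positions are turned back into dates for plotting.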

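About the ordering of the data:
In your case, you are trying to plot your data as a function of dates, so we first need to make sure that the data is in the right order. If you run the following: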
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

x = df['Timestamp'].values
y = df['Value'].values
idx, properties = find_peaks(y, prominence=0.1, width=1)

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10,3))

# that is your original plot:
axes[0].plot(y)
axes[0].plot(idx,y[idx],"x")
axes[0].set_title("unsorted, x = indices")

# here, I simply use the "correct" data as x-axis
axes[1].plot(x,y)
axes[1].plot(x[idx], y[idx], "x")
axes[1].set_title("unsorted, x = dates")

# and now I also sort the data:
df = df.sort_values(by="Timestamp")
x = df['Timestamp'].values
y = df['Value'].values
idx, properties = find_peaks(y, prominence=0.1, width=1)
axes[2].plot(x,y)
axes[2].plot(x[idx], y[idx], "x")
axes[2].set_title("sorted, x = dates")

# some nicer formatting:
for ax in axes:
    ax.grid()
fig.autofmt_xdate()
plt.tight_layout()
plt.show()

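you will see:
That is (from left to right):
Left: the data as you originally plotted it, as a function of the index (i.e. x runs from 0 to 19). Here the peaks are easy to find and highlight.
Middle: the data plotted as a function of x = df['Timestamp']. It looks messy because your dataframe is not sorted by time.
Right: the sorted dataframe, plotted as a function of the timestamps, with x[idx], y[idx] highlighting the peak positions.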
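
One detail that is easy to trip over after sorting: the indices from find_peaks() are positions, while a sorted DataFrame still carries its old row labels. The sketch below, on a small made-up frame named demo (hypothetical data, same column names as in the question), resets the labels after sorting and selects the peak rows by position with iloc.

```python
import pandas as pd
from scipy.signal import find_peaks

# Hypothetical unsorted frame with the same column names as in the question.
demo = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2020-11-05", "2020-11-01", "2020-11-03", "2020-11-02", "2020-11-04"]),
    "Value": [41.0, 40.0, 45.0, 42.0, 43.0],
})

# Sort by time and drop the old labels so that position and label agree.
demo = demo.sort_values(by="Timestamp").reset_index(drop=True)

idx, _ = find_peaks(demo["Value"].to_numpy())

# iloc selects by position, which is what find_peaks returns.
peaks = demo.iloc[idx]
print(peaks)
```

Without reset_index(drop=True), a label-based lookup such as demo.loc[idx] would pick rows by their pre-sort labels and could return the wrong timestamps.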

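About hlines and vlines on a date axis
Now you should be able to add the vertical lines without problems:
# x, y, idx and properties now refer to the sorted data in the third panel
axes[2].vlines(x=x[idx], ymin=y[idx] - properties["prominences"],
           ymax = y[idx], color = "C1")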

For the horizontal lines, however, the problem is that properties looks like this:
{
    'prominences': array([5., 5.]), 
    'left_bases': array([3, 8]), 
    'right_bases': array([ 8, 17]), 
    'widths': array([3.14285714, 3.225]), 
    'width_heights': array([43.5, 44.5]), 
    'left_ips': array([ 4., 10.375]), 
    'right_ips': array([ 7.14285714, 13.6])
}

To matplotlib it is obviously "unclear" what a width of, for example, 3.14285714 means in terms of dates, at least not without a proper conversion to dates.
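
One possible conversion, sketched here as an alternative to resampling and not as the method used in this answer, is to interpolate linearly between the timestamps of the neighboring samples. The helper index_to_date and the sample data are hypothetical, and the data is assumed to be sorted by time already.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical, already time-sorted data with uneven spacing.
stamps = np.array(
    ["2020-11-01", "2020-11-02", "2020-11-04", "2020-11-05", "2020-11-09"],
    dtype="datetime64[s]",
)
values = np.array([40.0, 42.0, 46.0, 42.0, 40.0])

idx, props = find_peaks(values, prominence=0.1, width=1)

def index_to_date(frac_idx):
    """Map fractional sample positions to datetimes (hypothetical helper)."""
    seconds = stamps.astype("int64")          # seconds since the epoch
    positions = np.arange(len(stamps))
    interpolated = np.interp(frac_idx, positions, seconds)
    return np.round(interpolated).astype("int64").astype("datetime64[s]")

# left_ips / right_ips are fractional positions; turn them into datetimes.
left = index_to_date(props["left_ips"])
right = index_to_date(props["right_ips"])
print(left, right)
```

The resulting left and right arrays could then be passed as xmin and xmax to hlines on a date axis. Note that the widths themselves are still measured in samples by find_peaks(), so on unevenly spaced data a width threshold does not correspond to a fixed duration.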

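Edit: how to fix the hlines when data is missing
First, you need to make sure that you have valid data for every date in the date range, so that you can interpret the return values of find_peaks() directly as relative dates (that is, if it finds a peak at index "2", you can convert it directly to [start date + 2 days]).
Here, the dataframe looks like this: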

            Timestamp  Value
0 2020-01-01 00:00:00    0.0
1 2020-02-01 00:00:00    1.0
2 2020-02-02 00:00:00    4.0 # <— clearly a peak here at index [2]
3 2020-02-03 00:00:00    3.0
4 2020-02-03 12:45:00    2.7
5 2020-03-01 00:00:00    2.0
6 2020-04-01 00:00:00    1.0

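Your df is not sorted by timestamp, so the peaks you currently find are only valid in "index space". Otherwise, you should be able to convert the indices found to timestamps via df.loc[index, 'Timestamp'] and plot everything on the correct axis: plt.plot(df['Timestamp'], x), and so on.

@Asmus May I ask what index is? Do I have to reset Timestamp as the index? Could you show me some code?

I have appended to my answer again, explaining why you need to resample. Maybe in your specific case you can try a finer interpolation parameter such as 'min'; see the update below.

Hi Asmus, thanks for the great answer. May I know why we need to resample? Can we use the original data? I have updated the question with more details.

@nilsinelabore You seem to misunderstand how find_peaks() works: you are really only giving it the array df["Value"] as input (plus parameters), and it knows nothing at all about the x-axis! Whether the x-axis you choose is "datetime" (e.g. 2020-12-01), indices (e.g. [0,1,2]) or something else, it will only ever try to find peaks within the given y values!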

df = df.resample('min').mean().reset_index()

# and, within def to_date(x):
return pd.to_datetime(_start) + pd.to_timedelta(x, unit='min')
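
The fragment above refers to a to_date function and a _start variable whose definitions are not shown here. The following is a rough sketch of how the resampling idea might fit together, not the answerer's original code: it uses daily resampling ('D') instead of 'min' to keep the example small, fills gaps by linear interpolation (an assumption), and works on a made-up frame named raw.

```python
import pandas as pd
from scipy.signal import find_peaks

# Hypothetical irregular series, same column names as in the question.
raw = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2020-11-01", "2020-11-02", "2020-11-04", "2020-11-05", "2020-11-09"]),
    "Value": [40.0, 42.0, 46.0, 42.0, 40.0],
})

# One row per day; gaps are filled by linear interpolation (an assumption).
daily = (raw.set_index("Timestamp")["Value"]
            .resample("D").mean().interpolate())

_start = daily.index[0]

def to_date(x):
    """Convert (fractional) sample positions to timestamps on the daily grid."""
    return pd.to_datetime(_start) + pd.to_timedelta(x, unit="D")

idx, props = find_peaks(daily.to_numpy(), prominence=0.1, width=1)
peak_dates = to_date(idx)
left, right = to_date(props["left_ips"]), to_date(props["right_ips"])
print(peak_dates, left, right)
```

On an evenly spaced grid, one sample equals one resampling step, so a width passed to find_peaks() can be read as a number of days here (or minutes with 'min'). The interpolated values are not measurements, which is worth keeping in mind when judging prominences on sparse data.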