Python: counting a series of values that occur within a certain time interval
I have set up the following DataFrame by importing from a CSV:
df = pd.read_csv('file_path',
                 parse_dates={'timestamp': ['Date', 'Time']},
                 index_col='timestamp',
                 usecols=['Date', 'Time', 'X'],)
So its index is a datetime, and its values are int64 objects in the column "X".
My data looks like this, with two columns:
X
timestamp
2015-08-25 16:52:10 95
2015-08-25 16:52:12 84
2015-08-25 16:52:14 86
2015-08-25 16:52:16 84
2015-08-25 16:52:18 85
2015-08-25 16:52:20 86
2015-08-25 16:52:22 84
2015-08-25 16:52:24 95
2015-08-25 16:52:28 95
2015-08-25 16:52:48 80
2015-08-25 16:52:50 85
2015-08-25 16:52:52 85
2015-08-25 16:52:54 84
2015-08-25 16:52:56 85
2015-08-25 16:52:58 86
2015-08-25 16:53:00 85
2015-08-25 16:53:02 85
2015-08-25 16:53:04 85
2015-08-25 16:53:06 86
2015-08-25 16:53:08 85
2015-08-25 16:53:10 85
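However, the time intervals are not always consistent. Sometimes the gap between my data points is more than two seconds (e.g. 16:52:28 to 16:52:48).
The values I want are X = [84, 86], but only if they occur for at least 10 consecutive seconds.
So, for my DataFrame, I want Python to return a count of 2, covering only 16:52:12-22 and 16:52:50-16:53:10.
How do I tell Python not to count 16:52:50-16:53:10 as 2? I could write code for specific time intervals, but how do I translate "at least Y consecutive seconds" into Python?
Thanks in advance.
Edit: To clarify, my preferred output is a count of how many times event Y occurs in a sample set. Event Y occurs when X holds its value for at least 10 consecutive seconds. For example, if X stays within 84-86 for 10 consecutive seconds, I want that to count as 1.

I'm not sure exactly what you want to do, but here is an answer that should at least help clarify the expected result.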
import pandas as pd

# Test data
df = pd.DataFrame([('2015-08-25 16:52:10', 95),
                   ('2015-08-25 16:52:12', 84),
                   ('2015-08-25 16:52:14', 86),
                   ('2015-08-25 16:52:16', 84),
                   ('2015-08-25 16:52:18', 85),
                   ('2015-08-25 16:52:20', 86),
                   ('2015-08-25 16:52:22', 84),
                   ('2015-08-25 16:52:24', 95),
                   ('2015-08-25 16:52:28', 95),
                   ('2015-08-25 16:52:48', 80),
                   ('2015-08-25 16:52:50', 85),
                   ('2015-08-25 16:52:52', 85),
                   ('2015-08-25 16:52:54', 84),
                   ('2015-08-25 16:52:56', 85),
                   ('2015-08-25 16:52:58', 86),
                   ('2015-08-25 16:53:00', 85),
                   ('2015-08-25 16:53:02', 85),
                   ('2015-08-25 16:53:04', 85),
                   ('2015-08-25 16:53:06', 86),
                   ('2015-08-25 16:53:08', 85),
                   ('2015-08-25 16:53:10', 85)],
                  columns=['timestamp', 'x'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
# Define a period column to indicate the 10-second period in which each value occurs.
# pd.Grouper(freq=...) replaces pd.TimeGrouper, which later pandas releases removed.
new = df.groupby(pd.Grouper(freq='10s'), as_index=False).apply(lambda x: x['x'])
df['period'] = new.index.get_level_values(0)
# Group by period and value, then count rows to see the distinct values
# and how many times each occurs per period
df = df.reset_index()
grouped = df.groupby(['period', 'x']).count()
print(grouped.head(10))
           timestamp
period x
0      84          2
       85          1
       86          1
       95          1
1      84          1
       86          1
       95          2
3      80          1
4      84          1
       85          3
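The period label used above can also be derived directly from the timestamps, without a groupby/apply round trip. The following is only a sketch on a shortened, made-up sample; the names `sample`, `bin_start` and `counts` are illustrative and not part of the original answer. It also makes visible that fixed 10-second bins are anchored to the clock, so they do not by themselves detect a run of consecutive seconds.

```python
import pandas as pd

# Shortened, made-up sample (the thread's data has 21 rows).
times = pd.to_datetime(['2015-08-25 16:52:10', '2015-08-25 16:52:12',
                        '2015-08-25 16:52:24', '2015-08-25 16:52:48',
                        '2015-08-25 16:52:50'])
sample = pd.DataFrame({'x': [95, 84, 95, 80, 85]}, index=times)

# Floor each timestamp to the start of its 10-second bin, then number the
# bins relative to the first one. Empty bins leave gaps in the numbering.
bin_start = sample.index.floor('10s')
sample['period'] = ((bin_start - bin_start[0]) // pd.Timedelta('10s')).to_numpy()

# Number of occurrences of each value within each period.
counts = sample.groupby(['period', 'x']).size()
print(counts)
```

Here the bin between 16:52:30 and 16:52:40 holds no rows, so period 2 never appears, just as in the output above.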
Take this example:
>>> df
timestamp x
0 2015-08-25 16:52:10 95
1 2015-08-25 16:52:12 84
2 2015-08-25 16:52:14 86
3 2015-08-25 16:52:16 84
4 2015-08-25 16:52:18 85
5 2015-08-25 16:52:20 86
6 2015-08-25 16:52:22 84
7 2015-08-25 16:52:24 95
8 2015-08-25 16:52:28 95
9 2015-08-25 16:52:48 80
10 2015-08-25 16:52:50 85
11 2015-08-25 16:52:52 85
12 2015-08-25 16:52:54 84
13 2015-08-25 16:52:56 85
14 2015-08-25 16:52:58 86
15 2015-08-25 16:53:00 85
16 2015-08-25 16:53:02 85
17 2015-08-25 16:53:04 85
18 2015-08-25 16:53:06 86
19 2015-08-25 16:53:08 85
20 2015-08-25 16:53:10 85
First, let's add a new column holding the interval between each timestamp and the next one:
>>> tl=df['timestamp']
>>> df['interval']=[(tl[i+1]-tl[i]).total_seconds() for i, _ in enumerate(tl[:-1])]+[0]
>>> df
timestamp x interval
0 2015-08-25 16:52:10 95 2
1 2015-08-25 16:52:12 84 2
2 2015-08-25 16:52:14 86 2
3 2015-08-25 16:52:16 84 2
4 2015-08-25 16:52:18 85 2
5 2015-08-25 16:52:20 86 2
6 2015-08-25 16:52:22 84 2
7 2015-08-25 16:52:24 95 4
8 2015-08-25 16:52:28 95 20
9 2015-08-25 16:52:48 80 2
10 2015-08-25 16:52:50 85 2
11 2015-08-25 16:52:52 85 2
12 2015-08-25 16:52:54 84 2
13 2015-08-25 16:52:56 85 2
14 2015-08-25 16:52:58 86 2
15 2015-08-25 16:53:00 85 2
16 2015-08-25 16:53:02 85 2
17 2015-08-25 16:53:04 85 2
18 2015-08-25 16:53:06 86 2
19 2015-08-25 16:53:08 85 2
20 2015-08-25 16:53:10 85 0
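The same interval column can likely be computed without a Python-level loop. This is a small sketch on four of the timestamps from the table; `ts` and `interval` are illustrative names, and it assumes the timestamps are already sorted.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(['2015-08-25 16:52:24', '2015-08-25 16:52:28',
                               '2015-08-25 16:52:48', '2015-08-25 16:52:50']))

# diff() gives the gap to the previous row; shift(-1) moves each gap up one
# row so that every row holds the gap to the NEXT row. The last row has no
# successor, so its missing value is filled with 0, as in the loop version.
interval = ts.diff().shift(-1).dt.total_seconds().fillna(0)
print(interval.tolist())
```

On the full frame this would read `df['interval'] = df['timestamp'].diff().shift(-1).dt.total_seconds().fillna(0)`.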
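Now use Python's groupby to get each span of rows that share the same interval:
from collections import Counter
from itertools import groupby

fmt = '{} sec interval between {} and {} every {} seconds\n\tx={}, count={}\n'
# itertools.groupby collects consecutive rows that have the same interval
for k, l in groupby(df.iterrows(), key=lambda row: row[1]['interval']):
    li = list(l)
    t2, t1 = li[-1][1]['timestamp'], li[0][1]['timestamp']
    ti = (t2 - t1).total_seconds()
    # Only report spans that last at least 10 seconds
    if ti >= 10.0:
        data = [e[1]['x'] for e in li]
        print(fmt.format(ti, t1, t2, k, data, Counter(data)))
Prints: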
12.0 sec interval between 2015-08-25 16:52:10 and 2015-08-25 16:52:22 every 2.0 seconds
x=[95, 84, 86, 84, 85, 86, 84], count=Counter({84: 3, 86: 2, 85: 1, 95: 1})
20.0 sec interval between 2015-08-25 16:52:48 and 2015-08-25 16:53:08 every 2.0 seconds
x=[80, 85, 85, 84, 85, 86, 85, 85, 85, 86, 85], count=Counter({85: 7, 86: 2, 80: 1, 84: 1})
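Can you provide the desired output for this example?
Thanks for your input, 威尔·多索里, but one thing is still unclear to me. What do you mean by "84-86"? If we assume X has to keep the same value for at least 10 consecutive seconds, this is simple. But do you want to check whether it stays within a range for 10 seconds?
@RomainX Yes, exactly. I want to check whether it stays within the 84-86 range for at least 10 consecutive seconds.
In this case, 10 consecutive seconds means 10 consecutive seconds.
That is circular. Do you mean a period of at least 10 seconds, sampled every 2 seconds, with no breaks?
Thanks for your answer, I know it will help me evaluate the results I get. I have added to the original post, which hopefully makes my desired result clearer.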
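Reading the comments together, one plausible interpretation is: an event is an unbroken run of samples in the 84-86 range, with no sampling gap longer than 2 seconds, spanning at least 10 seconds from first to last sample. The sketch below counts such runs under that assumption. The function `count_events` and its parameters (`low`, `high`, `min_seconds`, `max_gap_seconds`) are hypothetical names introduced here, not something posted in the thread.

```python
import pandas as pd

def count_events(series, low=84, high=86, min_seconds=10, max_gap_seconds=2):
    """Count runs that stay in [low, high] for at least min_seconds.

    Assumes `series` has a sorted DatetimeIndex. A sampling gap larger
    than max_gap_seconds is assumed to break a run.
    """
    in_range = series.between(low, high)
    if not in_range.any():
        return 0
    stamps = series.index.to_series()
    gap = stamps.diff().dt.total_seconds()
    # A new run starts when the in-range flag flips or the gap is too large.
    new_run = (in_range != in_range.shift()) | (gap > max_gap_seconds)
    run_id = new_run.cumsum()
    # First and last timestamp of every in-range run.
    spans = stamps[in_range].groupby(run_id[in_range]).agg(['min', 'max'])
    duration = (spans['max'] - spans['min']).dt.total_seconds()
    return int((duration >= min_seconds).sum())

# The question's 21 samples, written as second offsets from 16:52:10.
offsets = [0, 2, 4, 6, 8, 10, 12, 14, 18, 38, 40,
           42, 44, 46, 48, 50, 52, 54, 56, 58, 60]
values = [95, 84, 86, 84, 85, 86, 84, 95, 95, 80, 85,
          85, 84, 85, 86, 85, 85, 85, 86, 85, 85]
index = pd.Timestamp('2015-08-25 16:52:10') + pd.to_timedelta(offsets, unit='s')
x = pd.Series(values, index=index)

n_events = count_events(x)
print(n_events)
```

Under these assumptions the run from 16:52:12 to 16:52:22 and the run from 16:52:50 to 16:53:10 each count once, so the longer 20-second run is not counted twice.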