R: Subtracting data in two different rows

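I collect data from a database at regular intervals. The metrics are counters, meaning the values only ever increase. To get a metric's value for a given interval, I have to subtract the previous version of the same row from each row.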

For example:

                 TS INST_ID         EVENT WAIT_TIME_MILLI WAIT_COUNT
2014-01-29 17:20:36       1 log file sync               1     756873
2014-01-29 17:20:36       1 log file sync               2      15627
2014-01-29 17:20:36       1 log file sync               4       2925
2014-01-29 17:21:03       1 log file sync               1     761063
2014-01-29 17:21:03       1 log file sync               2      15659
2014-01-29 17:21:03       1 log file sync               4       2929
Desired output:

                 TS INST_ID         EVENT WAIT_TIME_MILLI WAIT_COUNT
2014-01-29 17:21:03       1 log file sync               1       4190
2014-01-29 17:21:03       1 log file sync               2         32
2014-01-29 17:21:03       1 log file sync               4          4
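TS is the time at which the metrics were collected. INST_ID, EVENT and WAIT_TIME_MILLI are static identifiers. I want to calculate the delta in WAIT_COUNT from one TS to the next.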
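
As a quick arithmetic check of the desired output: for each identifier combination, the delta is simply the later counter value minus the earlier one. The sketch below uses the WAIT_COUNT values from the example table, listed in WAIT_TIME_MILLI order 1, 2, 4.

```r
# WAIT_COUNT per WAIT_TIME_MILLI bucket (1, 2, 4) at the two snapshots in the example
earlier <- c(756873, 15627, 2925)   # TS = 2014-01-29 17:20:36
later   <- c(761063, 15659, 2929)   # TS = 2014-01-29 17:21:03
delta <- later - earlier            # element-wise: change in each counter
print(delta)
# [1] 4190   32    4
```

The real task is to pair each row with the correct previous row automatically, which is what the grouping in the answers below takes care of.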

I simplified the data somewhat, but in case it matters: there are many events, and there can be multiple INST_IDs.

Here is the test data frame:

df <- structure(list(TS = structure(c(1391034063.541, 1391034063.541, 
1391034063.541, 1391034036.136, 1391034036.136, 1391034036.136
), class = c("POSIXct", "POSIXt")), INST_ID = c(1, 1, 1, 1, 1, 
1), EVENT = c("log file sync", "log file sync", "log file sync", 
"log file sync", "log file sync", "log file sync"), WAIT_TIME_MILLI = c(1, 
2, 4, 1, 2, 4), WAIT_COUNT = c(761063, 15659, 2929, 756873, 15627, 
2925)), .Names = c("TS", "INST_ID", "EVENT", "WAIT_TIME_MILLI", 
"WAIT_COUNT"), class = "data.frame", row.names = c(NA, 6L))

@mlt's suggestion, implemented in data.table:

library(data.table)
dt <- data.table(df, key="TS")               # `key` orders dt by TS ascending
dt[, 
  list(
    TS=tail(TS, -1L),                        # all but first
    WAIT_COUNT=diff(WAIT_COUNT)),            # differences in WAIT_COUNT
  by=list(INST_ID, EVENT, WAIT_TIME_MILLI)   # split by these fields
]
#    INST_ID         EVENT WAIT_TIME_MILLI                  TS WAIT_COUNT
# 1:       1 log file sync               1 2014-01-29 17:21:03       4190
# 2:       1 log file sync               2 2014-01-29 17:21:03         32
# 3:       1 log file sync               4 2014-01-29 17:21:03          4
If your data is a data.frame called dat:

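library(dplyr)
dat <- arrange(dat, INST_ID, EVENT, WAIT_TIME_MILLI, TS)  # order each counter series by time
dat <- group_by(dat, INST_ID, EVENT, WAIT_TIME_MILLI)     # one group per counter series
dat <- mutate(dat, diff = WAIT_COUNT - lag(WAIT_COUNT))   # change since the previous snapshot
filter(dat, !is.na(diff))                                 # drop each series' first snapshot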

See ?diff for computing differences, and see the plyr manual for processing a data.frame in partitions (split-apply-combine).
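
To illustrate those two building blocks, here is a small base-R sketch (the split()/lapply()/do.call(rbind, ...) pattern is a base-R stand-in for what plyr automates, not plyr itself). diff() returns successive differences, one element shorter than its input. The split-apply-combine step splits the rows by identifier, differences each piece, and stacks the results. It uses the question's WAIT_COUNT values; TS_INDEX is a hypothetical stand-in for TS that avoids time-zone handling.

```r
# diff() computes x[-1] - x[-length(x)], so the result is one element shorter
counts <- c(756873, 761063)  # WAIT_COUNT, WAIT_TIME_MILLI == 1, at 17:20:36 then 17:21:03
print(diff(counts))
# [1] 4190

snap <- data.frame(
  TS_INDEX = c(1, 1, 1, 2, 2, 2),  # hypothetical stand-in for TS: 1 = 17:20:36, 2 = 17:21:03
  WAIT_TIME_MILLI = c(1, 2, 4, 1, 2, 4),
  WAIT_COUNT = c(756873, 15627, 2925, 761063, 15659, 2929)
)

# Split: one data.frame per counter series
pieces <- split(snap, snap$WAIT_TIME_MILLI)

# Apply: within each piece, sort by time and difference the counter,
# keeping the later timestamp of each pair (like tail(TS, -1L))
deltas <- lapply(pieces, function(g) {
  g <- g[order(g$TS_INDEX), ]
  data.frame(TS_INDEX = g$TS_INDEX[-1],
             WAIT_TIME_MILLI = g$WAIT_TIME_MILLI[-1],
             WAIT_COUNT = diff(g$WAIT_COUNT))
})

# Combine: stack the pieces back into a single data.frame
combined <- do.call(rbind, deltas)
rownames(combined) <- NULL
print(combined)
```

With several events or instances, split() also accepts a list of grouping vectors, for example split(x, list(x$INST_ID, x$EVENT, x$WAIT_TIME_MILLI), drop = TRUE) for a data.frame x that has those columns. Here drop = TRUE skips identifier combinations that never occur.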
This is nice: a data.table / dplyr Rosetta Stone in the making.

Outstanding answer, @Vincent! I implemented the dplyr solution. I also updated your dplyr answer to add EVENT to arrange and group_by. That omission was really my fault, since I oversimplified the data and showed only one event. I am capturing many different events, such as "log file sync", "log file parallel write", "db file sequential read", and so on.

Thanks also for the data.table solution. I probably should have made the dput more obvious.