R: Calculating the duration between events
I have the following data:
DateTime | Var1 | Var2 | var3 | var4 | %Var1 | level
-------------------------------------------------------------------
11/15/2016 6:11 | 0 | 0.94 | 0.65 | 1.14 | 0 | (0,5]
11/15/2016 6:12 | 0.70 | 29.98 | 9.01 | 30.01 | 0.53 | (0,5]
11/15/2016 6:13 | 35.08 | 152.23| 141.71| 103.7 | 26.57 | (5,30]
11/15/2016 6:14 | 69.05 | 137.97| 130.81| 101.54| 52.31 | (30,60]
11/15/2016 6:15 | 69.38 | 138.7 | 131.3 | 101.67| 52.56 | (30,60]
11/15/2016 6:19 | 80.63 | 140 | 134 | 126.45| 61.09 | (60,100]
11/15/2016 6:20 | 82.86 | 141.33| 136.09| 129.7 | 62.77 | (60,100]
11/15/2016 6:44 | 132.33| 206.18| 205.61| 205.64| 100.25| (100,500]
11/15/2016 6:45 | 128.75| 202.51| 197.69| 198.92| 97.53 | (60,100]
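The DateTime and Var1-Var4 columns are present in the original data.
The %Var1 column is obtained by expressing Var1 as a percentage of a predefined value. The values in %Var1 are then binned into different "levels" (shown in the last column). The levels do not always appear in order; for example, (100,500] may be followed immediately by (5,30], and so on.
I need to calculate the time spent in each level. For example, the total time spent in level (60,100] is the interval from 6:19 to 6:44, plus the interval from 6:45 to the next data point (not shown in the table).
How can I calculate this? I found a related post, but in that case the rows contain the transition times themselves, whereas in my case I have to look at the following rows to decide whether the system stays at the same level or is changing level.
Edit:
I have already calculated the time difference between consecutive observations and added it to the data frame as a column: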
df <- data.frame(s$dateTime, s$Var1, s$Var2, s$Var3, s$Var4)
df$Var5 <- df$s.Var1 * 100/NumericConstant
fac <- cut(df$Var5, c(-10, 5, 30, 60, 100, 500))
df <- cbind(df,fac)
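c_time <- as.POSIXlt(df$s.dateTime) ## data.frame() above names this column s.dateTime, not DateTime
timedur <- as.numeric(difftime(c_time[2:length(c_time)] , c_time[1:(length(c_time)-1)], tz = 'UTC'))
timedur <- append(timedur, NA) ## pad with a real NA (the string 'NA' would coerce the vector to character), since length(timedur) is 1 short of the DF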
df <- cbind(df,timedur) ## add the time differences column to the dataframe
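I want to find out how long the system stayed in (0,5] before changing to (5,30], then how long it stayed in (5,30], then in (30,60], and so on.

Here is a solution that uses a cross join and, for each row, selects the first later row with a different level: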
library(dplyr)
nxt <- df %>% mutate(dummy = 1) %>%
  inner_join(df %>%
               select(level, DateTime) %>%
               rename(DateTimeNext = DateTime, levelNext = level) %>%
               mutate(dummy = 1), by = 'dummy') %>%
  # remove previous rows and the same level
  filter(DateTime < DateTimeNext, level != levelNext) %>%
  # group data to use in row_number()
  group_by(DateTime) %>%
  # select first row with different level
  filter(row_number(DateTimeNext) == 1) %>%
  select(DateTime, DateTimeNext)
df %>% left_join(nxt)
# filter out overlapping rows
df %>% left_join(nxt) %>% group_by(DateTimeNext) %>% filter(row_number(DateTime) == 1) %>%
  mutate(timedur = DateTimeNext - DateTime)
Generating the data:
str <- '
DateTime | Var1 | Var2 | var3 | var4 | %Var1 | level
11/15/2016 6:11 | 0 | 0.94 | 0.65 | 1.14 | 0 | (0,5]
11/15/2016 6:12 | 0.70 | 29.98 | 9.01 | 30.01 | 0.53 | (0,5]
11/15/2016 6:13 | 35.08 | 152.23| 141.71| 103.7 | 26.57 | (5,30]
11/15/2016 6:14 | 69.05 | 137.97| 130.81| 101.54| 52.31 | (30,60]
11/15/2016 6:15 | 69.38 | 138.7 | 131.3 | 101.67| 52.56 | (30,60]
11/15/2016 6:19 | 80.63 | 140 | 134 | 126.45| 61.09 | (60,100]
11/15/2016 6:20 | 82.86 | 141.33| 136.09| 129.7 | 62.77 | (60,100]
11/15/2016 6:44 | 132.33| 206.18| 205.61| 205.64| 100.25| (100,500]
11/15/2016 6:45 | 128.75| 202.51| 197.69| 198.92| 97.53 | (60,100]
'
file <- textConnection(str)
df <- read.table(file, sep = "|", header = T)
df$DateTime <- as.POSIXct(df$DateTime , format="%m/%d/%Y %H:%M")
Result:
DateTime Var1 Var2 var3 var4 X.Var1 level DateTimeNext timedur
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <dttm> <time>
1 2016-11-15 06:11:00 0.00 0.94 0.65 1.14 0.00 (0,5] 2016-11-15 06:13:00 2 mins
2 2016-11-15 06:13:00 35.08 152.23 141.71 103.70 26.57 (5,30] 2016-11-15 06:14:00 1 mins
3 2016-11-15 06:14:00 69.05 137.97 130.81 101.54 52.31 (30,60] 2016-11-15 06:19:00 5 mins
4 2016-11-15 06:19:00 80.63 140.00 134.00 126.45 61.09 (60,100] 2016-11-15 06:44:00 25 mins
5 2016-11-15 06:44:00 132.33 206.18 205.61 205.64 100.25 (100,500] 2016-11-15 06:45:00 1 mins
6 2016-11-15 06:45:00 128.75 202.51 197.69 198.92 97.53 (60,100] <NA> NA mins
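The question also asks for the total time per level, which means summing the durations of all runs that share a level. A minimal sketch in base R, assuming the per-run durations from the result above are available in a data frame; the name `res` is hypothetical and its values are retyped by hand from the output:

```r
# Per-run durations in minutes, retyped from the result table above.
# `res` is a hypothetical name; NA marks the last run, whose end is unknown.
res <- data.frame(
  level   = c("(0,5]", "(5,30]", "(30,60]", "(60,100]", "(100,500]", "(60,100]"),
  timedur = c(2, 1, 5, 25, 1, NA),
  stringsAsFactors = FALSE
)

# Sum the durations within each level, ignoring the open-ended last run.
total <- tapply(res$timedur, res$level, sum, na.rm = TRUE)
total
```

With `na.rm = TRUE` the still-open run at 6:45 contributes nothing, so the total for (60,100] only covers 6:19 to 6:44 until a later data point closes that run.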
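Can you provide the code you have tried yourself?

Running the POSIXct command on the DateTime column converts the values to NA. The rest of the code (without POSIXct) gives the following errors:
> nxt <- df %>% mutate(dummy = 1) %>% + inner_join(df %>% + … + select(DateTime, DateTimeNext)
Error: cannot allocate vector of size 3.3 Gb
> df %>% left_join(nxt)
Error in tbl_vars(y) : object 'nxt' not found
> df %>% left_join(nxt) %>% group_by(DateTimeNext) %>% filter(row_number(DateTime) == 1)
Error in tbl_vars(y) : object 'nxt' not found

I guess your data is too large for my approach. Another option is to avoid the join and use apply to collect the right data instead.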
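One way to avoid the join entirely is to mark the rows where the level changes and keep only those rows, so memory use grows linearly with the number of rows instead of quadratically. This is a sketch of that idea in base R, not the code from the answer above; it assumes the rows are already sorted by DateTime and that `level` contains no NA values. Only the two columns needed are rebuilt here, and the names `is_start` and `runs` are illustrative.

```r
# Sample rows from the question (only the two columns this approach needs).
df <- data.frame(
  DateTime = as.POSIXct(
    c("11/15/2016 6:11", "11/15/2016 6:12", "11/15/2016 6:13",
      "11/15/2016 6:14", "11/15/2016 6:15", "11/15/2016 6:19",
      "11/15/2016 6:20", "11/15/2016 6:44", "11/15/2016 6:45"),
    format = "%m/%d/%Y %H:%M", tz = "UTC"),
  level = c("(0,5]", "(0,5]", "(5,30]", "(30,60]", "(30,60]",
            "(60,100]", "(60,100]", "(100,500]", "(60,100]"),
  stringsAsFactors = FALSE
)

n <- nrow(df)
# TRUE on the first row and on every row whose level differs from the row before it.
is_start <- c(TRUE, df$level[-1] != df$level[-n])

# Keep one row per run of identical consecutive levels.
runs <- df[is_start, c("DateTime", "level")]
rownames(runs) <- NULL

# A run ends where the next run starts; indexing one past the end yields NA,
# so the last run gets an unknown end time.
runs$DateTimeNext <- runs$DateTime[seq_len(nrow(runs)) + 1L]
runs$timedur <- as.numeric(difftime(runs$DateTimeNext, runs$DateTime, units = "mins"))
runs
```

On the sample rows this gives one row per run, starting at 6:11, 6:13, 6:14, 6:19, 6:44 and 6:45, with `timedur` in minutes and NA for the final, still-open run.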