Time average in R
I measure the concentration of compounds every second, and I want averages over 30 seconds and over 60 seconds. I have been reading posts here and tried lubridate and dplyr, but without luck. I keep trying to make this work and haven't managed it yet. I'm transitioning from SAS to R, so please bear with me.
Here is my data:
head(data)  # show the first 6 rows
Date Time Temp Appb Bppb Cppb Dppb Eppb Fppb
1 10/30/17 21:32:33 25.23 -0.469304 22.4445 35.5993 -18.4843 52.0488 -2.947340
2 10/30/17 21:32:34 25.23 -1.255780 21.8248 34.2364 -20.9051 47.4344 -2.071230
3 10/30/17 21:32:35 25.23 -0.769233 21.1590 30.5892 -20.9347 42.6061 -0.991607
4 10/30/17 21:32:36 25.23 -0.874262 21.3353 25.4841 -19.6127 38.3224 -0.452383
5 10/30/17 21:32:37 25.24 -0.819439 21.1916 21.4919 -16.5991 36.1331 -0.150002
6 10/30/17 21:32:38 25.24 -1.895730 21.5345 18.0576 -17.2539 31.7448 -0.311064
Well, you can do the following:
# Round each timestamp to the nearest 30 s (use 60 for 1-minute buckets)
data$time_bucket <- as.POSIXct(round(as.numeric(as.POSIXct(paste(data$Date, data$Time),
  format = "%m/%d/%y %H:%M:%S")) / 30) * 30, origin = "1970-01-01")
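The snippet above only labels each row with its bucket; the averaging step is still missing. A minimal base-R sketch of that step follows. It assumes the question's sample rows (trimmed to three measurement columns for brevity), parses the times in UTC (an assumption), and uses aggregate(), which is not part of the original answer; bucket_means() is a hypothetical helper name. It keeps the answer's round-to-nearest rule and computes both 30- and 60-second means:

```r
# Sample rows from the question, trimmed to three measurement columns
data <- read.table(text = "Date Time Temp Appb Bppb
10/30/17 21:32:33 25.23 -0.469304 22.4445
10/30/17 21:32:34 25.23 -1.255780 21.8248
10/30/17 21:32:35 25.23 -0.769233 21.1590
10/30/17 21:32:36 25.23 -0.874262 21.3353
10/30/17 21:32:37 25.24 -0.819439 21.1916
10/30/17 21:32:38 25.24 -1.895730 21.5345",
  header = TRUE, stringsAsFactors = FALSE)

# Parse Date + Time once; tz = "UTC" is an assumption that avoids DST surprises
secs <- as.numeric(as.POSIXct(paste(data$Date, data$Time),
                              format = "%m/%d/%y %H:%M:%S", tz = "UTC"))

# Hypothetical helper: mean of every measurement column per bucket of
# `width` seconds, using the same round-to-nearest rule as the answer
bucket_means <- function(df, secs, width) {
  bucket <- as.POSIXct(round(secs / width) * width,
                       origin = "1970-01-01", tz = "UTC")
  vals <- df[setdiff(names(df), c("Date", "Time"))]
  aggregate(vals, by = list(time_bucket = bucket), FUN = mean)
}

means_30 <- bucket_means(data, secs, 30)
means_60 <- bucket_means(data, secs, 60)
print(means_30)
print(means_60)
```

On these six rows each call returns a single row, because all readings fall into one bucket; on a longer 1-second series you get one row per 30- or 60-second bucket.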
Hope this answers your question.

Here is another solution, using period.apply() from xts:
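library(lubridate)
library(xts)
# Measurement columns only (drop Date and Time), indexed by the parsed timestamps
data_ts = as.xts(data[-c(1:2)], order.by = mdy_hms(paste(data$Date, data$Time)))
# Row positions that end each 30-second period (k = 60 for 1-minute periods)
ep = endpoints(data_ts, 'seconds', k = 30)
# Mean of each column within each period
period.apply(data_ts, ep, FUN = mean)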
Result:
Temp Appb Bppb Cppb Dppb Eppb Fppb
2017-10-30 21:32:38 25.23333 -1.013958 21.58162 27.57642 -18.96497 41.3816 -1.153938
Since all of the sample data falls within 30 seconds, you get only one mean per column. To check that the approach works, you can try 2-second averages:
test_ep = endpoints(data_ts, 'seconds', k = 2)
period.apply(data_ts, test_ep, FUN = mean)
Result:
Temp Appb Bppb Cppb Dppb Eppb Fppb
2017-10-30 21:32:33 25.230 -0.4693040 22.44450 35.5993 -18.4843 52.04880 -2.9473400
2017-10-30 21:32:35 25.230 -1.0125065 21.49190 32.4128 -20.9199 45.02025 -1.5314185
2017-10-30 21:32:37 25.235 -0.8468505 21.26345 23.4880 -18.1059 37.22775 -0.3011925
2017-10-30 21:32:38 25.240 -1.8957300 21.53450 18.0576 -17.2539 31.74480 -0.3110640
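The grouping behind this 2-second result can be cross-checked without xts. The base-R sketch below assumes that, for a window of k seconds, endpoints() closes a period whenever floor(epoch seconds / k) changes, and that period.apply() stamps each period with its last timestamp. That is an assumption that matches the output above, not a description of xts internals. It uses only the Appb column from the sample data:

```r
# Timestamps and the Appb column from the question's sample data
stamps <- as.POSIXct(paste("10/30/17",
                           c("21:32:33", "21:32:34", "21:32:35",
                             "21:32:36", "21:32:37", "21:32:38")),
                     format = "%m/%d/%y %H:%M:%S", tz = "UTC")
appb <- c(-0.469304, -1.255780, -0.769233, -0.874262, -0.819439, -1.895730)

# Assumed grouping rule: rows share a period when floor(epoch seconds / k) is equal
k <- 2
grp <- floor(as.numeric(stamps) / k)

# Mean per period, labelled with the last timestamp in that period
group_mean <- tapply(appb, grp, mean)
last_stamp <- tapply(as.numeric(stamps), grp, max)
check <- data.frame(
  time = as.POSIXct(as.vector(last_stamp), origin = "1970-01-01", tz = "UTC"),
  Appb = as.vector(group_mean)
)
print(check)
```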
Data:
data = read.table(text = " Date Time Temp Appb Bppb Cppb Dppb Eppb Fppb
1 10/30/17 21:32:33 25.23 -0.469304 22.4445 35.5993 -18.4843 52.0488 -2.947340
2 10/30/17 21:32:34 25.23 -1.255780 21.8248 34.2364 -20.9051 47.4344 -2.071230
3 10/30/17 21:32:35 25.23 -0.769233 21.1590 30.5892 -20.9347 42.6061 -0.991607
4 10/30/17 21:32:36 25.23 -0.874262 21.3353 25.4841 -19.6127 38.3224 -0.452383
5 10/30/17 21:32:37 25.24 -0.819439 21.1916 21.4919 -16.5991 36.1331 -0.150002
6 10/30/17 21:32:38 25.24 -1.895730 21.5345 18.0576 -17.2539 31.7448 -0.311064",
header = TRUE, stringsAsFactors = FALSE)
Here is a data.table and lubridate approach, for completeness:
library(data.table)
library(lubridate)
dat <- read.table(text = "Date Time Temp Appb Bppb Cppb Dppb Eppb Fppb
1 10/30/17 21:32:33 25.23 -0.469304 22.4445 35.5993 -18.4843 52.0488 -2.947340
2 10/30/17 21:32:34 25.23 -1.255780 21.8248 34.2364 -20.9051 47.4344 -2.071230
3 10/30/17 21:32:35 25.23 -0.769233 21.1590 30.5892 -20.9347 42.6061 -0.991607
4 10/30/17 21:32:36 25.23 -0.874262 21.3353 25.4841 -19.6127 38.3224 -0.452383
5 10/30/17 21:32:37 25.24 -0.819439 21.1916 21.4919 -16.5991 36.1331 -0.150002
6 10/30/17 21:32:38 25.24 -1.895730 21.5345 18.0576 -17.2539 31.7448 -0.311064 ",
header = TRUE, stringsAsFactors = FALSE)
#convert to R date object
dat$tme <- as.POSIXct(strptime(paste(dat$Date, dat$Time), format = "%m/%d/%y %H:%M:%S"), tz = "America/Montreal")
#convert to data.table
dat <- as.data.table(dat)
#drop Date and Time since we have an R date object now
dat <- dat[,-c(1,2)]
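# Result: column means per rounded timestamp. The question asks for 30 s and 60 s,
# so use "30 seconds" or "1 minute" as the unit for the real data.
dat[, lapply(.SD, mean), .(tme = round_date(tme, "3 seconds"))]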
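round_date() assigns each reading to the nearest multiple of the window, so with "30 seconds" a bucket labelled 21:32:30 collects readings from roughly 21:32:15 to 21:32:45. If you want each bucket to start on the boundary instead (21:32:30 to 21:32:59), lubridate's floor_date() is the usual alternative. The base-R sketch below uses illustrative timestamps (not from the question) to compare the two labelling rules:

```r
# Illustrative timestamps on both sides of a 30-second boundary
stamps <- as.POSIXct(c("2017-10-30 21:32:14", "2017-10-30 21:32:20",
                       "2017-10-30 21:32:44", "2017-10-30 21:32:50"), tz = "UTC")
width <- 30
secs <- as.numeric(stamps)

# Nearest multiple of `width` (the rule round_date() follows)
nearest <- as.POSIXct(round(secs / width) * width, origin = "1970-01-01", tz = "UTC")
# Start of the window that contains each stamp (the rule floor_date() follows)
start <- as.POSIXct(floor(secs / width) * width, origin = "1970-01-01", tz = "UTC")

print(data.frame(stamp = stamps, nearest = nearest, start = start))
```

Either rule gives valid 30-second averages; they only differ in which readings share a bucket and how the bucket is labelled.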
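I personally prefer the data.table approach, especially for larger datasets, because it is fast and makes subsetting and grouped operations very convenient.

The read.table() part is only there because I used your sample data; once your data is loaded, the rest works unchanged. Depending on the format, you can read your raw data directly into R. For example, for a .csv/.txt file you can use fread() from the data.table package, which is very fast. For MS Excel files there is read_xlsx() from the readxl package (or read.xlsx() from the xlsx package). If your data is online, you can also read it through an API call.

Thanks, that's what I've been doing.

Thanks. Apart from the data part, I actually got it working. My dataset contains 6,500 rows, and I didn't understand whether I had to write out every data point.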
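No, you do not need to type out the 6,500 rows. A minimal sketch, assuming the instrument exports a comma-separated file with the same column names; a temporary file stands in for it here, and the name concentrations.csv in the comment is hypothetical. Base read.csv() is used so the sketch has no package dependencies; data.table::fread() reads the same file and is faster on large files:

```r
# Stand-in for the real export: write two sample rows to a temporary CSV.
# In practice, set `path` to your own file, e.g. "concentrations.csv" (hypothetical name).
path <- tempfile(fileext = ".csv")
writeLines(c("Date,Time,Temp,Appb",
             "10/30/17,21:32:33,25.23,-0.469304",
             "10/30/17,21:32:34,25.23,-1.255780"), path)

# Read every row at once; nothing is typed into the script
data <- read.csv(path, stringsAsFactors = FALSE)

# Build one timestamp column from Date and Time (tz = "UTC" is an assumption)
data$tme <- as.POSIXct(paste(data$Date, data$Time),
                       format = "%m/%d/%y %H:%M:%S", tz = "UTC")
str(data)
```

Any of the averaging approaches above can then be run on the loaded data frame as is.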