R: Creating new data.table columns based on existing columns
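My data.table contains hourly observations of the power produced by an engine (output) and a system-state descriptor tag that indicates which components of the engine were switched on.
Data: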
structure(list(time = structure(c(1517245200, 1517247000, 1517248800,
1517250600, 1517252400, 1517254200, 1517256000, 1517257800, 1517259600,
1517261400, 1517263200, 1517265000, 1517266800, 1517268600, 1517270400,
1517272200, 1517274000, 1517275800, 1517277600, 1517279400, 1517281200,
1517283000, 1517284800, 1517286600), class = c("POSIXct", "POSIXt"
), tzone = ""), output1 = c(160.03310020928, 159.706274495615,
159.803834736236, 159.753928429527, 159.54807802046, 159.21298848298,
158.904290018581, 158.683643772917, 158.670475839199, 158.793901799427,
158.886487460894, 159.167829223303, 159.66751884913, 159.1288534448,
159.141463186901, 160.116892086363, 160.517879769862, 160.615925580417,
160.915687799509, 161.590897854561, 161.568455821241, 161.411642091721,
161.811137570257, 162.193040254917), tag1 = c("evap only", "evap only",
"fog & evap", "fog & evap", "evap only", "evap only", "evap only",
"neither fog nor evap", "neither fog nor evap", "fog & evap", "evap only", "evap only",
"evap only", "fog & evap", "evap only", "fog & evap", "evap only",
"evap only", "evap only", "evap only", "fog & evap", "fog & evap",
"bad data", "neither fog nor evap")), row.names = c(NA, -24L
), class = c("data.table", "data.frame"))
You can also generate some sample data with the following:
library(data.table)

# Column names match the real data (output1, tag1) so the code below runs on either.
sample_data <- data.table(
  time = seq.POSIXt(from = Sys.time(), by = 60*60*3, length.out = 100),
  output1 = runif(n = 100, min = 130, max = 172),
  tag1 = sample(x = c('evap only', 'bad data', 'neither fog nor evap', 'fog & evap'),
                size = 100, replace = TRUE))
I tried the code below, but the result is not in the form I want. I am using .SDcols because the actual dataset has a large number of other columns.
sample_data[, lapply(.SD, function(z){mean(z, na.rm = T)}), .SDcols = c('output1'), by = .(round_date(time, 'day'), tag1)]
round_date tag1 output1
1: 2018-01-30 evap only 159.8391
2: 2018-01-30 fog & evap 160.0825
3: 2018-01-30 neither fog nor evap 159.8491
4: 2018-01-30 bad data 161.8111
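One thing worth checking in the attempt above is that every row lands on 2018-01-30. lubridate's round_date() rounds to the nearest day boundary, so afternoon timestamps move to the next day, whereas floor_date() keeps them on their own day. A minimal sketch, assuming a single illustrative timestamp in UTC (the question does not state its time zone):

```r
library(lubridate)

# Illustrative timestamp: 17:00 on 2018-01-29, with UTC assumed for the example.
x <- as.POSIXct("2018-01-29 17:00:00", tz = "UTC")

# round_date() goes to the nearest day boundary, which is the next midnight here.
rounded <- round_date(x, "day")

# floor_date() goes back to the start of the same day.
floored <- floor_date(x, "day")

print(rounded)
print(floored)
```

If the goal is one group per calendar day, flooring (or as.Date()) is probably closer to the intent than rounding.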
I have looked at related questions on Stack Overflow.
Is there a data.table way to achieve this?
library(dplyr)
library(lubridate)
# test is the dataframe provided in question
test1 = test %>% group_by(date = date(time), tag1) %>%
summarise(mean_power = mean(output1))
Convert the tibble produced by the code above to a data frame:
test1_df = data.frame(test1)
Reshape the data to wide format:
reshape(test1_df, idvar = "date", timevar =
"tag1", direction = "wide")
Output:
> output
date evap only fog & evap bad data neither fog nor evap
1 2018-01-29 159.8697 159.8038 NA NA
3 2018-01-30 159.8335 160.1289 161.8111 159.8491
The row numbers read 1 and then 3 because the date 2018-01-30 first appears in row 3 of test1_df.
Here is a data.table approach:
#explanation of mean(.SD[[1]] ..), see akrun's comment here:
# https://stackoverflow.com/questions/29568732/using-mean-with-sd-and-sdcols-in-data-table#comment47286876_29568732
ans <- DT[, .(mean_output1 = mean(.SD[[1]], na.rm = TRUE )),
by = .( date = as.Date( time ), tag1 ),
.SDcols = c("output1") ]
dcast( ans, date~tag1, value.var = "mean_output1" )
# date bad data evap only fog & evap neither fog nor evap
# 1: 2018-01-29 NA 159.3908 159.3701 158.6771
# 2: 2018-01-30 161.8111 160.5564 161.0323 162.1930
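The daily means here differ from those in the dplyr answer. One plausible reason (an inference, not something either answer states) is time zone handling: as.Date() on a POSIXct value may default to UTC depending on the R version, while lubridate's date() follows the time zone of the object. The sketch below uses two timestamps from the question's data and passes tz explicitly; "Asia/Kolkata" is only an illustrative zone ahead of UTC.

```r
# Two timestamps from the question's data, in seconds since the epoch.
x <- as.POSIXct(c(1517248800, 1517250600), origin = "1970-01-01", tz = "UTC")

# In UTC both instants fall on the same calendar day.
d_utc <- as.Date(x, tz = "UTC")

# In a zone ahead of UTC the second instant is already on the next day.
# "Asia/Kolkata" is an assumption made for illustration only.
d_local <- as.Date(x, tz = "Asia/Kolkata")

print(d_utc)
print(d_local)
```

Passing tz explicitly in the by clause, for example as.Date(time, tz = "UTC"), makes the grouping independent of the session's settings.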
Comments:
Are you looking for dcast(DT[, mean(output1), .(d = as.Date(time), tag1)], d ~ tag1, value.var = "V1")? Your desired output only has one date, and if you already have the mean by date it is hard to tell what you are after.
Isn't this a reshaping problem?
@chinsoon12 Data from additional dates will end up as additional rows in the output. I have added a section that generates some random data with additional dates.
@RonakShah I realised what I was missing and it works now, but I would like to know whether there is a data.table way to do this.
What output do you expect for the sample dataset?
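The grouping and reshaping steps discussed here can also be folded into a single dcast() call by supplying an aggregation function. This is a minimal sketch on hypothetical toy data (the values and timestamps are made up for illustration), not the method used in the answers:

```r
library(data.table)

# Hypothetical toy data: four readings spread over two UTC dates.
DT <- data.table(
  time = as.POSIXct(c("2018-01-29 22:00:00", "2018-01-29 22:30:00",
                      "2018-01-30 01:00:00", "2018-01-30 01:30:00"), tz = "UTC"),
  output1 = c(160, 162, 158, 159),
  tag1 = c("evap only", "evap only", "fog & evap", "evap only")
)

# Add the grouping date as a column so the cast formula uses plain column names.
DT[, date := as.Date(time, tz = "UTC")]

# dcast() aggregates each date/tag cell with the supplied function;
# fill = NA_real_ marks date/tag combinations that never occur.
wide <- dcast(DT, date ~ tag1, value.var = "output1",
              fun.aggregate = function(x) mean(x, na.rm = TRUE),
              fill = NA_real_)
print(wide)
```

With many value columns, the two-step version with .SDcols may still be easier to read, since the aggregation and the reshape stay separate.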