R: aggregate a column based on NAs in another column
I want to aggregate group2 based on the NAs in group1:
Datetime group1 group2
2011-08-08 21:00:00 1 1
2011-08-08 21:10:00 NA 2
2011-08-08 21:20:00 NA 3
2011-08-08 21:30:00 2 4
2011-08-08 21:40:00 NA 5
2011-08-08 21:50:00 NA 6
2011-08-08 22:00:00 3 7
This is the output I want:
Datetime group1 group2
2011-08-08 21:00:00 1 1
2011-08-08 21:30:00 2 9
2011-08-08 22:00:00 3 18
Edit:
9 = 2 + 3 + 4 and 18 = 5 + 6 + 7
aggregate(group2~group1, data=Data, subset(Data,group1==NA),sum)
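Any suggestions would be appreciated. Can I do this with aggregate? Or should I use a different package?
It looks like na.locf from the zoo package is very useful here.
Assuming dat is your original data, we can take the dates at the non-NA group1 levels and use cbind to combine them with the aggregated group2 data: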
> library(zoo)
> Datetime <- dat$Datetime[!is.na(dat$group1)]
> cbind(Datetime, aggregate(group2~group1, na.locf(dat, fromLast = TRUE), sum))
# Datetime group1 group2
# 1 2011-08-08 21:00:00 1 1
# 2 2011-08-08 21:30:00 2 9
# 3 2011-08-08 22:00:00 3 18
PS: Thanks for updating/editing your question (+1).
Using data.table:
library(data.table)
# DT is assumed to be a data.table holding the question's data, e.g. DT <- as.data.table(dat)
DT1 <- DT[, group1:=cumsum(!is.na(c(0, group1[1:(.N-1)])))][,
list(Datetime=Datetime[.N],group2=sum(group2)), by=group1][,c(2,1,3), with=FALSE]
DT1
# Datetime group1 group2
#1: 2011-08-08 21:00:00 1 1
#2: 2011-08-08 21:30:00 2 9
#3: 2011-08-08 22:00:00 3 18
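The grouping key in the data.table code above comes from a cumsum over a shifted "is not NA" indicator. A minimal base R sketch of just that step, using the group1 and group2 values from the question (the names shifted, key and sums are illustrative and not part of the original answer):

```r
group1 <- c(1, NA, NA, 2, NA, NA, 3)
group2 <- 1:7

# Shift group1 down by one row; the leading 0 makes the first row start a group
shifted <- c(0, group1[-length(group1)])

# Every non-NA in the shifted vector opens a new group, so each group
# ends on the row where group1 itself is non-NA
key <- cumsum(!is.na(shifted))
key
# 1 2 2 2 3 3 3

# Sum group2 within each group
sums <- tapply(group2, key, sum)
sums
# 1 9 18
```

Because the key is built from the positions of the non-NA values only, it does not depend on the NAs following any regular pattern.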
A solution using base R:
ddf = structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "2011-08-08", class = "factor"),
time = structure(1:7, .Label = c("21:00:00", "21:10:00",
"21:20:00", "21:30:00", "21:40:00", "21:50:00", "22:00:00"
), class = "factor"), group1 = c(1L, NA, NA, 2L, NA, NA,
3L), group2 = 1:7), .Names = c("Date", "time", "group1",
"group2"), class = "data.frame", row.names = c(NA, -7L))
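# Copy group1, then walk bottom-up and fill each NA from the row below it
ddf$group1a = ddf$group1
for(i in nrow(ddf):1)
  if(is.na(ddf$group1a[i]))
    ddf$group1a[i] = ddf$group1a[i+1]
# Sum group2 within each filled group and turn the result into a data frame
outdf = stack(with(ddf, tapply(group2, group1a, sum)))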
names(outdf) = c("group2","group1")
outdf = outdf[,c(2,1)]
outdf
# group1 group2
#1 1 1
#2 2 9
#3 3 18
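@Richard, yes. But since the NAs occur without any pattern, I could not work it out. Also, I don't need code, just any suggestions.
Hi, I followed the lines above and I get the following error message: Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument. na.locf(dat, fromLast=TRUE) does its job. However, aggregate(group2~group1, na.locf(dat, fromLast=TRUE), sum) does not work. Any ideas?
Make sure the group1 and group2 columns are of class numeric. Check sapply(dat, class).
I used akrun's dat below to create a data frame and made sure group1 and group2 are numeric (previously they were integer). I also cleared the R console and restarted R. Somehow I still get the same error message. What could be the cause?
Thanks for your help, I solved it. na.locf(dat, fromLast=TRUE) turned group1 and group2 into character. So cbind(Datetime, aggregate(as.numeric(group2)~as.numeric(group1), na.locf(dat, fromLast=TRUE), sum)) is the solution.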
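The character coercion reported in the comments can probably be sidestepped by filling only the group1 column instead of passing the whole data frame to na.locf. A sketch under assumptions: the original dat is not shown in the thread, so it is rebuilt here from the question's table with Datetime stored as character, and the names filled and res are illustrative.

```r
library(zoo)

# Assumed reconstruction of the question's data
dat <- data.frame(
  Datetime = c("2011-08-08 21:00:00", "2011-08-08 21:10:00",
               "2011-08-08 21:20:00", "2011-08-08 21:30:00",
               "2011-08-08 21:40:00", "2011-08-08 21:50:00",
               "2011-08-08 22:00:00"),
  group1 = c(1, NA, NA, 2, NA, NA, 3),
  group2 = 1:7,
  stringsAsFactors = FALSE
)

# Dates of the rows where group1 is present
Datetime <- dat$Datetime[!is.na(dat$group1)]

# Fill group1 backwards on the vector only, so group2 keeps its numeric type
filled <- dat
filled$group1 <- na.locf(dat$group1, fromLast = TRUE)

res <- cbind(Datetime, aggregate(group2 ~ group1, data = filled, FUN = sum))
res
```

With this variant the as.numeric() wrappers inside the formula should no longer be needed, since neither column passes through a character conversion.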