R: How to get the mean of every n rows and keep the date index?
I have a data frame with a year column and a val column. I want to compute the mean of val over every n rows and keep the corresponding year for each group (for example, with n = 2). How can I do this? The data:
structure(list(year = c(1990, 1991, 1992, 1993, 1994, 1995, 1996,
1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
2008, 2009, 2010, 2011, 2012, 2013), val = c(84L, 67L, 72L, 138L,
111L, 100L, 221L, 108L, 204L, 125L, 82L, 157L, 175L, 252L, 261L,
185L, 146L, 183L, 245L, 172L, 98L, 216L, 89L, 144L)), .Names = c("year",
"val"), row.names = 13:36, class = "data.frame")
You can use aggregate, grouping on the year values rounded down to the start of each two-year block:
setNames(aggregate(val~I(2*floor((year-min(year))/2)+min(year)), data=dat, mean),
c("year", "val"))
# year val
# 1 1990 75.5
# 2 1992 105.0
# 3 1994 105.5
# 4 1996 164.5
# 5 1998 164.5
# 6 2000 119.5
# 7 2002 213.5
# 8 2004 223.0
# 9 2006 164.5
# 10 2008 208.5
# 11 2010 157.0
# 12 2012 116.5
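The same year-binning idea can be written for an arbitrary block width. The following is only a sketch: the helper name mean_by_year_bin and the column name bin are made up for illustration, and it assumes the years are consecutive with no gaps, so that n years correspond to n rows.

```r
# Sample data from the question (rebuilt here so the block is self-contained)
dat <- data.frame(
  year = 1990:2013,
  val  = c(84L, 67L, 72L, 138L, 111L, 100L, 221L, 108L, 204L, 125L, 82L, 157L,
           175L, 252L, 261L, 185L, 146L, 183L, 245L, 172L, 98L, 216L, 89L, 144L)
)

# Hypothetical helper: bin years into blocks of width n, starting at the
# first year, then average val within each bin.
mean_by_year_bin <- function(data, n) {
  start <- min(data$year)
  data$bin <- n * floor((data$year - start) / n) + start
  out <- aggregate(val ~ bin, data = data, FUN = mean)
  setNames(out, c("year", "val"))
}

res2 <- mean_by_year_bin(dat, 2)  # same grouping as the answer above
res3 <- mean_by_year_bin(dat, 3)  # blocks of three years
```

With n = 2 this should reproduce the table above. If the data can have missing years, grouping by row position is the safer choice.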
You can use the seq and colMeans functions:
data.frame(Year = df[seq(1, length(df$year), 2), ]$year, Mean = colMeans(matrix(df$val, nrow=2)))
# Year Mean
# 1 1990 75.5
# 2 1992 105.0
# 3 1994 105.5
# 4 1996 164.5
# 5 1998 164.5
# 6 2000 119.5
# 7 2002 213.5
# 8 2004 223.0
# 9 2006 164.5
# 10 2008 208.5
# 11 2010 157.0
# 12 2012 116.5
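The matrix trick above needs the number of rows to be a multiple of the block size. A possible way around that, shown here only as a sketch, is to pad val with NA up to the next multiple of n and let colMeans skip the padding. The function name mean_every_n is hypothetical.

```r
# Sample data from the question
df <- data.frame(
  year = 1990:2013,
  val  = c(84L, 67L, 72L, 138L, 111L, 100L, 221L, 108L, 204L, 125L, 82L, 157L,
           175L, 252L, 261L, 185L, 146L, 183L, 245L, 172L, 98L, 216L, 89L, 144L)
)

# Hypothetical helper: mean of every n values, labelled by the first year
# of each block. The last block may be shorter than n.
mean_every_n <- function(year, val, n) {
  k <- ceiling(length(val) / n)                    # number of blocks
  padded <- c(val, rep(NA, k * n - length(val)))   # pad to a full matrix
  data.frame(
    Year = year[seq(1, length(year), by = n)],
    Mean = colMeans(matrix(padded, nrow = n), na.rm = TRUE)
  )
}

out <- mean_every_n(df$year, df$val, 5)  # 24 rows: four full blocks, one of 4
```

Each column of the matrix holds one block of n consecutive values, so colMeans yields one mean per block.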
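A dplyr solution: add a grouping variable (1, 1, 2, 2, 3, 3, and so on), compute the mean of val within each group, take the smallest year in each group, then drop the grouping variable: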
> require(dplyr)
> d %>% group_by(G=trunc(2:(n()+1)/2)) %>%
summarise(mean=mean(val),year=min(year)) %>%
select(-G)
Source: local data frame [12 x 2]
mean year
1 75.5 1990
2 105.0 1992
3 105.5 1994
4 164.5 1996
5 164.5 1998
6 119.5 2000
7 213.5 2002
8 223.0 2004
9 164.5 2006
10 208.5 2008
11 157.0 2010
12 116.5 2012
Generalized into a function of n, with a neater way of computing the grouping variable:
meanN <- function(df, n) {
  df %>%
    group_by(G = (0:(n() - 1)) %/% n) %>%
    summarise(mean = mean(val), year = min(year)) %>%
    select(-G)
}
> meanN(d, 2)
Source: local data table [12 x 2]
mean year
1 75.5 1990
2 105.0 1992
3 105.5 1994
4 164.5 1996
5 164.5 1998
6 119.5 2000
7 213.5 2002
8 223.0 2004
9 164.5 2006
10 208.5 2008
11 157.0 2010
12 116.5 2012
> meanN(d, 12)
Source: local data table [2 x 2]
mean year
1 122.4167 1990
2 180.5000 2002
You can use rep to create the grouping variable:
n = 2
dd$group <- rep(1:(nrow(dd)/n), each = n)
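Once the grouping variable exists, the per-group summary still has to be computed. One way to do that in base R is sketched below, assuming dd holds the data from the question; the use of tapply and the length.out argument are additions for illustration and not part of the answer above.

```r
# Sample data from the question
dd <- data.frame(
  year = 1990:2013,
  val  = c(84L, 67L, 72L, 138L, 111L, 100L, 221L, 108L, 204L, 125L, 82L, 157L,
           175L, 252L, 261L, 185L, 146L, 183L, 245L, 172L, 98L, 216L, 89L, 144L)
)

n <- 2
# length.out truncates the pattern, so this also works when nrow(dd)
# is not a multiple of n (the last group is then shorter).
dd$group <- rep(seq_len(ceiling(nrow(dd) / n)), each = n, length.out = nrow(dd))

# Smallest year and mean val within each group
res <- data.frame(
  year = as.vector(tapply(dd$year, dd$group, min)),
  val  = as.vector(tapply(dd$val, dd$group, mean))
)
```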
A short one-line solution with data.table:
library(data.table)
setDT(df)[,.(val=mean(val)), year-0:1]
# year val
# 1: 1990 75.5
# 2: 1992 105.0
# 3: 1994 105.5
# 4: 1996 164.5
# 5: 1998 164.5
# 6: 2000 119.5
# 7: 2002 213.5
# 8: 2004 223.0
# 9: 2006 164.5
#10: 2008 208.5
#11: 2010 157.0
#12: 2012 116.5
Using rollapply from the zoo package:
> library(zoo)
> res <- rollapply(df, width=2, by=2, FUN=mean)
> res[,1] <- floor(res[,1])
> res
year val
[1,] 1990 75.5
[2,] 1992 105.0
[3,] 1994 105.5
[4,] 1996 164.5
[5,] 1998 164.5
[6,] 2000 119.5
[7,] 2002 213.5
[8,] 2004 223.0
[9,] 2006 164.5
[10,] 2008 208.5
[11,] 2010 157.0
[12,] 2012 116.5
Try this one-liner:
> t(sapply(split(dat,rep(seq(1,nrow(dat),2),each=2)),colMeans))
year val
1 1990.5 75.5
3 1992.5 105.0
5 1994.5 105.5
7 1996.5 164.5
9 1998.5 164.5
11 2000.5 119.5
13 2002.5 213.5
15 2004.5 223.0
17 2006.5 164.5
19 2008.5 208.5
21 2010.5 157.0
23 2012.5 116.5
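You can round the year down to a whole number if needed.
Sorry, I will delete this and post it again.
Or just try adapting one of the seven answers you already got to this question...
I would be wary of the answers that compute the groups from the year values, unless you are sure the years form a complete run from Y to Y+N. Your question asks to aggregate every n rows, but there is no guarantee that n rows span n consecutive years (folks, this is why we read the spec... and write tests...).
@Spacedman Thanks for the details. The data.table answer can also be generalized by changing 0:1 to 0:(n-1), provided the data.frame is sorted by year. (Of course, adding order(year) in i takes care of that ;-)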
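The caution about year-based grouping can be illustrated with a small example. The data frame gappy below is invented for this purpose (1992 is missing); it is not part of the question's data. The sketch compares binning by year value with grouping by row position.

```r
# Invented data with a gap: there is no row for 1992
gappy <- data.frame(year = c(1990, 1991, 1993, 1994), val = c(10, 20, 30, 40))
gappy <- gappy[order(gappy$year), ]   # make sure rows are sorted by year
n <- 2

# Grouping by year value: bins are 1990-1991, 1992-1993, 1994-1995
gappy$bin <- n * floor((gappy$year - min(gappy$year)) / n) + min(gappy$year)
by_year <- aggregate(val ~ bin, data = gappy, FUN = mean)

# Grouping by row position: rows 1-2, rows 3-4
row_grp <- (seq_len(nrow(gappy)) - 1) %/% n
by_row <- data.frame(
  year = as.vector(tapply(gappy$year, row_grp, min)),
  val  = as.vector(tapply(gappy$val, row_grp, mean))
)
```

Here by_year has three groups, two of them holding a single row, while by_row has two groups of exactly two rows each. Which of the two is wanted depends on whether "every n rows" or "every n years" is the real requirement.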