Dplyr: calculate mean and variance without having all the data
I have a dataset that looks like this:
set.seed(50)
n <- 20
s_num <- c(10,20,30)
counts <- c(0,1,2,3,4)
strata <- sample(s_num, n, replace=T)
sites <- seq(1, n, by=1)
observed <- sample(counts, n, replace=T)
df <- as.data.frame(cbind(strata,sites,observed))
We can create a logical condition for the subset:
df %>%
mutate(ind = observed != 0) %>%
group_by(strata) %>%
summarise(mcount = mean(observed[ind]), varcount = var(observed[ind]))
# A tibble: 3 x 3
# strata mcount varcount
# <dbl> <dbl> <dbl>
#1 10 1.89 0.861
#2 20 1.6 0.8
#3 30 3 0.667
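Subsetting with observed[ind] drops the zero counts, so these are the mean and variance of the nonzero counts only. A minimal base-R sketch on a made-up vector (hypothetical values, not the question's data) shows how much that can differ from the statistics with the zeros kept:

```r
# Hypothetical counts for a single stratum, two of which are zero
observed <- c(0, 1, 2, 0, 3)
ind <- observed != 0          # same logical condition as mutate(ind = observed != 0)

nonzero_stats <- c(mean = mean(observed[ind]), var = var(observed[ind]))
all_stats     <- c(mean = mean(observed),      var = var(observed))

nonzero_stats  # mean 2, var 1 (zeros ignored)
all_stats      # mean 1.2, var 1.7 (zeros kept)
```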
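NB: as.data.frame(cbind(...)) is not recommended. cbind turns the vectors into a matrix, and a matrix can hold only one class, so if there are any character columns, every column ends up as factor or character after as.data.frame. Use data.frame(strata, sites, observed) instead.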
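A small base-R illustration of the point above, with a hypothetical character column label added (the question's own columns are all numeric, so there cbind happens to keep them numeric):

```r
strata <- c(10, 20, 30)
label  <- c("a", "b", "c")     # hypothetical character column

# cbind() builds a matrix, which holds a single type, so the numbers become strings
via_cbind <- as.data.frame(cbind(strata, label))
# data.frame() keeps each column's own type
via_df    <- data.frame(strata, label)

class(via_cbind$strata)  # no longer numeric ("character" in R >= 4.0, "factor" before)
class(via_df$strata)     # "numeric"
```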
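Once count_plot has been calculated, you can compute the mean and variance by hand from their formulas. The sample variance is sum((x - mean(x))^2)/(length(x) - 1); here N = count_plot plays the role of length(x), and each of the N - n() plots missing from the data is a zero, which adds (0 - mean)^2 = mean^2 to the sum of squares.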
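In symbols, for one stratum with n rows present (the nonzero counts x_1, ..., x_n) and N = count_plot plots in total, so that the remaining N - n plots are zeros:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{N-1}\left[\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 + (N - n)\,\bar{x}^2\right]
```

These correspond to mcount = sum(observed)/N and the two-argument sum(...) inside varcount in the answer's summarise() call.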
You can add the filter to the pipeline:
df2 <- df %>%
filter(observed != 0) %>%
group_by(strata) %>%
summarise(mcount = mean(observed),
varcount = var(observed))
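That way, you don't need to create an intermediate data frame. This one is more elegant. As far as I can tell, though, the question is how to calculate the mean and variance without using df.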
Yes, sorry, I wasn't clear. In this case I don't have the original df. Great, thanks. I wasn't sure sum(()) would work for each row, but this works.
With df3, which lacks the zero-count rows, summarising directly only describes the nonzero counts:
df4 <- df3 %>%
group_by(strata) %>%
summarise(mcount = mean(observed),
varcount = var(observed))
site_count <- df %>%
group_by(strata) %>%
summarise(count_plot = n_distinct(sites))
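n_distinct(sites) counts the distinct sites in each stratum. For comparison, a base-R sketch of the same per-stratum count with tapply(), on a small hypothetical plot table:

```r
# Hypothetical plot table: each site belongs to one stratum
strata <- c(10, 10, 20, 30, 30, 30)
sites  <- c(1, 2, 3, 4, 5, 6)

# Number of distinct sites per stratum, like summarise(count_plot = n_distinct(sites))
count_plot <- tapply(sites, strata, function(s) length(unique(s)))
count_plot  # 10: 2, 20: 1, 30: 3
```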
df3 %>%
left_join(site_count, by = "strata") %>%
group_by(strata) %>%
summarise(N = unique(count_plot),
mcount = sum(observed)/N,
varcount = sum((observed - mcount)^2, (N - n())*mcount^2)/(N - 1)) %>%
select(-N)
# # A tibble: 3 x 3
# strata mcount varcount
# <dbl> <dbl> <dbl>
# 1 10.0 1.89 0.861
# 2 20.0 1.33 1.07
# 3 30.0 2.40 2.30
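To check the formula, a base-R sketch (the helper name mean_var_with_zeros and the counts are hypothetical) compares it with mean() and var() on the same counts with the zeros put back:

```r
# Mean and sample variance of a stratum when the zero counts were dropped.
# x: the counts that are present (the nonzero ones)
# N: total number of plots in the stratum (count_plot); the other N - length(x) are zeros
mean_var_with_zeros <- function(x, N) {             # hypothetical helper
  n <- length(x)
  m <- sum(x) / N                                   # zeros add nothing to the sum
  v <- (sum((x - m)^2) + (N - n) * m^2) / (N - 1)   # each zero adds (0 - m)^2 = m^2
  c(mcount = m, varcount = v)
}

# Hypothetical counts: three nonzero plots plus two zero plots
present <- c(1, 2, 3)
full    <- c(present, 0, 0)

mean_var_with_zeros(present, N = length(full))  # mcount 1.2, varcount 1.7
c(mean(full), var(full))                        # 1.2, 1.7
```

Inside summarise(), the answer does the same thing per stratum, with n() playing the role of length(x) and count_plot the role of N.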
Printing df2 gives the same values:
df2
# # A tibble: 3 x 3
# strata mcount varcount
# <dbl> <dbl> <dbl>
# 1 10.0 1.89 0.861
# 2 20.0 1.33 1.07
# 3 30.0 2.40 2.30