How to omit NAs in aggregate when computing the SD in R

I have a data frame that looks like this:

dat <- structure(list(cohort = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ADC8_AA", class = "factor"), 
    status = c(1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 
    1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, -9L, 1L, 1L, 2L, 
    2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 
    2L, 2L, 1L, 2L, -9L, 2L, 1L, -9L, 2L), age_onset = c(NA, 
    NA, NA, NA, 63, NA, 79, NA, 67, 71, 81, NA, NA, NA, NA, 73, 
    NA, 66, 77, 68, 75, NA, NA, NA, NA, 76, 79, NA, NA, NA, NA, 
    NA, 70, NA, 77, 84, 78, 76, NA, 92, 64, 60, 72, NA, 81, NA, 
    62, NA, 82, 74)), row.names = c(NA, 50L), class = "data.frame")
Try this:

aggregate(age_onset~cohort+status, data = dat, sd, na.rm = TRUE)
#    cohort status age_onset
# 1 ADC8_AA     -9        NA
# 2 ADC8_AA      2  7.661191

You can use the ... argument of aggregate to pass na.rm = TRUE on to sd.

For any group that has only one non-missing value you will still get NA, because the standard deviation is not defined for a single value:

subset(dat, status == -9)
#     cohort status age_onset
# 23 ADC8_AA     -9        NA
# 46 ADC8_AA     -9        NA
# 49 ADC8_AA     -9        82

sd(82)
# [1] NA
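
A related point: the formula interface of aggregate uses na.action = na.omit by default, so rows with a missing age_onset are dropped before grouping, which is why a group with no values at all does not appear in the output above. The sketch below illustrates this on a small made-up data frame (toy, g and x are hypothetical names, not part of the question's data) and also counts the non-missing values per group, which shows which SDs are undefined.

```r
# Toy data (hypothetical, not the poster's dat):
# group "a" has two values, "b" has one non-missing value, "c" has none.
toy <- data.frame(
  g = c("a", "a", "a", "b", "b", "c"),
  x = c(1, 3, NA, 5, NA, NA)
)

# Default formula interface: na.action = na.omit removes NA rows first,
# so group "c" is absent from the result.
res_default <- aggregate(x ~ g, data = toy, FUN = sd)

# na.pass keeps the NA rows; na.rm = TRUE is then forwarded to sd via `...`.
res_pass <- aggregate(x ~ g, data = toy, FUN = sd,
                      na.action = na.pass, na.rm = TRUE)

# Number of non-missing values per group.
n_obs <- aggregate(x ~ g, data = toy,
                   FUN = function(v) sum(!is.na(v)),
                   na.action = na.pass)

print(res_default)
print(res_pass)
print(n_obs)
```

With na.pass all three groups are returned, and the groups with fewer than two non-missing values ("b" and "c") have an SD of NA.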

We can use dplyr:

library(dplyr)
dat %>%
    group_by(cohort, status) %>%
    summarise(Mean = mean(age_onset, na.rm = TRUE),
              SD = sd(age_onset, na.rm = TRUE))

I am still getting NA.
That is because there is only one non-missing value. Please see the edits.
Oops. That makes sense.