How to summarize and count non-missing, non-zero and non-unique values by group in R?


I have the following dataset:

df1 <- structure(list(group_id = c(3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 16, 16, 16, 16, 16, 16, 
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 
16, 16, 16, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 
29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 
29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29), 
    score = c(35, 0, 37.5, 51.9, 43, 41, 36.9, 44.4, 27.5, 41.5, 
    60, 39.4, 39.5, 50, 55, 57.8, 44.7, 60.2, 40.4, 62.5, 61.1, 
    53.9, 67.2, 43.9, 37.6, 58.4, 34.1, 56.4, 41.5, 54.4, 50.3, 
    36.8, 41.4, 37.2, 51.3, 50.7, 75.4, 62.9, NA, 54.5, 53.9, 
    59.5, 24.5, 22.7, 53, 35.8, 28, 39.4, 44.5, NA, NA, 55.9, 
    52.5, 36, 43.5, 42.9, 25.5, 35, 46, NA, 60.2, 65.6, 30.5, 
    37.1, 49.1, 70.4, 34.1, 45.4, 30.8, 38.6, 28.7, 39.8, 38.5, 
    0, 72.6, 0, NA, 54.6, 0, 69.8, 31.6, 55.9, 47.3, 34.3, 0, 
    40.8, 69.7, 61.5, 48.6, 59.3, 0, 67.2, 52, 57, 0, NA, 0, 
    51.7, 47.1, 0)), row.names = c(NA, -100L), groups = structure(list(
    .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
        10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
        21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 
        32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 
        43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 
        54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 
        65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 
        76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 
        87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 
        98L, 99L, 100L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -100L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl", 
"data.frame"))

df2 <- structure(list(group_id = c(3L, 10L, 16L, 26L, 27L, 29L), score = c(43.04, 
49.56, 44.86, 49.05, 32.28, 54.18), n_individuals = c(14L, 20L, 
21L, 8L, 5L, 17L)), class = "data.frame", row.names = c(NA, -6L
))

library(dplyr)
df2 <- df1 %>%
   mutate(score = case_when(
      score == 0 ~ NA_real_,                                #recode zeros as missing (NA)
      TRUE ~ score)) %>%
   group_by(group_id) %>%                                   #group by group_id
   summarise(score = mean(score, na.rm = TRUE),             #mean score
             n_individuals = count(score))                  #n of individuals with valid score

Running this fails with:

Error: Problem with `summarise()` input `n_inviduals`. x no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')" i Input `n_inviduals` is `count(score)`. i The error occured in group 1: group_id = 3.

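The input to `count` should be a `tibble` or `data.frame`, not a numeric column, which is why the call above fails. Here, we can use `n()` if we want the total number of rows per group. If we instead want the number of non-NA elements in 'score', create a logical vector with `is.na` and take its `sum` to get the count: TRUE -> 1 and FALSE -> 0, so `sum` effectively counts the 1s.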
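To make the counting idea concrete, here is a minimal base-R sketch on a small hypothetical vector (the values are made up for illustration, not taken from `df1`). It contrasts the total length (what `n()` would report per group), the plain non-`NA` count, and the non-missing, non-zero count. Note that `score != 0` is `NA` wherever `score` is `NA`, but `FALSE & NA` evaluates to `FALSE`, so the combined logical vector contains no `NA` and `sum()` needs no `na.rm`.

```r
# Hypothetical toy vector: two zeros and one NA mixed in with real scores
score <- c(35, 0, NA, 41.5, 0, 60)

length(score)                           # 6: every element, like n() per group
sum(!is.na(score))                      # 5: non-missing, but the zeros are still counted

# TRUE where the score is both present and non-zero; FALSE & NA gives FALSE
is_valid <- !is.na(score) & score != 0
is_valid                                # TRUE FALSE FALSE TRUE FALSE TRUE
sum(is_valid)                           # 3: TRUE is counted as 1, FALSE as 0
```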

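library(dplyr)
df1 %>%
  ungroup() %>%                                # drop the rowwise grouping of df1
  mutate(score = case_when(
    score == 0 ~ NA_real_,                     # recode zeros as missing (NA)
    TRUE ~ score)) %>%
  group_by(group_id) %>%
  summarise(n_individuals = sum(!is.na(score) & score != 0),  # count non-missing, non-zero scores
            score = mean(score, na.rm = TRUE))                # placed after the count, so the count uses the per-row scores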

-output

# A tibble: 6 x 3
#  group_id n_individuals score
#*    <dbl>         <int> <dbl>
#1        3            14  43.0
#2       10            20  49.6
#3       16            21  44.9
#4       26             8  49.0
#5       27             5  35.3
#6       29            17  54.2
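For readers who want to cross-check the same logic without dplyr, the following is a base-R sketch using `tapply()` on a small hypothetical data frame (`df` and its values are invented for illustration). This swaps in a different technique from the dplyr answer: zeros are recoded to `NA` first, then each group's count of usable scores and its mean are computed separately and combined into one data frame.

```r
# Hypothetical toy data: two groups with zeros and NAs mixed in
df <- data.frame(
  group_id = c(1, 1, 1, 1, 2, 2, 2),
  score    = c(10, 0, NA, 20, 5, 0, 15)
)

# Recode zeros as missing, mirroring the case_when() step;
# which() drops the NA comparisons, so only true zeros are replaced
df$score[which(df$score == 0)] <- NA

# Per-group count of usable scores, and per-group mean ignoring NAs
n_individuals <- tapply(df$score, df$group_id, function(x) sum(!is.na(x)))
mean_score    <- tapply(df$score, df$group_id, mean, na.rm = TRUE)

# Combine into one data frame, one row per group
result <- data.frame(
  group_id      = as.numeric(names(n_individuals)),
  n_individuals = as.vector(n_individuals),
  score         = as.vector(mean_score)
)
result
```

Group 1 keeps 10 and 20 (count 2, mean 15) and group 2 keeps 5 and 15 (count 2, mean 10), matching what the dplyr pipeline would report for the same toy data.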


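Thank you for your answer. The only problem is that I only want to count individuals whose score is not missing. Your answer counts everyone. For example, when group_id == 29 I have 27 individuals, but only 17 are non-missing (or non-zero).

@LuizZ I updated the answer using a logical expression with `sum`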