
R: group or summarise and count


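I have raw data like the sample below, in which enumerators have entered the same value several times. I would like to summarise it into the desired output (see the attachment). I would be grateful if you could show me the R code for this.
Thanks in advance.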

If we need to do this for all of the questions, one option is to reshape the data to 'long' format and then get the count:

library(dplyr)
library(tidyr)
out <- df1 %>%
  pivot_longer(cols = province:village_other,
               names_to = "Question_name", values_to = "Text_answer",
               values_drop_na = TRUE) %>%
  count(enumerator_id, Question_name, Text_answer)

out %>%
   filter(Question_name == 'village_other')
# A tibble: 3 x 4
#  enumerator_id Question_name Text_answer     n
#          <dbl> <chr>         <chr>       <int>
#1             1 village_other Z               3
#2             2 village_other D               2
#3             3 village_other J               1

Another option is to use map to loop over the column names of interest and get the counts in a list:

library(purrr)
map(names(df1)[3:6], ~ df1 %>% 
                    filter_at(vars(.x), any_vars(!is.na(.))) %>%
                    count(enumerator_id, !! rlang::sym(.x)))
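
The map approach above returns an unnamed list, so it can be hard to tell which element belongs to which column. The following is a minimal sketch of a variation, not the answerer's code: it assumes the columns are referred to by name rather than by position, names the list with set_names(), and uses filter() with the .data pronoun in place of filter_at(). The names cols and counts are chosen here for illustration, and df1 is rebuilt from the data given in this post.

```r
library(dplyr)
library(purrr)

# Sample data as posted in this thread
df1 <- data.frame(
  enumerator_id = c(1, 2, 1, 3, 2, 1, 3, 1),
  date = c("5/18/2020", "5/19/2020", "5/20/2020", "5/21/2020",
           "5/22/2020", "5/23/2020", "5/24/2020", "5/25/2020"),
  province = c("A", "C", "X", "E", "A", "C", "H", "A"),
  district = c("B", "A", "Y", "F", "B", "A", "I", "B"),
  village = c("C", NA, NA, "G", NA, NA, NA, NA),
  village_other = c(NA, "D", "Z", NA, "D", "Z", "J", "Z"),
  stringsAsFactors = FALSE
)

# Columns of interest (assumption: the same four columns as names(df1)[3:6])
cols <- c("province", "district", "village", "village_other")

counts <- cols %>%
  set_names() %>%                       # name each list element after its column
  map(~ df1 %>%
        filter(!is.na(.data[[.x]])) %>% # drop rows where this column is NA
        count(enumerator_id, !!rlang::sym(.x)))

# Look up one column's counts by name
counts$village_other
```

With a named list, each result can be retrieved as counts$province, counts$village and so on, instead of by index.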
Data: df1 is defined at the end of this page.

Or it could also be:
df1 %>% filter(!is.na(village_other)) %>% count(enumerator_id, village_other) %>% mutate(Question_name = 'village_other')
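@akrun What if I want more than one question name here? This covers only village_other; what about ethnic group and iset? All of them should go into one table. Thanks, akrun.

@akrun I mean that they are in different columns: village_other is one column name, iset is another, and ethnic group is another. The output should list all of them, so that Question_name in the summary contains village_other, iset and some other column names.

@MaxMiak It is better to reshape to long format and then do it all at once.

@akrun Thank you, but you are still filtering only village_other. I want all of these specific variables, not just one.

@MaxMiak That was only to show the desired output. If you check it by typing out in the console, or with View, the whole output is displayed. Suppose the output had 1 million rows; I could not show all of them here. So I used filter, which is a good option for filtering or keeping rows in order to show a few rows to the user.

@akrun Thank you very much for your answer and your patience. One last question: can I concatenate specific variables in cols, such as cols = c(village_other, A, B), to get only those columns?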
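
The last comment has no reply in this thread. As a hedged sketch only: the cols argument of pivot_longer() takes a tidyselect expression, so a c(...) of bare column names limits the reshaping to those columns. The example below assumes the two columns of interest are village and village_other (the A and B of the comment are not in the sample data), rebuilds df1 from the data in this post, and uses out_subset as an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data as posted in this thread
df1 <- data.frame(
  enumerator_id = c(1, 2, 1, 3, 2, 1, 3, 1),
  date = c("5/18/2020", "5/19/2020", "5/20/2020", "5/21/2020",
           "5/22/2020", "5/23/2020", "5/24/2020", "5/25/2020"),
  province = c("A", "C", "X", "E", "A", "C", "H", "A"),
  district = c("B", "A", "Y", "F", "B", "A", "I", "B"),
  village = c("C", NA, NA, "G", NA, NA, NA, NA),
  village_other = c(NA, "D", "Z", NA, "D", "Z", "J", "Z"),
  stringsAsFactors = FALSE
)

# Reshape only the selected columns, then count per enumerator
out_subset <- df1 %>%
  pivot_longer(cols = c(village, village_other),
               names_to = "Question_name", values_to = "Text_answer",
               values_drop_na = TRUE) %>%
  count(enumerator_id, Question_name, Text_answer)

out_subset
```

Columns that are pivoted together need compatible types (all character here). If the column names are held as strings, cols = all_of(c("village", "village_other")) should select the same columns.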
# Example data used by the code above
df1 <- structure(list(
  enumerator_id = c(1, 2, 1, 3, 2, 1, 3, 1),
  date = c("5/18/2020", "5/19/2020", "5/20/2020", "5/21/2020",
           "5/22/2020", "5/23/2020", "5/24/2020", "5/25/2020"),
  province = c("A", "C", "X", "E", "A", "C", "H", "A"),
  district = c("B", "A", "Y", "F", "B", "A", "I", "B"),
  village = c("C", NA, NA, "G", NA, NA, NA, NA),
  village_other = c(NA, "D", "Z", NA, "D", "Z", "J", "Z")),
  class = "data.frame", row.names = c(NA, -8L))