R frequency table for multiple categorical variables
I have imported interview data from an SPSS .SAV file as a data.frame, and I am now trying to create a frequency table by question number and interview location. Here is a sample data.frame:
loc<-c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2")
q1<-c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE")
q2<-c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO")
q3<-c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
df<-data.frame(loc,q1,q2,q3)
df
loc q1 q2 q3
1 city1 YES YES NO
2 city2 YES NO NO
3 city1 NO MAYBE NO
4 city2 MAYBE YES NO
5 city1 NO NO YES
6 city1 NO MAYBE YES
7 city2 YES MAYBE MAYBE
8 city2 NO YES MAYBE
9 city1 MAYBE YES NO
10 city2 MAYBE NO MAYBE
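So far I have been playing with count() and ddply() from the plyr package, and with summary(). My current solution is very hacky: it splits df by loc, builds frequency tables with as.data.frame(summary(df_city1)), retrieves the frequencies from the summary strings, and merges the summary data.frames for city1 and city2. I imagine there must be a simpler, more elegant solution.

We convert the dataset from 'wide' to 'long' format (gather does this), then group by 'loc', 'quest' and 'answ' (group_by) and use tally to get the counts. However, if combinations that do not occur in the dataset should also appear with a count of 0, we need to join with a dataset that holds all unique combinations of the three columns (complete and unique).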
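To see why the extra join is needed, the following minimal sketch rebuilds the sample data and counts only the combinations that actually occur. The names long and counts are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  loc = c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2"),
  q1  = c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE"),
  q2  = c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO"),
  q3  = c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
)

# Wide to long: one row per (location, question, answer)
long <- gather(df, quest, answ, q1:q3)

# Count only the combinations that occur in the data
counts <- long %>%
  group_by(loc, quest, answ) %>%
  tally()

nrow(long)    # 30 rows: 10 interviews x 3 questions
nrow(counts)  # 16 rows, although 2 x 3 x 3 = 18 combinations are possible
```

The two combinations without a row are city1/q3/MAYBE and city2/q3/YES, which never occur in the sample.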
Thanks @akrun, but your solution does not produce the expected result. Now there is an extra row in 'res' for every count.
@viktor_r I forgot the unique. It should work now.
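A likely reason for the extra rows: complete() adds the missing combinations but keeps every original row, so the long data still contains its repeated loc/quest/answ rows until unique() removes them. A minimal sketch on the sample data, where with_dups and no_dups are illustrative names:

```r
library(tidyr)

# Sample data from the question
df <- data.frame(
  loc = c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2"),
  q1  = c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE"),
  q2  = c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO"),
  q3  = c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
)

long <- gather(df, quest, answ, q1:q3)

# complete() alone: missing combinations are added, duplicates are kept
with_dups <- complete(long, loc, quest, answ)

# unique() reduces this to one row per combination
no_dups <- unique(with_dups)

nrow(with_dups)  # 32: the 30 original rows plus 2 missing combinations
nrow(no_dups)    # 18: one row per loc/quest/answ combination
```

Joining the counts onto with_dups instead of no_dups would therefore repeat each count once per matching original row.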
Desired result:
loc quest answ freq
1 city1 q1 YES 1
2 city1 q1 NO 3
3 city1 q1 MAYBE 1
4 city2 q1 YES 2
5 city2 q1 NO 1
6 city2 q1 MAYBE 2
7 city1 q2 YES 2
8 city1 q2 NO 1
9 city1 q2 MAYBE 2
10 city2 q2 YES 2
11 city2 q2 NO 2
12 city2 q2 MAYBE 1
13 city1 q3 YES 2
14 city1 q3 NO 3
15 city1 q3 MAYBE 0
16 city2 q3 YES 0
17 city2 q3 NO 2
18 city2 q3 MAYBE 3
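library(dplyr)
library(tidyr)

# All loc/quest/answ combinations, including those absent from the data
dfN <- gather(df, quest, answ, q1:q3) %>%
  complete(loc, quest, answ) %>%
  unique()

# Count the observed combinations, then join so that missing ones get n = 0
res <- gather(df, quest, answ, q1:q3) %>%
  group_by(loc, quest, answ) %>%
  tally() %>%
  left_join(dfN, .) %>%
  mutate(n = ifelse(is.na(n), 0, n))
res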
# loc quest answ n
# (fctr) (chr) (chr) (dbl)
#1 city1 q1 MAYBE 1
#2 city1 q1 NO 3
#3 city1 q1 YES 1
#4 city1 q2 MAYBE 2
#5 city1 q2 NO 1
#6 city1 q2 YES 2
#7 city1 q3 MAYBE 0
#8 city1 q3 NO 3
#9 city1 q3 YES 2
#10 city2 q1 MAYBE 2
#11 city2 q1 NO 1
#12 city2 q1 YES 2
#13 city2 q2 MAYBE 1
#14 city2 q2 NO 2
#15 city2 q2 YES 2
#16 city2 q3 MAYBE 3
#17 city2 q3 NO 2
#18 city2 q3 YES 0
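In recent tidyr releases gather() is superseded by pivot_longer(), and complete() accepts a fill argument, so the separate join can be avoided. The following is a sketch under those assumptions rather than the original answer's code; res2 is an illustrative name, and the count column is called freq only to match the desired table.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  loc = c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2"),
  q1  = c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE"),
  q2  = c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO"),
  q3  = c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
)

res2 <- df %>%
  # Wide to long: question name in 'quest', answer in 'answ'
  pivot_longer(q1:q3, names_to = "quest", values_to = "answ") %>%
  # Count the observed combinations
  count(loc, quest, answ, name = "freq") %>%
  # Add the combinations that never occur and give them a count of 0
  complete(loc, quest, answ, fill = list(freq = 0L))

res2
```

Filling inside complete() keeps the whole computation in one pipeline, at the cost of requiring a tidyr version that provides pivot_longer().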