Chi-squared test on a wide data frame in R
I have the following data:
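ID gamesAlone gamesWithOthers gamesRemotely tvAlone tvWithOthers tvRemotely
 1          1              NA            NA      NA            1         NA
 2         NA              NA             1      NA            1         NA
 3         NA              NA             1       1           NA         NA
 4         NA              NA             1       1           NA         NA
 5         NA              NA             1      NA            1         NA
 6         NA              NA             1       1           NA         NA
 7         NA              NA             1       1           NA         NA
 8         NA               1            NA      NA            1         NA
 9          1              NA            NA      NA           NA          1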
I would like code that does two things:
First, convert the data into a tidy contingency table, like this:
      Alone WithOthers Remotely
games     2          1        6
tv        4          4        1
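Second, use a chi-squared test to check whether these activities (games and TV) differ in their social context.
Here is the code that generates the data frame: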
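# One row per respondent; a 1 marks the context chosen for each activity, NA otherwise
data <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                   gamesAlone = c(1, NA, NA, NA, NA, NA, NA, NA, 1),
                   gamesWithOthers = c(NA, NA, NA, NA, NA, NA, NA, 1, NA),
                   gamesRemotely = c(NA, 1, 1, 1, 1, 1, 1, NA, NA),
                   tvAlone = c(NA, NA, 1, 1, NA, 1, 1, NA, NA),
                   tvWithOthers = c(1, 1, NA, NA, 1, NA, NA, 1, NA),
                   tvRemotely = c(NA, NA, NA, NA, NA, NA, NA, NA, 1))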
data

This will get you the contingency table in the given form. Suggestion: call the data frame data1 rather than data, to avoid confusion.
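library(dplyr)
library(tidyr)
# gather() and spread() still work but are superseded by pivot_longer() and pivot_wider()
data1_table <- data1 %>%
  gather(key, value, -ID) %>%  # wide to long: one row per ID and column name
  # split each column name into the activity ("tv" or "games") and the context (the rest)
  mutate(activity = ifelse(grepl("^tv", key), substring(key, 1, 2), substring(key, 1, 5)),
         context = ifelse(grepl("^tv", key), substring(key, 3), substring(key, 6))) %>%
  group_by(activity, context) %>%
  summarise(n = sum(value, na.rm = TRUE)) %>%  # count the 1s, ignoring NA
  ungroup() %>%
  spread(context, n)  # long to wide: one column per context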
# A tibble: 2 x 4
  activity Alone Remotely WithOthers
* <chr>    <dbl>    <dbl>      <dbl>
1 games        2        6          1
2 tv           4        1          4
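The answer above stops at the table, so the second part of the question (the test itself) is not shown. A minimal sketch of that step follows, using base R only. The counts are copied by hand from the table above, and the names m and res are chosen here for illustration. With only 9 observations per activity, every expected count is below 5, so chisq.test() warns that the approximation may be incorrect; fisher.test() or a simulated p-value may be the safer choice.

```r
# Counts copied from the contingency table above (rows: activity, columns: context)
m <- matrix(c(2, 6, 1,
              4, 1, 4),
            nrow = 2, byrow = TRUE,
            dimnames = list(activity = c("games", "tv"),
                            context = c("Alone", "Remotely", "WithOthers")))

# Pearson chi-squared test of independence between activity and context.
# This emits a warning here because the expected counts are small.
res <- chisq.test(m)
res$expected  # inspect the expected counts behind the warning
res

# Alternatives that do not rely on the large-sample approximation
fisher.test(m)
chisq.test(m, simulate.p.value = TRUE, B = 10000)
```

Converting data1_table itself to a matrix (dropping the activity column first) should give the same input, but that conversion is not part of the original answer.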
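Omit the first column, ID ([-1]), then take the sum of each column (colSums) while dropping the NA values (na.rm = TRUE), and put the resulting vector of length 6 into a matrix with 2 rows. If desired, you can also label the matrix accordingly (the dimnames argument):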
m
A simple and sufficient solution.
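The code for this answer is truncated in the source, so what follows is an assumed reconstruction from the description only: drop the ID column, sum each column while ignoring NA, and reshape the six sums into a 2-row matrix. The byrow = TRUE setting and the row and column labels are assumptions chosen to match the table in the question, not the answerer's confirmed code.

```r
# Data frame from the question
data <- data.frame(ID = 1:9,
                   gamesAlone = c(1, NA, NA, NA, NA, NA, NA, NA, 1),
                   gamesWithOthers = c(NA, NA, NA, NA, NA, NA, NA, 1, NA),
                   gamesRemotely = c(NA, 1, 1, 1, 1, 1, 1, NA, NA),
                   tvAlone = c(NA, NA, 1, 1, NA, 1, 1, NA, NA),
                   tvWithOthers = c(1, 1, NA, NA, 1, NA, NA, 1, NA),
                   tvRemotely = c(NA, NA, NA, NA, NA, NA, NA, NA, 1))

# Column sums without the ID column, ignoring NA: a named vector of length 6
counts <- colSums(data[-1], na.rm = TRUE)

# Fill row by row so the three games columns form row 1 and the three tv columns row 2
# (assumed layout; the labels are illustrative)
m <- matrix(counts, nrow = 2, byrow = TRUE,
            dimnames = list(c("games", "tv"),
                            c("Alone", "WithOthers", "Remotely")))
m
```

This relies on the columns being ordered games first, then tv, with the same context order in both groups; if the column order changes, the matrix would be filled incorrectly. The resulting m can be passed directly to chisq.test().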