Creating a pivot table in R for a fully categorical dataset
I have a table in which all of the values are categorical. It looks like this:
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes
Is there a simple way to summarize the data, counting how many people answered "yes" and "no" to a question, without writing a loop?
My goal is:
question
yes no
reg1 15 20
reg2 30 11
and so on.
I have looked at the reshape2 package, but it does not seem to do what I need.
The simplest answer seems to be table().
Make up some data:
dd <- read.table(text="
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header=TRUE)
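A minimal sketch of the table() call for a single question, assuming the goal is a region by answer cross-tabulation of question1. The name tab_q1 is illustrative and not part of the original answer.

```r
# Rebuild the made-up data so the block runs on its own.
dd <- read.table(text = "
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header = TRUE)

# Cross-tabulate region against the answers to question1.
# tab_q1 is a hypothetical name used only for this illustration.
tab_q1 <- with(dd, table(region, question1))
tab_q1
##       question1
## region no yes
##   reg1  0   1
##   reg2  0   1
##   reg3  1   1
```

Swapping question1 for question2 gives the same kind of table for the other question.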
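A table for question1 has only "no" and "yes" columns, because question1 has no "maybe" responses. If you want to exclude "maybe" responses in general, you can: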
dd2 <- subset(dd,question1 %in% c("no","yes"))
with(dd2,table(...))
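To tabulate every question at once, the data can first be stacked into long form and then passed to table() with three factors. The following is a minimal base R sketch of one way to build such a three-way table, here called tt. The helper names question_cols and long are assumptions made for illustration, and the original answer may have built tt differently (for example with reshape2's melt()).

```r
# Rebuild the made-up data so the block runs on its own.
dd <- read.table(text = "
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: the columns that hold answers.
question_cols <- c("question1", "question2")

# Stack the answer columns: one row per (sample, question) pair.
long <- data.frame(
  region   = rep(dd$region, times = length(question_cols)),
  variable = rep(question_cols, each = nrow(dd)),
  value    = unlist(dd[question_cols], use.names = FALSE)
)

# Three-way table: region x answer value x question.
tt <- with(long, table(region, value, variable))
dim(tt)
```

Each slice tt[, , "question1"] or tt[, , "question2"] is then a region by answer table for that question.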
This gives us a 3x3x2 table, tt (a region x value table for each question). If we want long format:
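library(reshape2)  # melt() and dcast() come from reshape2
ttm <- melt(tt,value.name="count")  # one row per region/value/variable cell
res <- dcast(ttm,region+variable~value,value.var="count")  # answer values become columns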
## region variable maybe no yes
## 1 reg1 question1 0 0 1
## 2 reg1 question2 0 0 1
## 3 reg2 question1 0 0 1
## 4 reg2 question2 1 0 0
## 5 reg3 question1 0 1 1
## 6 reg3 question2 1 0 1
I assume your data is in a data.frame, as follows:
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes
# build the sample data
sample_data <- data.frame(
sample = 1:4,
region = c("reg1", "reg2", "reg3", "reg3"),
question1 = c("yes", "yes", "yes", "no"),
question2 = c("yes", "maybe", "maybe", "yes"),
stringsAsFactors = TRUE
)
# get the variable names you want to summarize
question_vars <- grep("question", names(sample_data), value = TRUE)
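A minimal sketch of a longer, explicit form of the summary, assuming what is wanted is one region by answer table per question. The anonymous function and the dimnames labels region and answer are illustrative choices, not taken from the original answer.

```r
# Rebuild the sample data so the block runs on its own.
sample_data <- data.frame(
  sample = 1:4,
  region = c("reg1", "reg2", "reg3", "reg3"),
  question1 = c("yes", "yes", "yes", "no"),
  question2 = c("yes", "maybe", "maybe", "yes"),
  stringsAsFactors = TRUE
)
question_vars <- grep("question", names(sample_data), value = TRUE)

# One cross-tabulation per question, written out with an explicit function.
tables_by_region <- lapply(question_vars, function(v) {
  table(region = sample_data$region, answer = sample_data[[v]])
})
# lapply over a character vector returns an unnamed list, so name it.
names(tables_by_region) <- question_vars

tables_by_region$question1
```

Each list element is an ordinary table, so counts can be read off by name, for example tables_by_region$question1["reg3", "no"].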
However, you can shorten this code with a more advanced use of lapply:
tables_by_region <- lapply(sample_data[question_vars], aggregate,
sample_data["region"], table)
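You seem to want to aggregate by region, right? Why do you want to avoid loops?
Honestly, I want to avoid loops in order to learn more about R functions. Yes, I want to aggregate by region. I also want to compute the totals for each question separately.
My bad: I know the table() function, but I did not think to use it. I just checked, and it works for more than 2 answers.
I had not even thought of cross-tabulating this way.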