Creating a pivot table in R for a fully categorical dataset


I have a table in which all of the values are categorical. It looks like this:

sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes
Is there a simple way to summarize the data, counting how many people answered "yes" and "no" to a question, without writing a loop?

What I am aiming for is:

     question
     yes no
reg1  15 20
reg2  30 11
and so on.


I have looked at the reshape2 package, but it does not seem to do what I need.

The simplest answer seems to be table().

Make up some data:

dd <- read.table(text="
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header=TRUE)
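As a sketch of how table() could be applied to this data, assuming the goal is to cross-tabulate region against question1 (the name tab_q1 is hypothetical):

```r
# Same made-up data as above
dd <- read.table(text = "
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header = TRUE)

# Cross-tabulate region (rows) against the answers to question1 (columns)
tab_q1 <- with(dd, table(region, question1))
tab_q1
##       question1
## region no yes
##   reg1  0   1
##   reg2  0   1
##   reg3  1   1
```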
This works because question1 has no "maybe" responses. If you want to exclude such responses, you can do this:

dd2 <- subset(dd,question1 %in% c("no","yes"))
with(dd2,table(...))
This gives us a 3x3x2 table (one region x value table per question). If we want it in long format:

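library(reshape2)  # provides melt() and dcast()

# tt is the 3x3x2 table (region x value x variable) described above
ttm <- melt(tt, value.name = "count")
# one row per region/question, one column per answer
res <- dcast(ttm, region + variable ~ value, value.var = "count")
##   region  variable maybe no yes
## 1   reg1 question1     0  0   1
## 2   reg1 question2     0  0   1
## 3   reg2 question1     0  0   1
## 4   reg2 question2     1  0   0
## 5   reg3 question1     0  1   1
## 6   reg3 question2     1  0   1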
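The melt()/dcast() step starts from a three-way table named tt. Below is a minimal sketch of one way such a table could be built in base R. The helper names question_cols and long are hypothetical, and stacking the columns by hand is an assumption, not necessarily how tt was originally created.

```r
dd <- read.table(text = "
sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes",
header = TRUE, stringsAsFactors = FALSE)

question_cols <- c("question1", "question2")  # hypothetical helper name

# Stack the question columns: one row per (sample, question) pair
long <- data.frame(
  region   = rep(dd$region, times = length(question_cols)),
  variable = rep(question_cols, each = nrow(dd)),
  value    = unlist(dd[question_cols], use.names = FALSE)
)

# Three-way contingency table: region x value x variable
tt <- with(long, table(region, value, variable))
dim(tt)
## [1] 3 3 2

# Base-R flat view: one row per region/question, one column per answer
ftable(tt, row.vars = c("region", "variable"))
```

ftable() only changes how the counts are displayed; the counts themselves are the ones stored in tt.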

I assume your data is in a data.frame, like this:

sample region question1 question2
1 reg1 yes yes
2 reg2 yes maybe
3 reg3 yes maybe
4 reg3 no yes
# build the sample data
sample_data <- data.frame(
  sample = 1:4,
  region = c("reg1", "reg2", "reg3", "reg3"),
  question1 = c("yes", "yes", "yes", "no"),
  question2 = c("yes", "maybe", "maybe", "yes"),
  stringsAsFactors = TRUE
)

# get the variable names you want to summarize
question_vars <- grep("question", names(sample_data), value = TRUE)
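One straightforward way to use these names is an explicit loop that builds one region-by-answer table per question. This is only a sketch under that assumption; the list name tables_loop is hypothetical.

```r
sample_data <- data.frame(
  sample = 1:4,
  region = c("reg1", "reg2", "reg3", "reg3"),
  question1 = c("yes", "yes", "yes", "no"),
  question2 = c("yes", "maybe", "maybe", "yes"),
  stringsAsFactors = TRUE
)
question_vars <- grep("question", names(sample_data), value = TRUE)

# Build one region x answer contingency table per question column
tables_loop <- list()  # hypothetical name
for (v in question_vars) {
  tables_loop[[v]] <- table(region = sample_data$region,
                            answer = sample_data[[v]])
}

tables_loop$question1
```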
However, you can shorten this code with a more advanced use of lapply:

tables_by_region <- lapply(sample_data[question_vars], aggregate,
                           sample_data["region"], table)
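Each element of tables_by_region is a data frame whose counts sit in a single matrix column. The sketch below flattens one element into ordinary columns, assuming that column is named x (the name aggregate() gives an unnamed vector input); counts_q1 is a hypothetical name.

```r
sample_data <- data.frame(
  sample = 1:4,
  region = c("reg1", "reg2", "reg3", "reg3"),
  question1 = c("yes", "yes", "yes", "no"),
  question2 = c("yes", "maybe", "maybe", "yes"),
  stringsAsFactors = TRUE
)
question_vars <- grep("question", names(sample_data), value = TRUE)

tables_by_region <- lapply(sample_data[question_vars], aggregate,
                           sample_data["region"], table)

# One data frame per question: a region column plus a matrix column of counts
q1 <- tables_by_region$question1

# Spread the matrix column into one ordinary column per answer level
counts_q1 <- data.frame(region = q1$region, q1$x, check.names = FALSE)
counts_q1
```

Because the question columns are factors, table() reports every level in every region, which is what lets aggregate() combine the per-region counts into a matrix.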

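It looks like you want to aggregate by region, right? And why do you want to avoid loops?

Honestly, I want to avoid loops so that I learn more about R's functions. Yes, I want to aggregate by region. I also want to compute the totals for each question separately.

My bad: I knew about the table() function, but I didn't think of using it.

I just checked. It works with more than two answers.

亚妮丝, I never even thought of cross-tabulating this way.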