Frequency table with several variables in R

I am trying to replicate a table that is often used in official statistics, but so far without success. Given a data frame like this:

d1 <- data.frame(
  StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
  participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)

I am sure there is a better way to do this kind of thing in R.

Note: I have seen LaTeX solutions, but I can't use them because I need to export the table to Excel.

Thanks in advance.

Using plyr:
require(plyr)
ddply(d1, .(ExamenYear), summarize,
      All=length(ExamenYear),
      participated=sum(participated=="yes"),
      ofwhichFemale=sum(StudentGender=="F"),
      ofWhichPassed=sum(passed=="yes"))

which gives:
  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2
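Since the question is tagged aggregate, here is a rough sketch of the same counts using only base R's aggregate(), assuming no extra packages are wanted. The idea is to turn each condition into a 0/1 indicator column and then sum the indicators within each exam year; the name `indicators` is just illustrative.

```r
# Same data as in the question
d1 <- data.frame(
  StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008", "2008", "2008", "2009", "2009", "2009"),
  Exam          = c("algebra", "stats", "bio", "algebra", "algebra", "stats", "stats", "algebra", "bio", "bio"),
  participated  = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# One 0/1 indicator column per statistic; All = 1 is recycled so every row counts once
indicators <- with(d1, data.frame(
  All           = 1L,
  participated  = as.integer(participated == "yes"),
  ofwhichFemale = as.integer(StudentGender == "F"),
  ofwhichpassed = as.integer(passed == "yes")))

# Sum each indicator column within each exam year
res <- aggregate(indicators, by = list(ExamenYear = d1$ExamenYear), FUN = sum)
print(res)
```

This should reproduce the counts in the ddply output above, with one row per year.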

The plyr package is great for this sort of thing. First, load the package:
library(plyr)

Then we use the ddply function:
ddply(d1, "ExamenYear", summarise, 
      All = length(passed), ## any column would do for this count
      participated = sum(participated=="yes"),
      ofwhichFemale = sum(StudentGender=="F"),
      ofwhichpassed = sum(passed=="yes"))

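Basically, ddply takes a data frame as input and returns a data frame. Here the input data frame is split by ExamenYear, and on each sub-table we compute some summary statistics. Note that inside ddply we don't need the $ notation to refer to columns.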
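To make the split/apply/combine description concrete, here is a minimal base-R sketch of what ddply does for this example. It only illustrates the idea; it is not how plyr is implemented internally. The helper names `pieces` and `per_year` are hypothetical.

```r
# Same data as in the question
d1 <- data.frame(
  StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008", "2008", "2008", "2009", "2009", "2009"),
  Exam          = c("algebra", "stats", "bio", "algebra", "algebra", "stats", "stats", "algebra", "bio", "bio"),
  participated  = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# Split: a list with one data frame per exam year
pieces <- split(d1, d1$ExamenYear)

# Apply: compute the summary statistics on each piece
# (columns are referenced with $ here, which ddply/summarise spares you)
per_year <- lapply(pieces, function(df) data.frame(
  ExamenYear    = df$ExamenYear[1],
  All           = nrow(df),
  participated  = sum(df$participated == "yes"),
  ofwhichFemale = sum(df$StudentGender == "F"),
  ofwhichpassed = sum(df$passed == "yes"),
  stringsAsFactors = FALSE))

# Combine: stack the one-row results into a single data frame
res <- do.call(rbind, per_year)
rownames(res) <- NULL
print(res)
```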
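With a few modifications (using with() to cut down the number of df$ calls, and character indexing to make the code self-documenting), your code becomes easier to read and is a strong competitor to the ddply solutions: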
with(d1, cbind(
  All           = table(ExamenYear),
  participated  = table(ExamenYear, participated)[, "yes"],
  ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
  ofwhichpassed = table(ExamenYear, passed)[, "yes"]))

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2

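I would expect this to be considerably faster than the ddply solution, although the difference will only be noticeable on larger data sets.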
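For the Excel requirement in the question, one option that needs no extra packages is to write the result of the table() approach to a CSV file, which Excel opens directly. This is a sketch under assumptions: the file name is a placeholder (a temporary directory is used so it runs anywhere), and transposing so that years become columns is only a guess at the layout you want; drop the t() call otherwise.

```r
# Same data as in the question
d1 <- data.frame(
  StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008", "2008", "2008", "2009", "2009", "2009"),
  Exam          = c("algebra", "stats", "bio", "algebra", "algebra", "stats", "stats", "algebra", "bio", "bio"),
  participated  = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# Frequency matrix built with table(), as in the answer above (rows = years)
tab <- with(d1, cbind(
  All           = table(ExamenYear),
  participated  = table(ExamenYear, participated)[, "yes"],
  ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
  ofwhichpassed = table(ExamenYear, passed)[, "yes"]))

# Optional: statistics as rows, years as columns
tab_wide <- t(tab)

# Placeholder output path; write.csv keeps the row names as the first column
out_file <- file.path(tempdir(), "exam_frequencies.csv")
write.csv(tab_wide, out_file)

# Read the file back to check what Excel will see
back <- read.csv(out_file, row.names = 1, check.names = FALSE)
print(back)
```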
You might also want to look at plyr's next iteration, dplyr. It uses a ggplot-like syntax and gets its speed by writing the key pieces in C++.
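library(dplyr)
# early dplyr versions used %.% for chaining; current versions use %>%
d1 %>%
  group_by(ExamenYear) %>%
  summarise(All           = n(),  # number of rows in each year
            participated  = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))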

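Many thanks. – Thanks, I definitely need to learn plyr. – Nice answer, but one minute later than @csgillespie. – @Jilber, I think you mean one minute earlier; there shouldn't be a "but" in your comment.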