
Frequency table with several variables in R


I am trying to replicate a table that is often used in official statistics, so far without success. Given a data frame like this:

d1 <- data.frame( StudentID = c("x1", "x10", "x2", 
                          "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
             StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
             ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
             Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
             participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),  
             passed      = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
             stringsAsFactors = FALSE)
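I am sure there is a better way to solve this kind of problem in R.

Note: I have seen LaTeX solutions, but I am not using LaTeX because I need to export the table to Excel.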


Thanks in advance.

Using plyr:

require(plyr)
ddply(d1, .(ExamenYear), summarize,
      All=length(ExamenYear),
      participated=sum(participated=="yes"),
      ofwhichFemale=sum(StudentGender=="F"),
      ofWhichPassed=sum(passed=="yes"))
which gives:

  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2

The plyr package is well suited to this kind of task. First load the package:

library(plyr)
Then we use the ddply function:

ddply(d1, "ExamenYear", summarise, 
      All = length(passed),  # any column works here; this just counts rows
      participated = sum(participated=="yes"),
      ofwhichFemale = sum(StudentGender=="F"),
      ofwhichpassed = sum(passed=="yes"))

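Basically, ddply takes a data frame as input and returns a data frame. Here the input data frame is split by ExamenYear, and a few summary statistics are computed on each sub-table. Note that inside ddply we do not need the $ notation to refer to columns.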
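To make the split / summarise / recombine idea concrete, here is a rough base R sketch of the same steps. It is not how plyr is implemented, only an illustration. The data frame restates just the columns of d1 that the summary needs, and the names pieces, rows and res are made up for this example.

```r
# Columns of the question's d1 that the summary uses, restated so this runs alone
d1 <- data.frame(
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008",
                    "2008", "2008", "2009", "2009", "2009"),
  participated  = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no",
                    "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# 1. split: one sub-data-frame per exam year
pieces <- split(d1, d1$ExamenYear)

# 2. apply: build one summary row per piece
rows <- lapply(pieces, function(p) data.frame(
  ExamenYear    = p$ExamenYear[1],
  All           = nrow(p),
  participated  = sum(p$participated == "yes"),
  ofwhichFemale = sum(p$StudentGender == "F"),
  ofwhichpassed = sum(p$passed == "yes")))

# 3. combine: stack the rows back into a single data frame
res <- do.call(rbind, rows)
res
```

ddply does the bookkeeping of steps 1 and 3 for you, which is why only the summary expressions appear in the call.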

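With a couple of modifications (using with to cut down the number of df$ calls, and using character indices to make the code more self-documenting), your code becomes easier to read and is a worthy competitor to the ddply solutions: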

with( d1, cbind(All = table(ExamenYear),
  participated      = table(ExamenYear, participated)[,"yes"],
  ofwhichFemale     = table(ExamenYear, StudentGender)[,"F"],
  ofwhichpassed     = table(ExamenYear, passed)[,"yes"])
     )

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2
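To see why the character index works, it can help to look at one of the intermediate two-way tables on its own. The sketch below restates two columns from the question as plain vectors. The factor() guard at the end is an addition of mine, not part of the answer: it assumes you want a zero count rather than a "subscript out of bounds" error when a value such as "yes" never occurs.

```r
# Two columns from the question, restated as plain vectors
ExamenYear   <- c("2007", "2007", "2007", "2008", "2008",
                  "2008", "2008", "2009", "2009", "2009")
participated <- c("no", "yes", "yes", "yes", "no",
                  "yes", "yes", "yes", "yes", "yes")

# Two-way contingency table: rows are years, columns are "no" / "yes"
tab <- table(ExamenYear, participated)
tab

# Selecting the column by name gives one count per year
yes_counts <- tab[, "yes"]
yes_counts

# Guard (my addition): with fixed levels the "yes" column exists
# even when nobody answered "yes"
all_no <- factor(rep("no", 10), levels = c("no", "yes"))
zero_counts <- table(ExamenYear, all_no)[, "yes"]
zero_counts
```

Indexing by "yes" rather than by column position keeps the code correct if the column order ever changes.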

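I would expect this to be considerably faster than the ddply solution, although the difference will only be noticeable on larger datasets.

You may also want to look at the next iteration of plyr, dplyr.

It uses a ggplot-like syntax and gets its speed from having the key parts written in C++.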

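library(dplyr)
# Early dplyr chained with %.%; that operator has since been removed, so %>% is used here
d1 %>%
  group_by(ExamenYear) %>%
  summarise(ALL = length(ExamenYear),
            participated = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))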
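The question also asks for a table that can be opened in Excel. One simple option, assuming a CSV file is acceptable, is base R's write.csv. The sketch below types in the summary values shown in the outputs above so that it runs on its own. The object name res and the file name exam_summary.csv are placeholders.

```r
# Summary table as printed above, typed in by hand so the sketch is self-contained
res <- data.frame(
  ExamenYear    = c("2007", "2008", "2009"),
  All           = c(3, 4, 3),
  participated  = c(2, 3, 3),
  ofwhichFemale = c(2, 2, 0),
  ofWhichPassed = c(2, 3, 2))

# Hypothetical output path; a temporary directory keeps the example side-effect free
out_file <- file.path(tempdir(), "exam_summary.csv")

# row.names = FALSE avoids an extra unnamed first column in the spreadsheet
write.csv(res, out_file, row.names = FALSE)

# Read it back to check what Excel would see
back <- read.csv(out_file)
back
```

In practice you would pass the data frame returned by ddply or summarise directly to write.csv instead of retyping the values.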

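Thank you very much. I definitely need to learn plyr.
Good answer, but a minute later than @csgillespie.
@Jilber, I think you mean a minute earlier. There should be no "but" in your comment.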