R: get row counts based on a condition (date field within a specific year), grouped by a text field


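I get an Excel report published every day that I need to summarize and use for trend analysis. The report contains a list of work items, each with a creation date and a work item type. How can I get the number of work items created in 2011 and in 2012? Also, how can I get counts by work item type? So far I have been able to load the Excel data and get the row count as follows: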

library(gdata)
wi20121812 = read.xls("WorkItemReport20121812.xls")
nrow(wi20121812)
Sample data:

   > dput(head(workItemReport2))
structure(list(DocType = structure(c(6L, 7L, 6L, 6L, 8L, 6L), .Label = c("TYPE10WI", 
"TYPE11WI", "TYPE12WI", "TYPE13WI", "TYPE14WI", "TYPE1WI", "TYPE2WI", 
"TYPE3WI", "TYPE4WI", "TYPE5WI", "TYPE6WI", "TYPE7WI", "TYPE8WI", 
"TYPE9WI"), class = "factor"), CreatedDate = structure(c(7L, 
22L, 146L, 181L, 153L, 191L), .Label = c("1/10/12 15:43 AM/PM ", 
"1/10/12 16:06 AM/PM ", "1/10/12 5:28 AM/PM ", "1/10/12 5:56 AM/PM ", 
"1/11/12 19:51 AM/PM ", "1/11/12 5:26 AM/PM ", "1/12/11 21:58 AM/PM ", 
"1/12/12 11:08 AM/PM ", "1/12/12 5:41 AM/PM ", "1/12/12 9:56 AM/PM ", 
"1/13/12 14:01 AM/PM ", "1/13/12 15:08 AM/PM ", "1/13/12 15:11 AM/PM ", 
"1/13/12 8:51 AM/PM ", "1/16/12 10:27 AM/PM ", "1/16/12 10:28 AM/PM ", 
"1/16/12 16:37 AM/PM ", "1/16/12 7:52 AM/PM ", "1/18/12 15:02 AM/PM ", 
"1/18/12 16:03 AM/PM ", "1/18/12 16:13 AM/PM ", "1/19/11 19:23 AM/PM ", 
"1/20/12 10:48 AM/PM ", "1/20/12 12:23 AM/PM ", "1/20/12 8:38 AM/PM ", 
"1/23/12 5:53 AM/PM ", "1/24/12 15:18 AM/PM ", "1/24/12 8:23 AM/PM ", 
"1/24/12 8:58 AM/PM ", "1/25/12 11:38 AM/PM ", "1/25/12 5:28 AM/PM ", 
"1/26/12 13:48 AM/PM ", "1/26/12 15:53 AM/PM ", "1/26/12 15:58 AM/PM ", 
"1/26/12 16:13 AM/PM ", "1/26/12 16:18 AM/PM ", "1/26/12 7:33 AM/PM ", 
"1/27/12 7:48 AM/PM ", "1/3/12 17:48 AM/PM ", "1/3/12 18:33 AM/PM ", 
"1/3/12 9:07 AM/PM ", "1/30/12 11:22 AM/PM ", "1/30/12 22:52 AM/PM ", 
"1/30/12 23:10 AM/PM ", "1/31/12 19:54 AM/PM ", "1/31/12 20:39 AM/PM ", 
"1/31/12 5:42 AM/PM ", "1/31/12 9:42 AM/PM ", "1/4/12 14:02 AM/PM ", 
"1/4/12 9:52 AM/PM ", "1/5/12 13:42 AM/PM ", "1/5/12 17:42 AM/PM ", 
....
....
"9/6/12 9:02 AM/PM ", "9/7/12 11:48 AM/PM ", "9/7/12 12:58 AM/PM ", 
"9/7/12 13:52 AM/PM ", "9/7/12 15:07 AM/PM ", "9/7/12 15:12 AM/PM ", 
"9/7/12 15:22 AM/PM ", "9/7/12 15:47 AM/PM ", "9/7/12 15:52 AM/PM ", 
"9/7/12 8:42 AM/PM ", "9/7/12 9:32 AM/PM ", "9/8/11 23:43 AM/PM "
), class = "factor")), .Names = c("DocType", "CreatedDate"), row.names = c(NA, 
6L), class = "data.frame")
> 

You can use ddply from the plyr package:

res = ddply(df, "year", summarise, amount = length(year))
Or use count from the same package, which is simpler.


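In these calls, df is a data.frame containing your data, and year is the name of the column holding the categorical variable that records the year in which each row was created.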
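The sample data in the question has no year column yet, so it would first have to be derived from CreatedDate. The following is a minimal base R sketch, assuming the dates are in month/day/two-digit-year order (values such as 1/13/12 suggest this). It uses a small made-up data frame whose date strings are borrowed from the factor levels shown in the question. The names wi, date_part, year_counts and year_type_counts are illustrative and do not come from the original post.

```r
# Made-up sample with the same two columns as the question's data.
wi <- data.frame(
  DocType = c("TYPE1WI", "TYPE2WI", "TYPE1WI", "TYPE1WI", "TYPE3WI", "TYPE1WI"),
  CreatedDate = c("1/12/11 21:58 AM/PM ", "1/19/11 19:23 AM/PM ",
                  "9/6/12 9:02 AM/PM ", "9/7/12 11:48 AM/PM ",
                  "9/7/12 15:07 AM/PM ", "9/8/11 23:43 AM/PM "),
  stringsAsFactors = TRUE
)

# Keep only the date part (everything before the first space).
date_part <- sub(" .*$", "", as.character(wi$CreatedDate))

# Assumption: month/day/two-digit year; %y maps 00-68 to 20xx.
wi$year <- format(as.Date(date_part, format = "%m/%d/%y"), "%Y")

# Number of work items created in each year.
year_counts <- table(wi$year)

# Number of work items per year and work item type.
year_type_counts <- table(wi$year, wi$DocType)

print(year_counts)
print(year_type_counts)
```

Once the year column exists, ddply or count from plyr can be applied to it as described above.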

One part of your question that has not been answered yet, how to get counts by work item type, is quite simple:

res <- table(wi20121812[, "WorkItemType"])
Or build the table and convert it to proportions in one step:

res <- prop.table(table(wi20121812[, "WorkItemType"]))
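As a rough illustration of how the counts, proportions and percentages relate, here is a small sketch on a made-up factor. The vector doc_type and the names type_counts, type_share and type_pct are hypothetical; in the question's sample data the type column appears to be called DocType rather than WorkItemType.

```r
# Made-up work item types, for illustration only.
doc_type <- factor(c("TYPE1WI", "TYPE2WI", "TYPE1WI", "TYPE1WI", "TYPE3WI", "TYPE1WI"))

type_counts <- table(doc_type)             # absolute count per type
type_share  <- prop.table(type_counts)     # proportions, which sum to 1
type_pct    <- round(100 * type_share, 1)  # proportions expressed as percentages

print(type_counts)
print(type_pct)
```

Multiplying the proportions by 100 is all that is needed to report percentages.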

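Please provide reproducible data, e.g. head(wi20121812). Even better: add the output of dput(head(wi20121812)) to the question as sample data.

@VineetBhatia Use the dput expression to generate this sample data. It is not clear what the data in your example looks like.

Doing this I get:

    > prop.table(res)
    Type10WI Type11WI Type12WI Type13WI Type14WI Type1WI Type2WI Type3WI Type4WI Type5WI
    0.005835544 0.010079576 0.030238727 0.001061008 0.001591512 0.303978780 0.013226599 0.036074271 0.384084881 0.107692308
    Type6WI Type7WI Type8WI
    0.041909814 0.005835544 0.013226599 0.045092838

Right. Since these numbers are proportions, you only need to multiply them by 100 to get percentages. So Type10WI accounts for 0.6% of all work items, Type11WI for about 1%, and so on.

I get:

    > res
      CreatedDate freq
    1     00:05.4    1
    2     00:05.6    1
    3     00:19.7    1
    4     00:36.8    1
    5     00:37.0    1
    6     00:42.7    1
    7     00:42.8    1

I want to get the count of work items created in 2011 and the count of work items created in 2012?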