R - Auto-fill columns in a data frame from date ranges
I have employee absence data as a data frame in the following format:
EmpID Name LeaveFrom LeaveTo
xz123 ABC 07/12/2016 07/18/2016
kp546 PQR 06/28/2016 07/02/2016
xz123 ABC 07/25/2016 07/27/2016
An employee may have multiple rows assigned to him/her.
I want to create a DF that reflects the above in the following format:
EmpID Name Jul-01 Jul-02 Jul-03 ... Jul-07 Jul-08 ... Jul-25 Jul-26 Jul-27 ... Jul-31
xz123 ABC P P P ... A A A A A P
kp546 PQR A A P P P P P P P
where P stands for present and A stands for absent.
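To make the target format concrete for a single leave record, here is a minimal sketch. It assumes the report covers a fixed window, all of July 2016, as the Jul-01 ... Jul-31 header suggests; the names `month_days`, `mark_leave` and `kp` are hypothetical. Days inside the leave interval become "A", every other day stays "P", and leave days that fall outside the window (e.g. kp546's June days) are simply ignored.

```r
# Hypothetical reporting window: every day of July 2016
month_days <- seq(as.Date("2016-07-01"), as.Date("2016-07-31"), by = "day")

# Mark one leave interval as absent ("A"); all other days stay present ("P").
# Leave days outside the window never match, so they are dropped automatically.
mark_leave <- function(days, from, to) {
  status <- rep("P", length(days))
  status[days >= from & days <= to] <- "A"
  names(status) <- format(days, "%b-%d")  # month abbreviation is locale-dependent
  status
}

# kp546's leave from the sample data: 06/28/2016 to 07/02/2016
kp <- mark_leave(month_days, as.Date("2016-06-28"), as.Date("2016-07-02"))
kp[1:4]  # first four days of July
```

Only Jul-01 and Jul-02 end up as "A" here, because the June part of the leave lies outside the assumed window.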
Any idea how to do this? I have about 30,000 records. Below is my attempt:
library(reshape2)
df <- read.table(text = "EmpID Name LeaveFrom LeaveTo
xz123 ABC 07/12/2016 07/18/2016
kp546 PQR 06/28/2016 07/02/2016
xz123 ABC 07/25/2016 07/27/2016", header = TRUE)
## Convert the leave columns to Date format first
df$LeaveFrom <- as.Date(df$LeaveFrom, "%m/%d/%Y")
df$LeaveTo <- as.Date(df$LeaveTo, "%m/%d/%Y")
## get sequence of dates to use
dates <- seq(min(df$LeaveFrom), max(df$LeaveTo), by = 1)
present <- rep("P", length(dates))
## apply the following function for every EmpID
resList <- lapply(split(df, df$EmpID), function(x){
## You need the reshape2 package for melt()
## If you don't have it, install it with: install.packages("reshape2")
mL <- melt(x, id.vars = c("EmpID", "Name"))
## Number of rows is equal to the number of breaks
## It should be equal to number of times a given EmpID appears in df
## first column is the date at which the break began and second column is the date at which the break ended
## Instead of using dates directly, I convert them to indices into the "dates" vector defined above
m <- matrix(match(mL$value, dates), ncol = 2)
## make a vector of dates (indices) when an employee was absent
absent <- unlist(apply(m, 1, function(x){
x[1]:x[2]
}))
## mark absent
present[absent] <- "A"
return(present)
})
df.final <- as.data.frame(do.call("rbind", resList))
## change date format to make it easier to read
colnames(df.final) <- format(dates, "%m/%d")
print(df.final)
## 06/28 06/29 06/30 07/01 07/02 07/03 07/04 07/05 07/06 07/07 07/08 07/09 07/10 07/11 07/12 07/13 07/14 07/15 07/16 07/17 07/18 07/19 07/20 07/21 07/22 07/23 07/24 07/25 07/26 07/27
## kp546 A A A A A P P P P P P P P P P P P P P P P P P P P P P P P P
## xz123 P P P P P P P P P P P P P P A A A A A A A P P P P P P A A A
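The attempt above still iterates once per employee (lapply over split) and reshapes each group with melt. As a sketch of a loop-free alternative in base R, assuming every LeaveTo is on or after its LeaveFrom, each leave interval can be expanded into (employee, day) index pairs with `sequence()` and written into a matrix pre-filled with "P" in a single matrix-indexing assignment. The names `emps`, `start`, `len`, `rows`, `cols`, `status` and `res` are illustrative only. Unlike `df.final`, the result also keeps the Name column.

```r
df <- data.frame(
  EmpID = c("xz123", "kp546", "xz123"),
  Name = c("ABC", "PQR", "ABC"),
  LeaveFrom = as.Date(c("07/12/2016", "06/28/2016", "07/25/2016"), "%m/%d/%Y"),
  LeaveTo = as.Date(c("07/18/2016", "07/02/2016", "07/27/2016"), "%m/%d/%Y")
)

dates <- seq(min(df$LeaveFrom), max(df$LeaveTo), by = "day")
emps <- sort(unique(df$EmpID))

# Column index of each leave's first day, and the leave length in days
# (assumes LeaveTo >= LeaveFrom, otherwise len would be <= 0)
start <- match(df$LeaveFrom, dates)
len <- as.integer(df$LeaveTo - df$LeaveFrom) + 1L

# Expand every leave into one (row, column) pair per absent day, no loops:
# sequence(len) yields 1..len[1], 1..len[2], ... concatenated
rows <- rep(match(df$EmpID, emps), len)
cols <- rep(start, len) + sequence(len) - 1L

# Everyone is present by default; overwrite all absent cells in one step
status <- matrix("P", nrow = length(emps), ncol = length(dates),
                 dimnames = list(emps, format(dates, "%m/%d")))
status[cbind(rows, cols)] <- "A"

# Attach EmpID and Name in front of the day columns
res <- data.frame(EmpID = emps,
                  Name = df$Name[match(emps, df$EmpID)],
                  status, check.names = FALSE)
res[, 1:7]  # EmpID, Name and the first five days
```

Because the work is a handful of vectorized calls over all rows at once, its cost grows with the total number of absent days rather than with the number of per-employee iterations, which should matter more at around 30,000 records than on this sample.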
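It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem themselves. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive. Check … and ….

There should be an automated script that detects the code/user effort in solving the problem and posts the above as the first comment!

The only way I know is to loop over each row. I don't think that's the most elegant way. I'm looking for a direction that avoids looping over every row.

At least include your current best attempt so that users can comment on or improve your solution; the links in the first comment might help make sure your query gets more responses.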