R - Auto-populating columns in a data frame from date ranges


I have employee absence data as a data frame in the following format:

EmpID   Name   LeaveFrom    LeaveTo
xz123   ABC    07/12/2016   07/08/2016
kp546   PQR    06/28/2016   07/02/2016
xz123   ABC    07/25/2016   07/27/2016

An employee may have multiple rows assigned to him/her.

I want to create a data frame that reflects the above in the following format:

EMPID    Name  Jul-01  Jul-02 Jul03 ..Jul07 Jul08....Jul25  Jul26 Jul27 .Jul 31
xz123    ABC    P       P      P ...  A      A          A      A     A     P
kp546    PQR    A       A      P      P      P          P      P     P     P

where P stands for present and A for absent.

Any idea how to do this? I have about 30,000 records.

Here is my attempt:

library(reshape2)
df <- read.table(text = "EmpID   Name   LeaveFrom    LeaveTo
    xz123   ABC    07/12/2016   07/18/2016
    kp546   PQR    06/28/2016   07/02/2016    
    xz123   ABC    07/25/2016   07/27/2016", header = TRUE)

## Convert the leave columns to Date format first
df$LeaveFrom <- as.Date(df$LeaveFrom, "%m/%d/%Y")
df$LeaveTo <- as.Date(df$LeaveTo, "%m/%d/%Y")

## get sequence of dates to use
dates <- seq(min(df$LeaveFrom), max(df$LeaveTo), by = 1)
present <- rep("P", length(dates))

## apply the following function for every EmpID
resList <- lapply(split(df, df$EmpID), function(x){
    ## melt() comes from the reshape2 package
    ## If you don't have it, you can install it: install.packages("reshape2")

    mL <- melt(x, id.vars = c("EmpID", "Name"))

    ## Number of rows is equal to the number of breaks
    ## It should be equal to number of times a given EmpID appears in df
    ## first column is the date at which the break began and second column is the date at which the break ended
    ## Instead of using dates, I convert them to indices of "dates" vector defined above
    m <- matrix(match(mL$value, dates), ncol = 2)

    ## make a vector of dates (indices) when an employee was absent
    absent <- unlist(apply(m, 1, function(x){
        x[1]:x[2]
    }))

    ## mark absent
    present[absent] <- "A"
    return(present)
})

df.final <- as.data.frame(do.call("rbind", resList))

## change date format to make it easier to read
colnames(df.final) <- format(dates, "%m/%d")
print(df.final)

##       06/28 06/29 06/30 07/01 07/02 07/03 07/04 07/05 07/06 07/07 07/08 07/09 07/10 07/11 07/12 07/13 07/14 07/15 07/16 07/17 07/18 07/19 07/20 07/21 07/22 07/23 07/24 07/25 07/26 07/27
## kp546     A     A     A     A     A     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P     P
## xz123     P     P     P     P     P     P     P     P     P     P     P     P     P     P     A     A     A     A     A     A     A     P     P     P     P     P     P     A     A     A
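
Since the question asks for a way to avoid looping over every row, here is a rough base-R sketch of a vectorized alternative. It is only an illustration under some assumptions: the names `leave`, `days`, `on_leave` and `result` are made up for this example, the reporting window is fixed to July 2016, and each EmpID is assumed to map to exactly one Name. It also keeps the EmpID and Name columns, which the code above drops.

```r
# Sample input mirroring the data used above (variable names are hypothetical)
leave <- data.frame(
  EmpID     = c("xz123", "kp546", "xz123"),
  Name      = c("ABC", "PQR", "ABC"),
  LeaveFrom = as.Date(c("07/12/2016", "06/28/2016", "07/25/2016"), "%m/%d/%Y"),
  LeaveTo   = as.Date(c("07/18/2016", "07/02/2016", "07/27/2016"), "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Assumed reporting window: every day of July 2016
days <- seq(as.Date("2016-07-01"), as.Date("2016-07-31"), by = "day")

# One row per leave record, one column per day.
# TRUE where the day lies inside [LeaveFrom, LeaveTo]; outer() compares all pairs at once.
on_leave <- outer(as.numeric(leave$LeaveFrom), as.numeric(days), "<=") &
            outer(as.numeric(leave$LeaveTo),   as.numeric(days), ">=")

# Collapse multiple leave records per employee: count covering records per day
absent_n <- rowsum(on_leave * 1L, group = leave$EmpID)

# Any covering record means absent
status <- ifelse(absent_n > 0, "A", "P")
colnames(status) <- format(days, "%m/%d")

# Attach EmpID and Name (assumes one Name per EmpID)
emp <- unique(leave[, c("EmpID", "Name")])
result <- data.frame(emp[match(rownames(status), emp$EmpID), ], status,
                     check.names = FALSE, stringsAsFactors = FALSE)
rownames(result) <- NULL
print(result)
```

The intermediate matrix has one cell per (leave record, day) pair, so for roughly 30,000 records and a one-month window it should stay small. A row whose LeaveFrom is later than its LeaveTo matches no day and therefore contributes no absences.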

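It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive.

There should be an automated script that detects code/user effort in solving the problem and posts the above as the first comment!

The only way I know is to loop over each row. I don't think that is the most elegant way. I am looking for a direction that avoids looping over every row.

At least include your current best attempt so that users can comment on or improve your solution. The links in the first comment may help ensure your query gets more responses.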