Calculating results between columns based on different parameters in SQL, R, or Python


Hi everyone, I have the following problem: I want to calculate the number of days between dates, under these conditions:

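  • Status A is the base date; all calculations must use it as the reference (grouped by ID).
  • For statuses B, C, and D, I have to select the oldest date.
  • I have to calculate the number of days and show each status in a separate column.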
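The three rules can be sketched in plain Python as follows. This is a minimal sketch, not part of the original question: it assumes the rows arrive as (ID, status, date) tuples with dates in DD/MM/YYYY format, that every ID has a status-A row, and that a missing status counts as 0 days (matching the expected result table). The function name days_from_status_a is hypothetical.

```python
from datetime import datetime


def days_from_status_a(rows, statuses=("B", "C", "D")):
    """Hypothetical helper: for each ID, the days from its status-A date to
    the earliest date of each other status (0 when that status is absent)."""
    # Keep only the earliest date seen for each (ID, status) pair
    earliest = {}
    for col_id, status, raw_date in rows:
        day = datetime.strptime(raw_date, "%d/%m/%Y").date()
        key = (col_id, status)
        if key not in earliest or day < earliest[key]:
            earliest[key] = day

    result = {}
    for col_id in sorted({col_id for col_id, _ in earliest}):
        base = earliest[(col_id, "A")]  # assumes every ID has a status-A row
        result[col_id] = {
            s: (earliest[(col_id, s)] - base).days if (col_id, s) in earliest else 0
            for s in statuses
        }
    return result


rows = [
    (1, "A", "01/01/2018"), (1, "B", "02/03/2018"), (1, "B", "05/04/2018"),
    (1, "C", "04/05/2018"), (1, "D", "04/05/2018"), (2, "A", "02/01/2018"),
    (2, "C", "04/03/2018"), (2, "C", "05/04/2018"),
]
print(days_from_status_a(rows))
# {1: {'B': 60, 'C': 123, 'D': 123}, 2: {'B': 0, 'C': 61, 'D': 0}}
```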

Example input table, generated with R:

ColID = c(1, 1, 1, 1, 1, 2, 2, 2)
ColStatus = c("A", "B", "B", "C", "D", "A", "C", "C")
ColDate = c("01/01/2018","02/03/2018", "05/04/2018", "04/05/2018", "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018")
df <- data.frame(ColID, ColStatus, ColDate, stringsAsFactors = FALSE)
How I do the calculation:

For ColID = 1

Status A = 01/01/2018
Status B (I have to select the older one) = 02/03/2018
Status C = 04/05/2018
Status D = 04/05/2018

ResultColB = 02/03/2018 - 01/01/2018 = 60
ResultColC = 04/05/2018 - 01/01/2018 = 123
ResultColD = 04/05/2018 - 01/01/2018 = 123
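The same subtraction can be checked with Python's standard datetime module, reading the dates as DD/MM/YYYY (2018 is not a leap year, so February has 28 days). The helper days_between is illustrative only.

```python
from datetime import datetime


def days_between(start, end, fmt="%d/%m/%Y"):
    """Whole days from start to end, both given as DD/MM/YYYY strings."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


# ColID 1: status A = 01/01/2018
print(days_between("01/01/2018", "02/03/2018"))  # B: 31 (Jan) + 28 (Feb) + 1 = 60
print(days_between("01/01/2018", "04/05/2018"))  # C and D: 31 + 28 + 31 + 30 + 3 = 123
# ColID 2: status A = 02/01/2018, oldest C = 04/03/2018
print(days_between("02/01/2018", "04/03/2018"))  # 61
```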
Result table (in days), generated with R:

ColID = c(1, 2)
ResultColStatusB = c(60, 0)
ResultColStatusC = c(123, 61)
ResultColStatusD = c(123, 0)
data.frame(ColID, ResultColStatusB, ResultColStatusC, ResultColStatusD)
This problem can be solved in R, Python, or SQL. Any suggestions on how to solve it?
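For the SQL route, one common technique is conditional aggregation: group by ColID, take MIN(CASE WHEN ColStatus = 'X' THEN ColDate END) for each status, and subtract the status-A date. The sketch below runs such a query through Python's built-in sqlite3 module so it is self-contained. It assumes a hypothetical table named status_log with dates stored as ISO-8601 text; SQLite's julianday() does the date arithmetic here, and in MySQL DATEDIFF(later_date, earlier_date) would play that role (not tested here).

```python
import sqlite3
from datetime import datetime

rows = [
    (1, "A", "01/01/2018"), (1, "B", "02/03/2018"), (1, "B", "05/04/2018"),
    (1, "C", "04/05/2018"), (1, "D", "04/05/2018"), (2, "A", "02/01/2018"),
    (2, "C", "04/03/2018"), (2, "C", "05/04/2018"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table; dates stored as ISO-8601 text so MIN() and julianday() work
conn.execute("CREATE TABLE status_log (ColID INTEGER, ColStatus TEXT, ColDate TEXT)")
conn.executemany(
    "INSERT INTO status_log VALUES (?, ?, ?)",
    [(i, s, datetime.strptime(d, "%d/%m/%Y").date().isoformat()) for i, s, d in rows],
)

# Earliest date per status within each ColID, minus the earliest status-A date.
# A status with no rows gives NULL, which COALESCE turns into 0.
query = """
SELECT ColID,
       COALESCE(CAST(julianday(MIN(CASE WHEN ColStatus = 'B' THEN ColDate END))
                   - julianday(MIN(CASE WHEN ColStatus = 'A' THEN ColDate END))
                AS INTEGER), 0) AS ResultColStatusB,
       COALESCE(CAST(julianday(MIN(CASE WHEN ColStatus = 'C' THEN ColDate END))
                   - julianday(MIN(CASE WHEN ColStatus = 'A' THEN ColDate END))
                AS INTEGER), 0) AS ResultColStatusC,
       COALESCE(CAST(julianday(MIN(CASE WHEN ColStatus = 'D' THEN ColDate END))
                   - julianday(MIN(CASE WHEN ColStatus = 'A' THEN ColDate END))
                AS INTEGER), 0) AS ResultColStatusD
FROM status_log
GROUP BY ColID
ORDER BY ColID
"""
result = conn.execute(query).fetchall()
conn.close()
print(result)  # [(1, 60, 123, 123), (2, 0, 61, 0)]
```

Because every status column comes from the same GROUP BY pass, adding another status only means adding one more COALESCE(...) line.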

Python:

import pandas as pd

Dic = {'ColID': [1, 1, 1, 1, 1, 2, 2, 2],
       'ColStatus': ["A", "B", "B", "C", "D", "A", "C", "C"],
       'ColDate': ["01/01/2018", "02/03/2018", "05/04/2018", "04/05/2018",
                   "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018"]}
df = pd.DataFrame(Dic)
# Dates are day/month/year
df['ColDate'] = pd.to_datetime(df['ColDate'], format='%d/%m/%Y')
# Earliest status-A date for each ColID, broadcast back to every row of that ID
a_dates = df[df['ColStatus'] == 'A'].groupby('ColID')['ColDate'].min()
df['Amin'] = df['ColID'].map(a_dates)
# Elapsed time since the status-A date; keep only rows after it
df['days'] = df['ColDate'] - df['Amin']
df = df[df['days'].dt.days > 0]
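The Python answer above stops at one row per dated status with its elapsed time; it does not yet keep only the oldest date per status or spread the results into one column per status. One way to finish is sketched below, assuming pandas is installed, every ID has a status-A row, and a missing status should read 0 as in the question's result table.

```python
import pandas as pd

# Sample data from the question; dates are day/month/year
df = pd.DataFrame({
    'ColID': [1, 1, 1, 1, 1, 2, 2, 2],
    'ColStatus': ["A", "B", "B", "C", "D", "A", "C", "C"],
    'ColDate': pd.to_datetime(
        ["01/01/2018", "02/03/2018", "05/04/2018", "04/05/2018",
         "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018"],
        format='%d/%m/%Y'),
})

# Oldest date per (ColID, ColStatus), one column per status (NaT when absent)
earliest = df.groupby(['ColID', 'ColStatus'])['ColDate'].min().unstack()

# Days from the status-A date to each other status; a missing status becomes 0
result = pd.DataFrame({
    f'ResultColStatus{s}': (earliest[s] - earliest['A']).dt.days
    for s in ['B', 'C', 'D']
}).fillna(0).astype(int).rename_axis('ColID').reset_index()

print(result)
```

The groupby(...).min().unstack() step does both the "oldest date per status" selection and the reshape to one column per status, so result ends up with one row per ColID.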

Here is a tidyverse solution:

library(lubridate)
library(tidyverse)

# Days from the status-A date to the earliest date of `target` (NA if absent);
# difftime gives exact day counts, unlike dividing an estimated Period
days_from_a <- function(min_date, status, target) {
  as.numeric(difftime(min_date[status == target][1],
                      min_date[status == "A"][1], units = "days"))
}

df %>%
  group_by(ColID, ColStatus) %>%
  summarise(min_date = min(parse_date_time(ColDate, "%d/%m/%Y"))) %>%
  group_by(ColID) %>%
  summarise(a_b = days_from_a(min_date, ColStatus, "B"),
            a_c = days_from_a(min_date, ColStatus, "C"),
            a_d = days_from_a(min_date, ColStatus, "D")) %>%
  mutate_all(~ if_else(is.na(.), 0, .))
Output:

  ColID   a_b   a_c   a_d
  <dbl> <dbl> <dbl> <dbl>
1    1.   60.  123.  123.
2    2.    0.   61.    0.

Or, using base R:

df$ColDate <- as.integer(as.Date(df$ColDate, format="%d/%m/%Y"))
cols <- c("B","C","D")
by(df, df$ColID, function(x) {
    aDate <- x$ColDate[x$ColStatus=="A"]
    vapply(cols, 
        function(id) if(any(x$ColStatus==id)) min(x$ColDate[x$ColStatus==id]) - aDate 
            else NA_integer_, 
        integer(1))
})

Regarding df$ColDate: the OP notes that 02/03/2018 - 01/01/2018 = 60 (2 March 2018 minus 1 January 2018), so the dates look like D/M/Y.
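Thanks @chinsoon12! Much appreciated. I ran into a problem when running the first solution (tidyverse). I got the following messages:
Note: method with signature 'Timespan#Timespan' chosen for function '%/%', target signature 'Period#Period'. "Period#ANY", "ANY#Period" would also be valid
estimate only: convert to intervals for accuracy
Error in summarise_impl(.data, dots) : Evaluation error: assignment of an object of class "logical" is not valid for @'.Data' in an object of class "Period"; is(value, "numeric") is not TRUE.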