Calculate results between columns according to different parameters in SQL, R, or Python
Hi everyone, I have this problem: I want to calculate the number of days between dates, under the following premises:
- Status A is the base date; every calculation must use it as the reference (grouped by ID)
- For statuses B, C, and D, I must pick the oldest date
- I must calculate the number of days and show each result in a separate column
Example: generate the table with R
ColID = c(1, 1, 1, 1, 1, 2, 2, 2)
ColStatus = c("A", "B", "B", "C", "D", "A", "C", "C")
ColDate = c("01/01/2018", "02/03/2018", "05/04/2018", "04/05/2018",
            "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018")
df <- data.frame(ColID, ColStatus, ColDate)
How I do the calculation:
For ColID = 1
Status A = 01/01/2018
Status B (I have to select the older one) = 02/03/2018
Status C = 04/05/2018
Status D = 04/05/2018
ResultColB = 02/03/2018 - 01/01/2018 = 60
ResultColC = 04/05/2018 - 01/01/2018 = 123
ResultColD = 04/05/2018 - 01/01/2018 = 123
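The arithmetic above can be checked with a few lines of standard-library Python. This is only a sketch: the helper names `parse` and `days_from_base` are made up for illustration, and it assumes the dates are written day/month/year.

```python
from datetime import datetime

FMT = "%d/%m/%Y"  # assumption: the sample dates are day/month/year


def parse(text):
    """Parse a date string written as day/month/year."""
    return datetime.strptime(text, FMT)


def days_from_base(base, candidates):
    """Days from the status-A base date to the oldest candidate date."""
    oldest = min(parse(c) for c in candidates)
    return (oldest - parse(base)).days


base_a = "01/01/2018"                                            # Status A, ColID = 1
result_b = days_from_base(base_a, ["02/03/2018", "05/04/2018"])  # two B rows, keep the older
result_c = days_from_base(base_a, ["04/05/2018"])
result_d = days_from_base(base_a, ["04/05/2018"])
print(result_b, result_c, result_d)
```

Comparing parsed dates rather than the raw strings matters here, because day-first strings do not sort chronologically.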
Result table (in days):
ColID = c(1, 2)
ResultColStatusB = c(60, 0)
ResultColStatusC = c(123, 61)
ResultColStatusD = c(123, 0)
data.frame(ColID, ResultColStatusB, ResultColStatusC, ResultColStatusD)
This problem can be solved with R, Python, or SQL. Any suggestions on how to solve it?

Python:
import pandas as pd

Dic = {'ColID': [1, 1, 1, 1, 1, 2, 2, 2],
       'ColStatus': ["A", "B", "B", "C", "D", "A", "C", "C"],
       'ColDate': ["01/01/2018", "02/03/2018", "05/04/2018", "04/05/2018",
                   "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018"]}
df = pd.DataFrame(Dic)
# The dates are day/month/year
df.ColDate = pd.to_datetime(df.ColDate, format='%d/%m/%Y')
# Base date per ID: the earliest status-A date.
# (pd.np was removed in pandas 2.0, so map the per-ID value instead of using pd.np.select.)
base = df[df.ColStatus == 'A'].groupby('ColID')['ColDate'].min()
df['Amin'] = df.ColID.map(base)
# Time elapsed since the base date; keep only the rows that fall after it
df['days'] = df.ColDate - df.Amin
df = df[df['days'].dt.days > 0]
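The snippet above stops at one row per original record. One possible way to get from there to the wide table the question asks for is sketched below. The function name `days_from_status_a` and its `statuses` parameter are hypothetical, and the sketch assumes every ID has a status-A row and that a missing status should show as 0, as in the expected result.

```python
import pandas as pd


def days_from_status_a(df, statuses=("B", "C", "D")):
    """Days from each ID's status-A date to the oldest date of each other status."""
    dates = pd.to_datetime(df["ColDate"], format="%d/%m/%Y")
    # Oldest date for every (ID, status) pair
    oldest = (df.assign(ColDate=dates)
                .groupby(["ColID", "ColStatus"], as_index=False)["ColDate"]
                .min())
    # Base date per ID (assumes each ID has a status-A row)
    base = oldest[oldest["ColStatus"] == "A"].set_index("ColID")["ColDate"]
    oldest["days"] = (oldest["ColDate"] - oldest["ColID"].map(base)).dt.days
    # One column per status; a status missing for an ID becomes 0
    wide = (oldest[oldest["ColStatus"].isin(statuses)]
            .pivot(index="ColID", columns="ColStatus", values="days")
            .reindex(index=base.index, columns=list(statuses))
            .fillna(0)
            .astype(int))
    wide.columns = [f"ResultColStatus{s}" for s in wide.columns]
    return wide.reset_index()


sample = pd.DataFrame({
    "ColID": [1, 1, 1, 1, 1, 2, 2, 2],
    "ColStatus": ["A", "B", "B", "C", "D", "A", "C", "C"],
    "ColDate": ["01/01/2018", "02/03/2018", "05/04/2018", "04/05/2018",
                "04/05/2018", "02/01/2018", "04/03/2018", "05/04/2018"],
})
result = days_from_status_a(sample)
print(result)
```

Taking the minimum per (ID, status) before subtracting keeps the "oldest date" rule in one place, and the final reindex makes sure all requested status columns appear even when an ID never reached that status.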
Here is a tidyverse solution:
library(lubridate)
library(tidyverse)
df %>%
  group_by(ColID, ColStatus) %>%
  summarise(min_date = min(parse_date_time(ColDate, "%d/%m/%Y"))) %>%
  group_by(ColID) %>%
  summarise(a_b = as.period(interval(min_date[ColStatus=="A"],
                                     min_date[ColStatus=="B"])) %/% days(1) - 1,
            a_c = as.period(interval(min_date[ColStatus=="A"],
                                     min_date[ColStatus=="C"])) %/% days(1) - 1,
            a_d = as.period(interval(min_date[ColStatus=="A"],
                                     min_date[ColStatus=="D"])) %/% days(1) - 1) %>%
  mutate_all(funs(if_else(is.na(.), 0, .)))
Output:
  ColID   a_b   a_c   a_d
  <dbl> <dbl> <dbl> <dbl>
1    1.   60.  123.  123.
2    2.    0.   61.    0.
Or using base R:
df$ColDate <- as.integer(as.Date(df$ColDate, format="%d/%m/%Y"))
cols <- c("B","C","D")
by(df, df$ColID, function(x) {
    aDate <- x$ColDate[x$ColStatus=="A"]
    vapply(cols,
        function(id) if(any(x$ColStatus==id)) min(x$ColDate[x$ColStatus==id]) - aDate
                     else NA_integer_,
        integer(1))
})
OP notes: 02/03/2018 - 01/01/2018 = 60, so the dates look like D/M/Y.
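A quick way to see why the format reading matters is to parse the same pair of strings both ways. This is a small standard-library sketch; the helper name `day_gap` is made up for illustration.

```python
from datetime import datetime


def day_gap(start, end, fmt):
    """Whole days between two date strings parsed with the given format."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


dmy = day_gap("01/01/2018", "02/03/2018", "%d/%m/%Y")  # read as 2 March 2018
mdy = day_gap("01/01/2018", "02/03/2018", "%m/%d/%Y")  # read as 3 February 2018
print(dmy, mdy)
```

Only the day/month/year reading reproduces the 60 days stated in the question.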
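Thanks @chinsoon12! Much appreciated. I ran into a problem when running the first solution (tidyverse). I got the following messages: Note: method with signature 'Timespan#Timespan' chosen for function '%/%', target signature 'Period#Period'. "Period#ANY", "ANY#Period" would also be valid. estimate only: convert to intervals for accuracy. Error in summarise_impl(.data, dots): Evaluation error: assignment of an object of class "logical" is not valid for slot '.Data' in an object of class "Period"; is(value, "numeric") is not TRUE.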