MySQL: translating dplyr to SQL
My data is structured as follows:
id start end cancelled
1 2020-01-01 2020-12-31 2021-01-10
1 2021-02-01 2022-01-31 NA
2 2020-01-01 2020-12-31 NA
3 2020-01-01 2020-06-30 2020-07-01
3 2020-07-10 2021-01-09 2021-01-31
3 2021-02-02 2021-08-01 NA
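These data represent club memberships, and the goal is to extract the members who cancelled their membership and subsequently rejoined. I am particularly interested in the number of days between cancelling and rejoining.
In R, I can do: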
library(dplyr)
library(tidyr)  # provides drop_na()

dat <- structure(list(id = c(1, 1, 2, 3, 3, 3), start = c("2020-01-01",
"2021-02-01", "2020-01-01", "2020-01-01", "2020-07-10", "2021-02-02"
), end = c("2020-12-31", "2022-01-31", "2020-12-31", "2020-06-30",
"2021-01-09", "2021-08-01"), cancelled = c("2021-01-10", NA,
NA, "2020-07-01", "2021-01-31", NA)), class = "data.frame", row.names = c(NA,
-6L))
# convert every column except id from character to Date
dat[,-1] <- lapply(dat[,-1], as.Date)
dat %>%
  group_by(id) %>%
  # pair each start (except the first) with the previous row's cancelled date
  summarize(
    rejoin_date = start[-1],
    time_to_rejoin = as.numeric(start[-1] - cancelled[-n()], units="days")
  ) %>% drop_na(time_to_rejoin) %>%
  ungroup()
How can I do this in MySQL?
CREATE TABLE IF NOT EXISTS `dat` (
`id` int(6) unsigned NOT NULL,
`start` TIMESTAMP,
`end` TIMESTAMP,
`cancelled` TIMESTAMP NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `dat` (`id`, `start`, `end`, `cancelled`) VALUES
('1', '2020-01-01', '2020-12-31', '2021-01-10'),
('1', '2021-02-01', '2022-01-31', NULL ),
('2', '2020-01-01', '2020-12-31', NULL ),
('3', '2020-01-01', '2020-06-30', '2020-07-01'),
('3', '2020-07-10', '2021-01-09', '2021-01-31'),
('3', '2021-02-02', '2021-08-01', NULL );
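SELECT t1.id,
       COALESCE(t1.cancelled, t1.`end`) AS `end`,
       t2.`start` AS next_start,
       DATEDIFF(t2.`start`, COALESCE(t1.cancelled, t1.`end`)) AS gap
FROM dat t1
JOIN dat t2 ON t1.id = t2.id
           AND t2.`start` > t1.`start`
-- keep only the immediately following period for the same id
WHERE NOT EXISTS (SELECT 1 FROM dat t3
                  WHERE t3.id = t1.id
                    AND t3.`start` > t1.`start`
                    AND t3.`start` < t2.`start`);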
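COALESCE(t1.cancelled, t1.end) returns cancelled when it is set, and falls back to end when cancelled is not set (is NULL).
Note that, by convention, a column named "id" is usually the surrogate PRIMARY KEY.
@Strawberry Thanks, you're right. I just wanted to keep it simple.
+1 Thanks, this seems to work well. Can you describe how it works?
@JoeKing Added some explanation.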
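For comparison, here is a minimal sketch of a version that stays closer to the dplyr code in the question. It is not part of the answer above. It assumes MySQL 8.0 or later (window functions are not available in earlier versions) and the `dat` table as defined in the question. The column names `rejoin_date` and `time_to_rejoin` are taken from the R code, and the alias `t` is only illustrative. Unlike the COALESCE approach, it ignores periods that ended without a cancellation, which mirrors `drop_na(time_to_rejoin)`.

```sql
SELECT id,
       rejoin_date,
       DATEDIFF(rejoin_date, cancelled) AS time_to_rejoin
FROM (
    SELECT id,
           cancelled,
           -- start of the same member's next period; NULL for the last one
           LEAD(`start`) OVER (PARTITION BY id ORDER BY `start`) AS rejoin_date
    FROM dat
) AS t
-- keep only periods that were cancelled and were followed by another period
WHERE cancelled IS NOT NULL
  AND rejoin_date IS NOT NULL
ORDER BY id, rejoin_date;
```

With the sample data from the question, this should give one row for id 1 (22 days) and two rows for id 3 (9 and 2 days). Member 2 never cancelled, so it produces no row.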