R: get the maximum by group with an additional constraint (tags: r, tibble)


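I have a data.frame with 4 variables: `day` (Date, format "YYYY-MM-DD"), `hour` (POSIXct, format "YYYY-MM-DD hh:MM:ss"), `department` (chr) and `amount` (numeric). For each row i, I want the maximum of `amount` over the rows in the same `day` and `department` whose `hour` is greater than or equal to `hour_i`, the value of `hour` in row i.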
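To pin down the target before looking at the answers: as the asker clarifies in the comments, row i should receive the maximum of `amount` over all rows in the same `day` and `department` with `hour >= hour_i`. The sketch below is the "for loop plus filter" baseline the asker mentions, written with `vapply()`. The toy data reuses the `amount` values from the example data in this thread; the `tz = "UTC"` setting and the 15-minute spacing written as `900 * 0:8` seconds are assumptions made only so the block is self-contained.

```r
# Toy data in the shape described in the question (time zone is an assumption).
df <- data.frame(
  day        = as.Date("2019-08-08"),
  hour       = rep(as.POSIXct("2019-08-08 10:45:00", tz = "UTC") + 900 * 0:8, times = 2),
  department = rep(c("DPT1", "DPT2"), each = 9),
  amount     = c(2, 3, 3, 2, 0, 0, 1, 2, 1,
                 3, 3, 3, 2, 2, 3, 0, 0, 0)
)

# Brute force: for every row i, filter to the same group and to hour >= hour_i,
# then take the maximum amount. Quadratic in the number of rows, so this is
# only a reference for checking faster solutions, not something for large data.
df$max_cond <- vapply(seq_len(nrow(df)), function(i) {
  same_group <- df$day == df$day[i] & df$department == df$department[i]
  not_before <- df$hour >= df$hour[i]
  max(df$amount[same_group & not_before])
}, numeric(1))

df$max_cond
```

Row 4 of DPT1, for instance, sees the amounts 2, 0, 0, 1, 2, 1 and therefore gets 2, which is the value the asker works out by hand in the comments.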

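A base R approach: you can use `cummax()` (cumulative maximum) to solve this. Note that I assume your data frame is sorted by `hour`, as it is in your example.

The idea: first, `split()` the data frame into components with distinct `day`s and `department`s. Then, within each component:

1. reverse the relevant vector, `$amount`
2. apply `cummax()`
3. flip the resulting `$max_cond` variable back into the right order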
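The three steps above can be tried on a single group first. This is a minimal sketch that uses the DPT1 `amount` values from the example data in this thread; the intermediate names `reversed` and `running_max` are only illustrative.

```r
# Amounts of one (day, department) group, already in chronological order.
amount <- c(2, 3, 3, 2, 0, 0, 1, 2, 1)

reversed    <- rev(amount)        # step 1: latest observation first
running_max <- cummax(reversed)   # step 2: max of "this row and everything later"
max_cond    <- rev(running_max)   # step 3: back to chronological order

max_cond
## [1] 3 3 3 2 2 2 2 2 1
```

Because the vector is reversed before `cummax()`, each position ends up holding the maximum of itself and all later observations, which is the `hour >= hour_i` constraint from the question.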

Then stick all the components back together with `do.call()` and `rbind()`. For example:

```r
# Split into one data frame per (department, day) combination
df2 <- split(df, list(df$department, df$day))

df2 <- lapply(df2, function(x) {
  # Take the amounts from latest to earliest hour, compute the running
  # maximum, then reverse it back into chronological order.
  # Assumes the rows of x are already sorted by hour (ascending).
  x$max_cond <- rev(cummax(x$amount[order(x$hour, decreasing = TRUE)]))
  x
})

# Glue the pieces back together and drop the generated row names
df2 <- do.call(rbind, df2)
row.names(df2) <- NULL
df2
##           day                hour department amount max_cond
## 1  2019-08-08 2019-08-08 10:45:00       DPT1      2        3
## 2  2019-08-08 2019-08-08 11:00:00       DPT1      3        3
## 3  2019-08-08 2019-08-08 11:15:00       DPT1      3        3
## 4  2019-08-08 2019-08-08 11:30:00       DPT1      2        2
## 5  2019-08-08 2019-08-08 11:45:00       DPT1      0        2
## 6  2019-08-08 2019-08-08 12:00:00       DPT1      0        2
## 7  2019-08-08 2019-08-08 12:15:00       DPT1      1        2
## 8  2019-08-08 2019-08-08 12:30:00       DPT1      2        2
## 9  2019-08-08 2019-08-08 12:45:00       DPT1      1        1
## 10 2019-08-08 2019-08-08 10:45:00       DPT2      3        3
## 11 2019-08-08 2019-08-08 11:00:00       DPT2      3        3
## 12 2019-08-08 2019-08-08 11:15:00       DPT2      3        3
## 13 2019-08-08 2019-08-08 11:30:00       DPT2      2        3
## 14 2019-08-08 2019-08-08 11:45:00       DPT2      2        3
## 15 2019-08-08 2019-08-08 12:00:00       DPT2      3        3
## 16 2019-08-08 2019-08-08 12:15:00       DPT2      0        0
## 17 2019-08-08 2019-08-08 12:30:00       DPT2      0        0
## 18 2019-08-08 2019-08-08 12:45:00       DPT2      0        0
```

Very similar, but with data.table you can do:

```r
library(data.table)

# Example data; max_cond holds the expected result for comparison
df <- structure(list(
  day = structure(c(18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116,
                    18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116, 18116),
                  class = "Date"),
  hour = structure(c(1565275500, 1565276400, 1565277300, 1565278200, 1565279100,
                     1565280000, 1565280900, 1565281800, 1565282700,
                     1565275500, 1565276400, 1565277300, 1565278200, 1565279100,
                     1565280000, 1565280900, 1565281800, 1565282700),
                   class = c("POSIXct", "POSIXt"), tzone = ""),
  department = c("DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1", "DPT1",
                 "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2", "DPT2"),
  amount = c(2, 3, 3, 2, 0, 0, 1, 2, 1, 3, 3, 3, 2, 2, 3, 0, 0, 0),
  max_cond = c(3, 3, 3, 2, 2, 2, 2, 2, 1, 3, 3, 3, 3, 3, 3, 0, 0, 0)),
  row.names = c(NA, -18L), class = "data.frame")

dt <- data.table(df)

# Sort from latest to earliest hour so that cummax() looks "forward" in time
setorder(dt, -hour)

# Running maximum of amount within each (day, department) group
dt[, max_cond_new := cummax(amount), by = .(day, department)]

# Restore the original ordering
setorder(dt, department, hour)
```

Welcome to the site. Can you tell us what fails, and how? How is `hour >= hour_i` defined, i.e. what is the reference hour?

The reference `hour_i` is the value of the variable `hour` in row i. I am used to using dplyr to compute summary statistics within groups, but the extra constraint `hour >= hour_i` makes it more complicated.

If we are at row 1 (i == 1), then `hour_i == 11:45:00`, so do we then check 11:45 > 11:45? It seems that either I am misunderstanding, or you really should just do a plain filter?

That is correct. I only want to compute the maximum of "amount" for the subset of observations where `hour >= hour_i` (and within the same day and department group as observation "i"). Suppose we are at row 4 (i = 4). Then I want "max_cond" to be max_cond_4 = max(2, 0, 0, 1, 2, 1) = 2. A for loop plus a generic filter would probably do it, but I am looking for a more elegant (and hopefully faster) way. Can data.table do this?

Thanks, but unfortunately my data.frame is not split into single (day/department) pieces =/. Sorry if my example was unclear. I have a very large data.frame, so solving the "grouping part" of the problem is also crucial...

Got it. Could you then update your MWE to include that behaviour? (Unfortunately I have to step away from my computer for a few hours, so I cannot revise the post for now. Others may well provide a better answer, though.)

OK. Updated the MWE.

Updated the answer, although I think Patrick Altmeyer's data.table solution is probably better.

Works great! Thank you very much!

Great, no problem @pabc! I cannot recommend data.table enough, especially for large datasets. If this solved your problem, would you mind accepting the answer? Thanks