R: remove observations from a data frame using pairwise comparisons and multiple conditions
I need to selectively remove observations based on pairwise comparisons of two variables against the rest of the dataset. Specifically, these are cost-effectiveness data, and I want to drop "dominated" interventions, i.e. interventions that are 1. more costly and 2. less effective than some alternative. My example is:
Township <- c(rep('A',3), rep('B',3))
Intervention <- rep(1:3, 2)
Cost <- c(1000, 500, 3000, 900, 1200, 1500)
Effect <- c(10, 8, 30, 10, 7, 8)
Res <- data.frame(Township, Intervention, Cost, Effect)
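The dominance rule above is a pairwise comparison within each township. The following is a minimal base-R sketch of that rule. `is_dominated()` is a hypothetical helper name and does not come from any package.

```r
Township <- c(rep('A', 3), rep('B', 3))
Cost     <- c(1000, 500, 3000, 900, 1200, 1500)
Effect   <- c(10, 8, 30, 10, 7, 8)

# Hypothetical helper: row i is dominated if some row j is both cheaper
# (cost[i] > cost[j]) and more effective (effect[i] < effect[j]).
# outer() builds the full i-by-j comparison matrices.
is_dominated <- function(cost, effect) {
  rowSums(outer(cost, cost, ">") & outer(effect, effect, "<")) > 0
}

# Apply the rule separately within each township, then put the flags
# back in the original row order.
dominated <- unsplit(
  lapply(split(data.frame(Cost, Effect), Township),
         function(d) is_dominated(d$Cost, d$Effect)),
  Township
)
dominated
```

Each `outer()` call builds an n × n matrix per township. That is fine for a handful of interventions per group.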
There is a better way using the data.table package. The code is simple and easy to read, and it is faster and scales better than the alternatives.
require(data.table)
Res.dt <- data.table(Res)
# figure out which intervention was least costly, within Township
Res.dt[, minCost := min(Cost), by = Township]
# get the Effect of the minimum cost intervention, within Township
# (which.min() picks a single row even if two interventions tie on cost)
Res.dt[, minCostEffect := Effect[which.min(Cost)], by = Township][]
## Township Intervention Cost Effect minCost minCostEffect
## 1: A 1 1000 10 500 8
## 2: A 2 500 8 500 8
## 3: A 3 3000 30 500 8
## 4: B 1 900 10 900 10
## 5: B 2 1200 7 900 10
## 6: B 3 1500 8 900 10
# select out the dominated observations
Res.dt[!(Cost > minCost & Effect < minCostEffect)][]
## Township Intervention Cost Effect minCost minCostEffect
## 1: A 1 1000 10 500 8
## 2: A 2 500 8 500 8
## 3: A 3 3000 30 500 8
## 4: B 1 900 10 900 10
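The three steps above can also be collapsed into one grouped statement that does not add helper columns. This sketch assumes data.table is installed. `which.min()` returns a single index even when two interventions tie for the lowest cost.

```r
library(data.table)

Res.dt <- data.table(Township     = c(rep('A', 3), rep('B', 3)),
                     Intervention = rep(1:3, 2),
                     Cost         = c(1000, 500, 3000, 900, 1200, 1500),
                     Effect       = c(10, 8, 30, 10, 7, 8))

# Within each Township, keep rows that are NOT both costlier and less
# effective than that township's cheapest intervention.
kept <- Res.dt[, .SD[!(Cost > min(Cost) & Effect < Effect[which.min(Cost)])],
               by = Township]
kept
```

Like the step-by-step version, this only compares each row against the cheapest intervention of its township.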
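Very nice step-by-step explanation (even if the logic could be reduced to a single statement).

@Arun Thanks! Can the creation of minCost and minCostEffect be combined into one statement?

Thanks a lot for the reply, @Ken. Actually, I need to compare each observation against all the others, not only against the least costly one. An intervention is dominated if there is another observation with a lower cost and a greater effect. An intervention can have an effect greater than the minCostEffect value and still be dominated by another intervention. For example, intervention 2 can be dominated by intervention 3 even though intervention 3 is not the cheapest (as in township C of the extended data further down). Sorry for the multiple comments and poor formatting; I'm still getting used to Stack Overflow.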
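You can create both helper columns in one `:=` call by assigning a list to a character vector of column names. The sketch below assumes data.table and uses the extended data with township C. It also reproduces the problem raised in the last comment: a rule based only on the minimum-cost row misses a dominated intervention.

```r
library(data.table)

# Extended data: in township C, intervention 2 (600, 10) is dominated by
# intervention 3 (550, 11), which is not the cheapest option.
Res.dt <- data.table(Township     = rep(c('A', 'B', 'C'), each = 3),
                     Intervention = rep(1:3, 3),
                     Cost         = c(1000, 500, 3000, 900, 1200, 1500, 500, 600, 550),
                     Effect       = c(10, 8, 30, 10, 7, 8, 5, 10, 11))

# Both helper columns in a single := call, grouped by Township
Res.dt[, c("minCost", "minCostEffect") := list(min(Cost), Effect[which.min(Cost)]),
       by = Township]

# The min-cost rule does not flag C2: its Effect (10) exceeds C's
# minCostEffect (5), even though C3 is cheaper and more effective.
minCostRule <- Res.dt[, Cost > minCost & Effect < minCostEffect]
minCostRule
```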
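Desired result from the question (sort by cost within each township, then drop the dominated rows 5 and 6 by hand):
library(dplyr)
Res.need <- Res %>% group_by(Township) %>% arrange(Cost, .by_group = TRUE)
Res.need <- Res.need[-c(5, 6), ]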
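Instead of dropping rows by position, you can compute the desired result. This sketch assumes dplyr 0.7 or later, which provides `arrange(.by_group = TRUE)`. The test inside `filter()` is the pairwise dominance rule described in the question.

```r
library(dplyr)

Res <- data.frame(Township     = c(rep('A', 3), rep('B', 3)),
                  Intervention = rep(1:3, 2),
                  Cost         = c(1000, 500, 3000, 900, 1200, 1500),
                  Effect       = c(10, 8, 30, 10, 7, 8))

# Within each township, drop a row if any other row is both cheaper and
# more effective, then sort by cost inside each township.
Res.need <- Res %>%
  group_by(Township) %>%
  filter(!sapply(seq_along(Cost),
                 function(i) any(Cost[i] > Cost & Effect[i] < Effect))) %>%
  arrange(Cost, .by_group = TRUE) %>%
  ungroup()
Res.need
```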
# data with new examples
Res <- data.frame(Township = c(rep('A',3), rep('B',3), rep('C',3)),
Intervention = rep(1:3, 3),
Cost = c(1000, 500, 3000, 900, 1200, 1500, 500, 600, 550),
Effect = c(10, 8, 30, 10, 7, 8, 5, 10, 11))
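library(data.table)
Res.dt <- data.table(Res)
# function to find the dominated observations: within each Township, a row
# is dominated if some other row is both cheaper and more effective
findDominated <- function(data) {
  data.split <- split(data, data[, Township])
  dominated <- lapply(data.split, function(Res.subset) {
    domSplit <- logical(nrow(Res.subset))
    for (i in seq_len(nrow(Res.subset)))
      domSplit[i] <- any(Res.subset$Cost[i] > Res.subset$Cost &
                         Res.subset$Effect[i] < Res.subset$Effect)
    domSplit
  })
  # unlist() concatenates the groups in Township order, so the flags only
  # line up with the rows if data is already sorted by Township
  unlist(dominated, use.names = FALSE)
}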
Res.dt[, dominated := findDominated(Res.dt)][]
## Township Intervention Cost Effect dominated
## 1: A 1 1000 10 FALSE
## 2: A 2 500 8 FALSE
## 3: A 3 3000 30 FALSE
## 4: B 1 900 10 FALSE
## 5: B 2 1200 7 TRUE
## 6: B 3 1500 8 TRUE
## 7: C 1 500 5 FALSE
## 8: C 2 600 10 TRUE
## 9: C 3 550 11 FALSE
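A possible simplification, assuming data.table, replaces `split()`/`lapply()`/`unlist()` with `by = Township`. Because `:=` with `by` writes each group's values back to that group's own rows, the table does not need to be sorted by Township first.

```r
library(data.table)

Res.dt <- data.table(Township     = rep(c('A', 'B', 'C'), each = 3),
                     Intervention = rep(1:3, 3),
                     Cost         = c(1000, 500, 3000, 900, 1200, 1500, 500, 600, 550),
                     Effect       = c(10, 8, 30, 10, 7, 8, 5, 10, 11))

# For each row i of a Township group, check whether any row in the same
# group is both cheaper and more effective.
Res.dt[, dominated := vapply(seq_len(.N),
                             function(i) any(Cost[i] > Cost & Effect[i] < Effect),
                             logical(1)),
       by = Township]
Res.dt
```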
# sort by cost in each Township
setorder(Res.dt, Township, Cost)
# show non-dominated results
Res.dt[dominated == FALSE]
## Township Intervention Cost Effect dominated
## 1: A 2 500 8 FALSE
## 2: A 1 1000 10 FALSE
## 3: A 3 3000 30 FALSE
## 4: B 1 900 10 FALSE
## 5: C 1 500 5 FALSE
## 6: C 3 550 11 FALSE
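The loop in `findDominated()` compares every pair of rows, which is O(n²) per township. If townships have many interventions, a sort-based alternative may scale better. This is a swapped-in technique (a running maximum of Effect over cost-sorted rows), sketched under the assumption that Cost and Effect contain no missing values. `flagDominatedSorted()` is a hypothetical name.

```r
library(data.table)

# Hypothetical helper: a row is dominated if the best Effect among rows
# with strictly lower Cost is greater than its own Effect.
flagDominatedSorted <- function(cost, effect) {
  o    <- order(cost)
  cs   <- cost[o]
  ef   <- effect[o]
  best <- cummax(ef)                              # best Effect up to each sorted position
  k    <- findInterval(cs, cs, left.open = TRUE)  # number of strictly cheaper rows
  prev <- c(-Inf, best)[k + 1]                    # best Effect among strictly cheaper rows
  dominated    <- logical(length(cost))
  dominated[o] <- prev > ef                       # map flags back to input order
  dominated
}

Res.dt <- data.table(Township     = rep(c('A', 'B', 'C'), each = 3),
                     Intervention = rep(1:3, 3),
                     Cost         = c(1000, 500, 3000, 900, 1200, 1500, 500, 600, 550),
                     Effect       = c(10, 8, 30, 10, 7, 8, 5, 10, 11))

Res.dt[, dominated := flagDominatedSorted(Cost, Effect), by = Township]
Res.dt[dominated == FALSE]
```

Rows with equal cost do not dominate each other, which matches the strict `>` comparison in `findDominated()`. Whether this version is actually faster depends on group sizes; with three interventions per township, the difference is negligible.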