R: Removing observations from a data frame using pairwise comparisons and multiple conditions


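I need to selectively remove observations based on pairwise comparisons of two variables against the rest of the data set.

Specifically, these are cost-effectiveness data, and I want to drop "dominated" interventions, i.e. those that, compared with some alternative, are both 1. more expensive and 2. less effective.

My example is: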

Township <- c(rep('A',3), rep('B',3))
Intervention <- rep(1:3, 2)
Cost <- c(1000, 500, 3000, 900, 1200, 1500)
Effect <- c(10, 8, 30, 10, 7, 8)  
# combine the vectors into one data frame
Res <- data.frame(Township, Intervention, Cost, Effect)

A better approach is to use the data.table package: the code is simple and readable, and it is faster and more scalable than the alternatives.

library(data.table)
Res.dt <- as.data.table(Res)
# figure out which intervention was least costly, within Township
Res.dt[, minCost := min(Cost), by = Township]
# get the Effect of the minimum cost intervention, within Township
Res.dt[, minCostEffect := Effect[Cost == minCost], by = Township][]
##    Township Intervention Cost Effect minCost minCostEffect
## 1:        A            1 1000     10     500             8
## 2:        A            2  500      8     500             8
## 3:        A            3 3000     30     500             8
## 4:        B            1  900     10     900            10
## 5:        B            2 1200      7     900            10
## 6:        B            3 1500      8     900            10
# select out the dominated observations
Res.dt[!(Cost > minCost & Effect < minCostEffect)][]
##    Township Intervention Cost Effect minCost minCostEffect
## 1:        A            1 1000     10     500             8
## 2:        A            2  500      8     500             8
## 3:        A            3 3000     30     500             8
## 4:        B            1  900     10     900            10
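
The two `:=` steps above can probably be folded into a single statement. What follows is a minimal sketch, not part of the original answer. It assumes data.table's functional `` `:=`(...) `` form, and it swaps in `which.min()` for `Effect[Cost == minCost]` so that a tie on cost still yields a single value. The name `kept` is only for illustration.

```r
library(data.table)

# same six-row example as in the question
Res.dt <- data.table(
  Township     = c(rep("A", 3), rep("B", 3)),
  Intervention = rep(1:3, 2),
  Cost         = c(1000, 500, 3000, 900, 1200, 1500),
  Effect       = c(10, 8, 30, 10, 7, 8)
)

# functional form of := adds both helper columns in one grouped pass;
# which.min() picks the first cheapest row within each Township
Res.dt[, `:=`(minCost       = min(Cost),
              minCostEffect = Effect[which.min(Cost)]),
       by = Township]

# same filter as above: drop rows that cost more and achieve less
# than the cheapest intervention in their Township
kept <- Res.dt[!(Cost > minCost & Effect < minCostEffect)]
print(kept)
```

This keeps the same four rows as the two-step version, and it shares that version's limitation: each row is compared only against the cheapest intervention.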

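Very nice step-by-step explanation (even though the logic could be reduced to a single statement).

@Arun Thanks! Could the creation of minCost and minCostEffect be combined into one statement?

Thanks a lot for the reply, @Ken. Actually, I need the comparison to cover all observations, not just the one with the lowest cost. An intervention is dominated if any other observation has a lower cost and a greater effect. An intervention's effect can exceed the minCostEffect value and still be dominated by another intervention. For instance, intervention 2 above is dominated by intervention 3, which is not the cheapest. Sorry for the multiple comments and poor formatting; I am still getting used to Stack Overflow.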
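
The difference between the two rules can be checked on a small case. The sketch below uses the Township C values from the extended example in this thread. The variable names (`min_cost_rule`, `pairwise_rule`) are hypothetical and chosen only for illustration.

```r
# Township C values from the extended example
cost   <- c(500, 600, 550)
effect <- c(5, 10, 11)

# rule from the first answer: compare only against the cheapest intervention
cheapest      <- which.min(cost)
min_cost_rule <- cost > cost[cheapest] & effect < effect[cheapest]

# pairwise rule: dominated if ANY other intervention is cheaper and more effective
pairwise_rule <- sapply(seq_along(cost), function(i) {
  any(cost < cost[i] & effect > effect[i])
})

print(min_cost_rule)  # FALSE FALSE FALSE
print(pairwise_rule)  # FALSE  TRUE FALSE
```

Intervention 2 (cost 600, effect 10) is beaten by intervention 3 (cost 550, effect 11), yet the minimum-cost rule misses it because the cheapest intervention has the lowest effect.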
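# (from the question) the result I need, built by hand
library(dplyr)
# sort by Township, then Cost; arrange() ignores groups unless told otherwise
Res.need <- Res %>% group_by(Township) %>% arrange(Township, Cost)
# drop the two dominated Township B rows
Res.need <- Res.need[-c(5, 6), ]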
# data with new examples
Res <- data.frame(Township = c(rep('A',3), rep('B',3), rep('C',3)),
                  Intervention = rep(1:3, 3), 
                  Cost = c(1000, 500, 3000, 900, 1200, 1500, 500, 600, 550),
                  Effect = c(10, 8, 30, 10, 7, 8, 5, 10, 11))
require(data.table)
Res.dt <- data.table(Res)

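# function to find the dominated observations
# (the result lines up with the rows only if data is sorted by Township)
findDominated <- function(data) {
    # split the argument, not the global Res.dt, into one table per Township
    data.split <- split(data, data$Township)
    dominated <- lapply(data.split, function(Res.subset) {
        domSplit <- logical(nrow(Res.subset))
        for (i in seq_len(nrow(Res.subset))) {
            # row i is dominated if any row is cheaper AND more effective
            domSplit[i] <- any(Res.subset$Cost[i] > Res.subset$Cost &
                               Res.subset$Effect[i] < Res.subset$Effect)
        }
        domSplit
    })
    unlist(dominated, use.names = FALSE)
}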
Res.dt[, dominated := findDominated(Res.dt)][]
##    Township Intervention Cost Effect dominated
## 1:        A            1 1000     10     FALSE
## 2:        A            2  500      8     FALSE
## 3:        A            3 3000     30     FALSE
## 4:        B            1  900     10     FALSE
## 5:        B            2 1200      7      TRUE
## 6:        B            3 1500      8      TRUE
## 7:        C            1  500      5     FALSE
## 8:        C            2  600     10      TRUE
## 9:        C            3  550     11     FALSE

# sort by cost in each Township
setorder(Res.dt, Township, Cost)
# show non-dominated results
Res.dt[dominated == FALSE]
##    Township Intervention Cost Effect dominated
## 1:        A            2  500      8     FALSE
## 2:        A            1 1000     10     FALSE
## 3:        A            3 3000     30     FALSE
## 4:        B            1  900     10     FALSE
## 5:        C            1  500      5     FALSE
## 6:        C            3  550     11     FALSE
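
The loop-based check can also be written without an explicit `for` loop or `split()`. The following is a sketch under stated assumptions rather than the original answer's method: `is_dominated` is a hypothetical helper that uses base R's `outer()` to compare every pair of rows, and data.table's `by =` handles the per-Township grouping. Because the flag is computed inside each group, it should not depend on the rows being sorted by Township.

```r
library(data.table)

Res.dt <- data.table(
  Township     = c(rep("A", 3), rep("B", 3), rep("C", 3)),
  Intervention = rep(1:3, 3),
  Cost         = c(1000, 500, 3000, 900, 1200, 1500, 500, 600, 550),
  Effect       = c(10, 8, 30, 10, 7, 8, 5, 10, 11)
)

# hypothetical helper: TRUE for each row that some other row beats on both criteria
is_dominated <- function(cost, effect) {
  # entry [i, j] is TRUE when row i costs more than row j
  costlier <- outer(cost, cost, ">")
  # entry [i, j] is TRUE when row i is less effective than row j
  weaker   <- outer(effect, effect, "<")
  # row i is dominated if at least one j satisfies both conditions
  rowSums(costlier & weaker) > 0
}

# compute the flag within each Township
Res.dt[, dominated := is_dominated(Cost, Effect), by = Township]

# sort by cost within Township and show the non-dominated rows
setorder(Res.dt, Township, Cost)
print(Res.dt[dominated == FALSE])
```

`outer()` builds an n-by-n matrix per group, so memory grows quadratically with group size. For Townships with only a handful of interventions that is negligible, but very large groups may need a different strategy.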