Find duplicates, compare a condition, delete one row in R

r if-statement

Using the following reproducible example:

ID1<-c("a1","a4","a6","a6","a5", "a1" )
ID2<-c("b8","b99","b5","b5","b2","b8" )
Value1<-c(2,5,6,6,2,7)
Value2<- c(23,51,63,64,23,23)
Year<- c(2004,2004,2004,2004,2005,2004)
df<-data.frame(ID1,ID2,Value1,Value2,Year)

I tried the following: building a unique identifier for the condition I am interested in

df$new<-paste(df$ID1,df$ID2, df$Year, sep="_")

df$new

Consider aggregate to retrieve the maximum values grouped by ID1, ID2 and Year:

df_new <- aggregate(. ~ ID1 + ID2 + Year, df, max)
df_new

#   ID1 ID2 Year Value1 Value2
# 1  a6  b5 2004      6     64
# 2  a1  b8 2004      7     23
# 3  a4 b99 2004      5     51
# 4  a5  b2 2005      2     23

A few different possibilities. Using dplyr:

library(dplyr)

# keep the row that holds the maximum of both value columns within each group
df %>%
  group_by(ID1, ID2, Year) %>%
  filter(Value1 == max(Value1) & Value2 == max(Value2))

Or:

# keep the row with the largest Value1 + Value2 within each group
df %>%
  rowwise() %>%
  mutate(max_val = sum(Value1, Value2)) %>%
  ungroup() %>%
  group_by(ID1, ID2, Year) %>%
  filter(max_val == max(max_val)) %>%
  select(-max_val)

Using data.table:

library(data.table)

setDT(df)[df[, .I[Value1 == max(Value1) & Value2 == max(Value2)], by = list(ID1, ID2, Year)]$V1]

Or:

setDT(df)[, max_val := sum(Value1, Value2), by = 1:nrow(df)
   ][, filter := max_val == max(max_val), by = list(ID1, ID2, Year)
       ][filter != FALSE
         ][, -c("max_val", "filter")]

Or:

subset(setDT(df)[, max_val := sum(Value1, Value2), by = 1:nrow(df)
             ][, filter := max_val == max(max_val), by = list(ID1, ID2, Year)], filter != FALSE)[, -c("max_val", "filter")]

A solution without loading libraries:

            ID1 ID2 Value1 Value2 Year
a6.b5.2004   a6  b5      6     64 2004
a1.b8.2004   a1  b8      7     23 2004
a4.b99.2004  a4 b99      5     51 2004
a5.b2.2005   a5  b2      2     23 2005

Code:

do.call(rbind, lapply(split(df, list(df$ID1, df$ID2, df$Year)),                  # make identifiers
                      function(x) {return(x[which.max(x$Value1 + x$Value2),])})) # take max of sum

Wow, a very elegant solution! I hope I don't forget this one!! Just one caveat: it removes rows containing NA, so I added na.action = na.pass. One problem remains, though: it deletes rows where just one of the IDs is NA. I posted another question that explains the problem with an example.

A nice collection of approaches, but all of them simply erase the NAs... what if I want to take them into account?

There are no NAs in your data. Do you mean keeping the cases that do not meet your criteria?

I did not put NAs in the example, but my real data has NAs in every column (except Year).

For the value columns you can use max(Value, na.rm = TRUE), or replace the NAs with 0 (or another value of your choice). Assigning IDs is not a computational problem but a conceptual one.

I think in this case it is also a computational problem. If I change the NAs in the IDs to a string, then within a year, when one of the IDs has both an NA and a real ring, it gets duplicated. If I keep the NAs, the ring is deleted whenever one of the IDs is NA.
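The comments bring up NAs in the value columns. Below is a minimal sketch of that case, assuming a made-up data frame `df_na` (not from the original post) with the same layout as `df` and a single NA in Value1. It relies only on base R: the formula interface of aggregate drops incomplete rows by default (na.action = na.omit), so passing na.action = na.pass together with na.rm = TRUE is one way to keep them. NAs in the ID columns are a separate problem and are not handled here.

```r
# Hypothetical data: same columns as df, with one NA in Value1.
df_na <- data.frame(
  ID1    = c("a1", "a1", "a4", "a5"),
  ID2    = c("b8", "b8", "b99", "b2"),
  Value1 = c(2, NA, 5, 2),
  Value2 = c(23, 30, 51, 23),
  Year   = c(2004, 2004, 2004, 2005)
)

# Default behaviour: any row containing an NA is dropped before grouping,
# so the a1/b8/2004 row holding Value2 = 30 never reaches max().
res_default <- aggregate(. ~ ID1 + ID2 + Year, df_na, max)

# Keep incomplete rows (na.pass) and let max() skip the NAs (na.rm = TRUE).
res_keep <- aggregate(. ~ ID1 + ID2 + Year, df_na, max,
                      na.action = na.pass, na.rm = TRUE)

res_default[res_default$ID1 == "a1", ]  # Value1 = 2, Value2 = 23
res_keep[res_keep$ID1 == "a1", ]        # Value1 = 2, Value2 = 30
```

One caveat with this sketch: if every value in a group is NA, max(..., na.rm = TRUE) returns -Inf with a warning, so such groups may need separate treatment.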