R: select one row from duplicated rows after comparing multiple conditions

I got these duplicated records from a large dataset. Now I need to select one row from each set of duplicated rows.

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
data <- data.frame(ID,date,type,level)

library(foreach)

# First attempt: compares only the first two dates for each ID
foreach (i=unique(data$ID), .combine='rbind') %do% {
  if (data[data$ID==i, "date"][1] == data[data$ID==i, "date"][2])
    b <- data[data$ID==i,]
}

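[...] firsttime (for example, choose new over firsttime), and put the chosen rows into df.right.
I tried using foreach, but this is only a first step, and it does not work for IDs that have three duplicated rows.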

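The dplyr package is well suited to this kind of task.
With factors, you can specify how the categories are ordered. Then, for each unique ID/date pair, you can select the first type and, within it, the first level.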
library(dplyr)

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")

type <- factor(type, levels=c("LC", "MC", "YA", "ST"))

level <- factor(level, levels=c("active", "new", "firsttime"))

data <- data.frame(ID,date,type,level)

df.right <- data %>%
  group_by(ID, date) %>%
  filter(type == sort(type)[1]) %>%
  filter(level == sort(level)[1])
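
As a small illustration of why the factor conversion matters, the sketch below compares sorting a plain character vector with sorting the same values as a factor. The two-element vector is made up for the example and is not part of the question's data; the level order is the one used in the answer above.

```r
# Toy vector, assumed for illustration only
level_chr <- c("new", "firsttime")

# Character vectors sort alphabetically, so "firsttime" comes first
first_alpha <- sort(level_chr)[1]

# Factors sort by the order of their levels, so "new" comes first
level_fct <- factor(level_chr, levels = c("active", "new", "firsttime"))
first_pref <- as.character(sort(level_fct)[1])

print(first_alpha)
print(first_pref)
```

This is why `sort(level)[1]` inside `filter()` picks the preferred category rather than the alphabetically first one.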

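The trick here is to order the levels of type and level as required. Two rounds of deduplication are then needed: first, remove rows that are duplicated on the columns ID, date, and type; second, remove rows that are duplicated on the first two columns: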
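# Order the factor levels as required; earlier levels sort first
type = factor(type, levels=c("ST","YA","MC","LC"))
level = factor(level, levels=c("active","new","firsttime"))
data <- data.frame(ID,date,type,level)

# Sort so that, within each ID/date, the row to keep comes first
d = with(data, data[order(ID, date, type, level),])
# Pass 1: keep the first row for each ID/date/type combination
e = d[!duplicated(d[,1:3]),]
# Pass 2: keep the first row for each ID/date combination
df.right = e[!duplicated(e[,1:2]),]
# Restore numeric ID order for display
df.right = df.right[order(as.numeric(as.character(df.right$ID)), df.right$date),]
df.right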
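
One detail worth knowing about this kind of deduplication: `x[-which(duplicated(x)), ]` and `x[!duplicated(x), ]` agree only when at least one duplicate exists. The sketch below shows the difference on a toy data frame with no duplicates; the frame and its values are made up for the example.

```r
# Toy frame with no duplicated rows (assumed data, for illustration only)
toy <- data.frame(ID = c("a", "b"),
                  date = c("2014-01-01", "2014-01-02"))

dup <- duplicated(toy)

# which(dup) is integer(0); negating it is still integer(0),
# so this selects no rows at all
via_which <- toy[-which(dup), ]

# Logical negation keeps every non-duplicated row
via_not <- toy[!dup, ]

print(nrow(via_which))
print(nrow(via_not))
```

With the question's data both forms give the same result, because duplicates exist at each step; the logical form is simply the safer default.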

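I don't think that answer produces the correct output.

@pcantalupo It doesn't exactly match the OP's sample output, but I think the OP's sample output is incorrect: between rows 13 and 14, row 13 (not row 14) should be kept, because LC takes precedence over MC.

Hmm, I was wondering why things didn't look quite right.

This is very elegant; one improvement I'd suggest is to use arrange to sort by type and level, and then use top_n to pull out the top element; so after group_by, just arrange(type, level) %>% top_n(1).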
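
The suggestion in the last comment can be sketched as follows. This is a hedged variant rather than the commenter's exact code: it uses `slice(1)` in place of `top_n(1)`, on the assumption that the intent is "first row after sorting", since `top_n()` ranks rows by the value of a column rather than by position. The factor level orders are the ones from the dplyr answer.

```r
library(dplyr)

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <- c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")

# Level order encodes the preference: earlier levels win
type <- factor(type, levels = c("LC", "MC", "YA", "ST"))
level <- factor(level, levels = c("active", "new", "firsttime"))

data <- data.frame(ID, date, type, level)

df.right <- data %>%
  group_by(ID, date) %>%
  arrange(type, level) %>%   # preferred rows sort first
  slice(1) %>%               # keep the first row of each ID/date group
  ungroup()

print(df.right)
```

Sorting once and taking the first row per group avoids the two chained `filter()` calls, at the cost of reordering the rows.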