
R: remove duplicates within a group?


Example data:

    mydf<-data.frame(Group_ID=c("337", "337", "201", "201", "470", "470", "999", "999"), 
             Timestamp=c("A", "A", "B", "B", "C", "D", "E", "F"), 
             MU=c("1", "1", "2", "3", "4", "4", "5", "6"))
Within each "Group_ID", I only want to keep entries where both "Timestamp" and "MU" are not duplicated. So in this example, only rows 7 and 8 remain ("Group_ID" 999 has unique entries for both "Timestamp" and "MU").

One of my attempts:

mydf<-mydf %>%
  group_by(Group_ID) %>%
  filter(unique(Timestamp))
returns the error:

the logical index vector must have length 1 or 3 (the number of columns), not 8

(I would then run the code again with "MU".)


I have looked at similar questions, but have not found one that covers the same scenario. Many thanks.

If we use filter, it expects a logical vector. The output of unique is just the unique elements of that column (class character), so that won't work. We can use duplicated to get a logical vector of duplicated elements and negate it (!), turning TRUE into FALSE and vice versa; on its own, though, that only keeps the first occurrence of each element, so we apply duplicated from both directions:

library(dplyr)
mydf %>% 
   group_by(Group_ID) %>% 
   filter(!(duplicated(Timestamp, fromLast = TRUE)| duplicated(Timestamp))) 
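For reference, the same two-sided duplicated() idea also works in base R without dplyr. This is a sketch assuming the sample mydf above, checking the (Group_ID, Timestamp) pair directly instead of grouping:

```r
mydf <- data.frame(Group_ID = c("337", "337", "201", "201", "470", "470", "999", "999"),
                   Timestamp = c("A", "A", "B", "B", "C", "D", "E", "F"),
                   MU = c("1", "1", "2", "3", "4", "4", "5", "6"))

# flag every row whose (Group_ID, Timestamp) combination occurs more than once,
# scanning from both the top and the bottom of the data frame
dup <- duplicated(mydf[c("Group_ID", "Timestamp")]) |
  duplicated(mydf[c("Group_ID", "Timestamp")], fromLast = TRUE)

mydf[!dup, ]
#   Group_ID Timestamp MU
# 5      470         C  4
# 6      470         D  4
# 7      999         E  5
# 8      999         F  6
```

Note this keeps Group_ID 470 as well, because only Timestamp is checked here; the MU column would need the same treatment to reduce the result to Group_ID 999 only.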

Or group by both "Group_ID" and "Timestamp" and filter on the number of rows:

mydf %>%
   group_by(Group_ID, Timestamp) %>%
   filter(n() == 1)
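The same per-(Group_ID, Timestamp) row count can be computed in base R with ave(); a sketch assuming the sample mydf above:

```r
mydf <- data.frame(Group_ID = c("337", "337", "201", "201", "470", "470", "999", "999"),
                   Timestamp = c("A", "A", "B", "B", "C", "D", "E", "F"),
                   MU = c("1", "1", "2", "3", "4", "4", "5", "6"))

# number of rows sharing each (Group_ID, Timestamp) combination
cnt <- ave(seq_len(nrow(mydf)), mydf$Group_ID, mydf$Timestamp, FUN = length)

mydf[cnt == 1, ]
#   Group_ID Timestamp MU
# 5      470         C  4
# 6      470         D  4
# 7      999         E  5
# 8      999         F  6
```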
If we only need the "999" "Group_ID":

mydf %>% 
  group_by(Group_ID) %>%
  filter_at(vars(Timestamp,  MU),  all_vars(n_distinct(.) == n()))
# A tibble: 2 x 3
# Groups:   Group_ID [1]
#  Group_ID Timestamp MU   
#  <fct>    <fct>     <fct>
#1 999      E         5    
#2 999      F         6    

Here is a base R solution (the full code is in the last block on this page); its output keeps only rows 7 and 8 of Group_ID 999.

Is there a base possibility to achieve this? @NelsonGon Just use duplicated on the "Group_ID" and "Timestamp" columns of interest. The expected output isn't clear, though. @akrun Many thanks. Your solution returns the first entry of groups that have duplicated Timestamps. Can it be modified to exclude all rows whose Timestamp is duplicated (so that only Group_IDs 470 and 999 remain)? @Emily You need
mydf %>% group_by(Group_ID) %>% filter(!(duplicated(Timestamp, fromLast = TRUE) | duplicated(Timestamp)))
@NelsonGon Something like
mydf[!(duplicated(mydf[1:2]) | duplicated(mydf[1:2], fromLast = TRUE)), ]
Another option is distinct, which keeps the first row of each duplicated (Group_ID, Timestamp) pair rather than dropping both:

distinct(mydf, Group_ID, Timestamp, .keep_all = TRUE)
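The base-R analogue of that distinct() call is subsetting with !duplicated(); a sketch assuming the sample mydf above (note it keeps the first copy of each duplicated pair instead of dropping both):

```r
mydf <- data.frame(Group_ID = c("337", "337", "201", "201", "470", "470", "999", "999"),
                   Timestamp = c("A", "A", "B", "B", "C", "D", "E", "F"),
                   MU = c("1", "1", "2", "3", "4", "4", "5", "6"))

# keep only the first occurrence of each (Group_ID, Timestamp) pair
mydf[!duplicated(mydf[c("Group_ID", "Timestamp")]), ]
#   Group_ID Timestamp MU
# 1      337         A  1
# 3      201         B  2
# 5      470         C  4
# 6      470         D  4
# 7      999         E  5
# 8      999         F  6
```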
# helper: within each level of f, is every value of x unique?
foo <- function(x, f){
    ave(as.numeric(as.factor(x)),
        f,
        FUN = function(y) length(unique(y)) == length(y))
}

# rows where both Timestamp and MU are fully unique within their Group_ID
inds <- Reduce("&", lapply(mydf[c("Timestamp", "MU")],
                           function(x) foo(x, mydf$Group_ID) == 1))

mydf[inds, ]
#  Group_ID Timestamp MU
#7      999         E  5
#8      999         F  6