Filtering on matching combinations of data across variables in R


I have a large data frame that contains pairs of variables from a distance matrix converted with melt(). It looks a bit like this:

 library(tibble)

 df <- tribble(~Word1, ~Word2, ~distance, ~speaker, ~session,
          "WordA", "WordX", 1.4, "JB", 1,
          "WordB", "WordY", 2.1, "JB", 1,
          "WordC", "WordZ", 4.7, "JB", 1,
          "WordX", "WordA", 0.23, "JB", 1,
          "WordY", "WordB", 2.3, "JB", 1,
          "WordZ", "WordC", 0.51, "JB", 1)
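What I have tried so far just reproduces the same data frame and does not filter out instances of matching variable combinations. I also tried

filter(!duplicated(Word1,Word2,distance,speaker,session))

which basically just crashes R.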

After running the code below, you can filter by the Group column.

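df <- as.data.frame(df)

# Build an order-independent key per row by sorting the two words
df$v <- sapply(seq_len(nrow(df)), function(x)
         paste(sort(c(df[x, 1], df[x, 2])), collapse = ""))
# Assign one group label to each unique key
l <- data.frame(v = unique(df$v),
            Group = paste0("Group", seq_along(unique(df$v))))
# Attach the labels, then drop the helper key column v
df <- merge(df, l, by = "v")[, -1]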
    
df

  Word1 Word2 distance speaker session  Group
1 WordA WordX     1.40      JB       1 Group1
2 WordX WordA     0.23      JB       1 Group1
3 WordB WordY     2.10      JB       1 Group2
4 WordY WordB     2.30      JB       1 Group2
5 WordC WordZ     4.70      JB       1 Group3
6 WordZ WordC     0.51      JB       1 Group3
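
With the Group column in place, the filtering step itself could look roughly like the sketch below. It rebuilds the data frame from the output printed above so that it runs on its own. Which row to keep from each pair is an assumption, since the question does not say: option 1 keeps the first row of each group, option 2 keeps the row with the smallest distance. The names first_per_group and min_per_group are only illustrative.

```r
library(dplyr)

# Data frame as printed above, including the Group column
df <- data.frame(
  Word1    = c("WordA", "WordX", "WordB", "WordY", "WordC", "WordZ"),
  Word2    = c("WordX", "WordA", "WordY", "WordB", "WordZ", "WordC"),
  distance = c(1.40, 0.23, 2.10, 2.30, 4.70, 0.51),
  speaker  = "JB",
  session  = 1,
  Group    = rep(c("Group1", "Group2", "Group3"), each = 2)
)

# Option 1 (base R): keep the first row encountered for each group
first_per_group <- df[!duplicated(df$Group), ]

# Option 2 (dplyr): keep the row with the smallest distance in each group
# (assumption: the smaller distance is the one worth keeping)
min_per_group <- df %>%
  group_by(Group) %>%
  slice_min(distance, n = 1, with_ties = FALSE) %>%
  ungroup()

first_per_group
min_per_group
```

If the real data has several speakers or sessions, the same word pair probably should not be collapsed across them; in that case adding speaker and session to the grouping (or to the duplicated() check) would be the likely adjustment.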