Filtering based on matching combinations of data across variables in R
I have a large data frame that contains pairs of variables from a distance matrix that was converted using melt(). It looks a bit like this:
library(tibble)
df <- tribble(~Word1, ~Word2, ~distance, ~speaker, ~session,
"WordA", "WordX", 1.4, "JB", 1,
"WordB", "WordY", 2.1, "JB", 1,
"WordC", "WordZ", 4.7, "JB", 1,
"WordX", "WordA", 0.23, "JB", 1,
"WordY", "WordB", 2.3, "JB", 1,
"WordZ", "WordC", 0.51, "JB", 1)
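One attempt just returned a copy of the same data frame, without filtering out the instances of matching variable combinations, and
filter(!duplicated(Word1,Word2,distance,speaker,session))
basically just crashes R.

You can give every unordered pair of words a shared group label. After this, you can filter by the Group column.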
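df <- as.data.frame(df)
# Build an order-independent key per row by sorting the two words
df$v <- sapply(seq_len(nrow(df)), function(x)
  paste(sort(c(df[x, 1], df[x, 2])), collapse = ""))
# One group label for each unique key
l <- data.frame(v = unique(df$v),
                Group = paste0("Group", seq_along(unique(df$v))))
# Attach the labels, then drop the helper key column
df <- merge(df, l, by = "v")[, -1]
df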
Word1 Word2 distance speaker session Group
1 WordA WordX 1.40 JB 1 Group1
2 WordX WordA 0.23 JB 1 Group1
3 WordB WordY 2.10 JB 1 Group2
4 WordY WordB 2.30 JB 1 Group2
5 WordC WordZ 4.70 JB 1 Group3
6 WordZ WordC 0.51 JB 1 Group3
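
The filtering step itself is not shown above. Here is a minimal base R sketch of it, assuming the goal is to keep one row per unordered word pair. The question does not say which row should survive, so both variants below are assumptions. The sketch builds the pair key with pmin()/pmax(), a vectorised alternative to the row-wise sapply() call. The names key, first_per_pair, ordered and min_per_pair are made up for illustration.

```r
# Same example data as in the question, as a plain data frame
df <- data.frame(
  Word1    = c("WordA", "WordB", "WordC", "WordX", "WordY", "WordZ"),
  Word2    = c("WordX", "WordY", "WordZ", "WordA", "WordB", "WordC"),
  distance = c(1.4, 2.1, 4.7, 0.23, 2.3, 0.51),
  speaker  = "JB",
  session  = 1,
  stringsAsFactors = FALSE
)

# Order-independent key: alphabetically smaller word first, larger word second
key <- paste(pmin(df$Word1, df$Word2), pmax(df$Word1, df$Word2))

# Number the keys in order of first appearance, mirroring the Group column
df$Group <- paste0("Group", match(key, unique(key)))

# Variant 1 (assumption): keep the first row encountered for each pair
first_per_pair <- df[!duplicated(df$Group), ]

# Variant 2 (assumption): keep the row with the smallest distance for each pair
ordered <- df[order(df$Group, df$distance), ]
min_per_pair <- ordered[!duplicated(ordered$Group), ]
```

If the same pair can occur for several speakers or sessions, those columns would probably need to be part of the duplicate check as well, for example duplicated(df[c("Group", "speaker", "session")]).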