Finding combinations of words within articles in R
I want to find the different combinations of "Word" by article number and year. Any ideas? My dataset looks like this:
Year Article Word
2013 Article1 WordA
2013 Article1 WordB
2013 Article2 WordC
2013 Article2 WordD
2013 Article2 WordA
2014 Article1 WordC
2014 Article1 WordA
2014 Article4 WordE
2014 Article4 WordD
2014 Article4 WordB
I would like the result to look like this:
Year Article Source Target
2013 Article1 WordA WordB
2013 Article1 WordB WordA
2013 Article2 WordC WordD
2013 Article2 WordC WordA
2013 Article2 WordD WordC
2013 Article2 WordD WordA
2013 Article2 WordA WordC
2013 Article2 WordA WordD
2014 Article1 WordC WordA
2014 Article1 WordA WordC
2014 Article4 WordE WordD
2014 Article4 WordE WordB
2014 Article4 WordD WordE
2014 Article4 WordD WordB
2014 Article4 WordB WordE
2014 Article4 WordB WordD
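Thanks.

You could try merge to join the data frame to itself on Year and Article, then subset to keep only the rows where the two "Word" columns differ: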
df2 <- merge(df1, df1, by.x=c('Year', 'Article'), by.y= c('Year', 'Article'))
res <- subset(df2, Word.x!=Word.y)
row.names(res) <- NULL
res
# Year Article Word.x Word.y
#1 2013 Article1 WordA WordB
#2 2013 Article1 WordB WordA
#3 2013 Article2 WordC WordD
#4 2013 Article2 WordC WordA
#5 2013 Article2 WordD WordC
#6 2013 Article2 WordD WordA
#7 2013 Article2 WordA WordC
#8 2013 Article2 WordA WordD
#9 2014 Article1 WordC WordA
#10 2014 Article1 WordA WordC
#11 2014 Article4 WordE WordD
#12 2014 Article4 WordE WordB
#13 2014 Article4 WordD WordE
#14 2014 Article4 WordD WordB
#15 2014 Article4 WordB WordE
#16 2014 Article4 WordB WordD
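The merge output above names the word columns Word.x and Word.y, while the question asks for Source and Target. A minimal sketch of the same approach with the columns renamed follows; the object name pairs is only illustrative, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Sample data from the question, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L),
  Article = c("Article1", "Article1", "Article2", "Article2", "Article2",
              "Article1", "Article1", "Article4", "Article4", "Article4"),
  Word = c("WordA", "WordB", "WordC", "WordD", "WordA",
           "WordC", "WordA", "WordE", "WordD", "WordB"),
  stringsAsFactors = FALSE
)

# Self-join on Year and Article: every word is paired with every word
# of the same article in the same year (including itself)
pairs <- merge(df1, df1, by = c("Year", "Article"))

# Drop the rows where a word is paired with itself
pairs <- pairs[pairs$Word.x != pairs$Word.y, ]

# Rename the two word columns to match the layout requested in the question
names(pairs)[names(pairs) == "Word.x"] <- "Source"
names(pairs)[names(pairs) == "Word.y"] <- "Target"
row.names(pairs) <- NULL

pairs
```

The row order within each article may differ from the listing in the question, since merge sorts on the key columns only.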
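Alternatively, the same self-join can be written with data.table (df1 is the data frame defined at the end of this answer).

Note: the on= argument used here required the devel version of data.table (v1.9.5) when this was written; see the data.table project page for instructions on installing the devel version.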
library(data.table)#v1.9.5
setDT(df1)[df1, on= c('Year', 'Article'), allow.cartesian=TRUE][Word!=i.Word]
# Year Article Word i.Word
# 1: 2013 Article1 WordB WordA
# 2: 2013 Article1 WordA WordB
# 3: 2013 Article2 WordD WordC
# 4: 2013 Article2 WordA WordC
# 5: 2013 Article2 WordC WordD
# 6: 2013 Article2 WordA WordD
# 7: 2013 Article2 WordC WordA
# 8: 2013 Article2 WordD WordA
# 9: 2014 Article1 WordA WordC
#10: 2014 Article1 WordC WordA
#11: 2014 Article4 WordD WordE
#12: 2014 Article4 WordB WordE
#13: 2014 Article4 WordE WordD
#14: 2014 Article4 WordB WordD
#15: 2014 Article4 WordE WordB
#16: 2014 Article4 WordD WordB
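Both approaches return 16 rows, which can be checked from the group sizes alone. The sketch below assumes that no word is repeated within the same article and year (true for the sample data): a group of n words produces n^2 rows in the self-join, and removing the n self-pairs leaves n * (n - 1). The names n, joined_rows and kept_rows are only illustrative.

```r
# Sample data from the question, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L),
  Article = c("Article1", "Article1", "Article2", "Article2", "Article2",
              "Article1", "Article1", "Article4", "Article4", "Article4"),
  Word = c("WordA", "WordB", "WordC", "WordD", "WordA",
           "WordC", "WordA", "WordE", "WordD", "WordB"),
  stringsAsFactors = FALSE
)

# Number of words in each Year/Article group
# (combinations that do not occur count as 0 and contribute nothing)
n <- as.vector(table(df1$Year, df1$Article))

# Rows produced by the self-join before filtering: n^2 per group
joined_rows <- sum(n^2)        # 26

# Rows left after removing self-pairs: n * (n - 1) per group
kept_rows <- sum(n * (n - 1))  # 16
```

The intermediate count of 26 is larger than the 10 input rows, which is why the data.table join above passes allow.cartesian=TRUE.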
Data
df1 <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L), Article = c("Article1", "Article1",
"Article2", "Article2", "Article2", "Article1", "Article1", "Article4",
"Article4", "Article4"), Word = c("WordA", "WordB", "WordC",
"WordD", "WordA", "WordC", "WordA", "WordE", "WordD", "WordB"
)), .Names = c("Year", "Article", "Word"), class = "data.frame",
row.names = c(NA, -10L))