Finding combinations of words per article in R


I want to find the different combinations of "Word" within each article and year. Any ideas?

My dataset looks like this:

Year     Article     Word
2013    Article1    WordA
2013    Article1    WordB
2013    Article2    WordC
2013    Article2    WordD
2013    Article2    WordA
2014    Article1    WordC
2014    Article1    WordA
2014    Article4    WordE
2014    Article4    WordD
2014    Article4    WordB

I would like the result to look like this:

Year    Article    Source   Target
2013    Article1    WordA   WordB
2013    Article1    WordB   WordA
2013    Article2    WordC   WordD
2013    Article2    WordC   WordA
2013    Article2    WordD   WordC
2013    Article2    WordD   WordA
2013    Article2    WordA   WordC
2013    Article2    WordA   WordD
2014    Article1    WordC   WordA
2014    Article1    WordA   WordC
2014    Article4    WordE   WordD
2014    Article4    WordE   WordB
2014    Article4    WordD   WordE
2014    Article4    WordD   WordB
2014    Article4    WordB   WordE
2014    Article4    WordB   WordD

Thanks

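You can try merge() to join the data to itself on Year and Article, and then subset() to keep only the rows where the two "Word" columns differ.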

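# Self-join on Year and Article: each word is paired with every word of the same article
df2 <- merge(df1, df1, by = c('Year', 'Article'))
# Drop the rows that pair a word with itself
res <- subset(df2, Word.x != Word.y)
row.names(res) <- NULL  # renumber the rows from 1
res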
# Year  Article Word.x Word.y
#1  2013 Article1  WordA  WordB
#2  2013 Article1  WordB  WordA
#3  2013 Article2  WordC  WordD
#4  2013 Article2  WordC  WordA
#5  2013 Article2  WordD  WordC
#6  2013 Article2  WordD  WordA
#7  2013 Article2  WordA  WordC
#8  2013 Article2  WordA  WordD
#9  2014 Article1  WordC  WordA
#10 2014 Article1  WordA  WordC
#11 2014 Article4  WordE  WordD
#12 2014 Article4  WordE  WordB
#13 2014 Article4  WordD  WordE
#14 2014 Article4  WordD  WordB
#15 2014 Article4  WordB  WordE
#16 2014 Article4  WordB  WordD
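
The merge result names the two word columns Word.x and Word.y, whereas the question asks for Source and Target. A minimal sketch of one way to get those names, assuming the sample data from the question (rebuilt here with data.frame() so the block runs on its own). The variable name pairs and the renaming step are illustrative choices, not part of the original answer:

```r
# Rebuild the sample data so this block is self-contained.
df1 <- data.frame(
  Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L),
  Article = c("Article1", "Article1", "Article2", "Article2", "Article2",
              "Article1", "Article1", "Article4", "Article4", "Article4"),
  Word = c("WordA", "WordB", "WordC", "WordD", "WordA",
           "WordC", "WordA", "WordE", "WordD", "WordB"),
  stringsAsFactors = FALSE
)

# Self-join within each (Year, Article) group.
pairs <- merge(df1, df1, by = c("Year", "Article"))

# Drop the rows that pair a word with itself.
pairs <- subset(pairs, Word.x != Word.y)

# Rename the suffixed columns to the names requested in the question.
names(pairs)[names(pairs) == "Word.x"] <- "Source"
names(pairs)[names(pairs) == "Word.y"] <- "Target"
row.names(pairs) <- NULL
pairs
```

The row order produced by merge() is not guaranteed to match the desired output line for line, so sort afterwards if the order matters.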
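
Alternatively, use a self-join in data.table. Note: this was written against the devel version of data.table (v1.9.5); the on= argument has been part of the CRAN releases since v1.9.6.

library(data.table) # v1.9.5 or later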
setDT(df1)[df1, on= c('Year', 'Article'), allow.cartesian=TRUE][Word!=i.Word]
#    Year  Article  Word i.Word
# 1: 2013 Article1 WordB  WordA
# 2: 2013 Article1 WordA  WordB
# 3: 2013 Article2 WordD  WordC
# 4: 2013 Article2 WordA  WordC
# 5: 2013 Article2 WordC  WordD
# 6: 2013 Article2 WordA  WordD
# 7: 2013 Article2 WordC  WordA
# 8: 2013 Article2 WordD  WordA
# 9: 2014 Article1 WordA  WordC
#10: 2014 Article1 WordC  WordA
#11: 2014 Article4 WordD  WordE
#12: 2014 Article4 WordB  WordE
#13: 2014 Article4 WordE  WordD
#14: 2014 Article4 WordB  WordD
#15: 2014 Article4 WordE  WordB
#16: 2014 Article4 WordD  WordB
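
Both approaches above return ordered pairs, so each pair shows up twice (for example WordA/WordB and WordB/WordA), which is what the question asks for. If each unordered combination were wanted only once instead, a small variation on the merge approach could look like the sketch below. This is an assumption about a possible follow-up need, not something the question requests, and the name unordered is hypothetical:

```r
# Rebuild the sample data so this block is self-contained.
df1 <- data.frame(
  Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L),
  Article = c("Article1", "Article1", "Article2", "Article2", "Article2",
              "Article1", "Article1", "Article4", "Article4", "Article4"),
  Word = c("WordA", "WordB", "WordC", "WordD", "WordA",
           "WordC", "WordA", "WordE", "WordD", "WordB"),
  stringsAsFactors = FALSE
)

# Self-join within each (Year, Article) group.
pairs <- merge(df1, df1, by = c("Year", "Article"))

# Keep only the copy of each pair whose first word sorts before the second.
# The strict `<` also removes the rows that pair a word with itself.
unordered <- subset(pairs, Word.x < Word.y)
row.names(unordered) <- NULL
unordered
```

String comparison with < follows the collation of the current locale, which is unambiguous for labels such as WordA to WordE but may need care with mixed-case or accented words.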

Data

df1 <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L), Article = c("Article1", "Article1", 
"Article2", "Article2", "Article2", "Article1", "Article1", "Article4", 
"Article4", "Article4"), Word = c("WordA", "WordB", "WordC", 
"WordD", "WordA", "WordC", "WordA", "WordE", "WordD", "WordB"
)), .Names = c("Year", "Article", "Word"), class = "data.frame", 
row.names = c(NA, -10L))