R: Splitting merged words (using a mini dictionary)
I have a set of words: some of them are merged words, and others are just simple words. I also have a separate list of words that I want to compare against the first list (using it as a dictionary) in order to "un-merge" certain words. Here is an example:
ListA <- c("dopamine", "andthe", "lowerswim", "other", "different")
ListB <- c("do", "mine", "and", "the", "lower", "owe", "swim")
I think the first step should be to build all pairwise combinations from ListB:
pairings <- expand.grid(ListB, ListB)
combos <- apply(pairings, 1, function(x) paste0(x[1], x[2]))
combos
# [1] "dodo" "minedo" "anddo" "thedo" "lowerdo" "owedo" "swimdo"
# [8] "domine" "minemine" "andmine" "themine" "lowermine" "owemine" "swimmine"
# [15] "doand" "mineand" "andand" "theand" "lowerand" "oweand" "swimand"
# [22] "dothe" "minethe" "andthe" "thethe" "lowerthe" "owethe" "swimthe"
# [29] "dolower" "minelower" "andlower" "thelower" "lowerlower" "owelower" "swimlower"
# [36] "doowe" "mineowe" "andowe" "theowe" "lowerowe" "oweowe" "swimowe"
# [43] "doswim" "mineswim" "andswim" "theswim" "lowerswim" "oweswim" "swimswim"
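The final step in this answer uses a vector named `matches`, but its definition does not survive in the text. A minimal sketch of one way to build it follows, assuming exact whole-word matching: a ListA word is kept when it equals one of the combos and is set to NA otherwise. The use of `ifelse` with `%in%` is an assumption, not necessarily what the original answer did.

```r
ListA <- c("dopamine", "andthe", "lowerswim", "other", "different")
ListB <- c("do", "mine", "and", "the", "lower", "owe", "swim")

# Same construction as above: every ordered pair of ListB words, pasted together
pairings <- expand.grid(ListB, ListB)
combos <- apply(pairings, 1, function(x) paste0(x[1], x[2]))

# Assumption: `matches` holds the ListA word when it is exactly one of the
# combos, and NA when it is not
matches <- ifelse(ListA %in% combos, ListA, NA_character_)
matches
# [1] NA          "andthe"    "lowerswim" NA          NA
```

With this definition, "dopamine" stays NA because "do" and "mine" only give "domine", so it is left unsplit.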
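Finally, you want to split each word in ListA that matches a pair of elements from ListB, unless that word is already in ListB itself. I imagine there are many ways to do this, but I will use lapply and unlist: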
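newA <- unlist(lapply(seq_along(ListA), function(idx) {
  # Keep the word as-is if it matched no combo or is itself a dictionary word
  if (is.na(matches[idx]) || ListA[idx] %in% ListB) {
    return(ListA[idx])
  } else {
    # Otherwise return the two ListB words that form the matching combo
    return(as.vector(as.matrix(pairings[combos == matches[idx], ])))
  }
}))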
newA
# [1] "dopamine" "and" "the" "lower" "swim" "other" "different"
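As a point of comparison, the same result can be reached without building every pair up front. The sketch below swaps in a different technique: it tries each split position inside a word and accepts the first one where both halves are in the dictionary. The helper name `split_merged` is hypothetical and is not part of the original answer; like the approach above, it only handles words made of exactly two dictionary entries.

```r
ListA <- c("dopamine", "andthe", "lowerswim", "other", "different")
ListB <- c("do", "mine", "and", "the", "lower", "owe", "swim")

# Hypothetical helper: split `word` into two dictionary words if possible
split_merged <- function(word, dict) {
  # Leave dictionary words and words too short to split untouched
  if (word %in% dict || nchar(word) < 2) return(word)
  for (i in seq_len(nchar(word) - 1)) {
    left  <- substr(word, 1, i)
    right <- substr(word, i + 1, nchar(word))
    # Accept the first split where both halves are dictionary words
    if (left %in% dict && right %in% dict) return(c(left, right))
  }
  word  # no valid split found
}

newA <- unlist(lapply(ListA, split_merged, dict = ListB))
newA
```

This does work proportional to the length of each word rather than to the square of the dictionary size, which may matter if ListB grows large.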
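Would some of the helper functions in stringr help? I think a few of them would get you going quickly.
@hrbrmstr I didn't know about the stringr package. I'll look into it right now! Thanks for the suggestion.
This is perfect. I spent a lot of time working with stringr trying to get this to work, and then I came back and you had produced this.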