R: How do I compare two adjacent strings in a column and loop over all of the strings?
Tags: r, loops, dataframe, lapply
My function for finding the differences between two strings is diff(). The error:
Error in match.fun(FUN) :
'diff(a, b)' is not a function, character or symbol
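So how should I go about this? Many thanks!
I'm not entirely sure I understand the question. If you are trying to find where the values in a column/variable change, you could do this:
- Convert the column to a character vector. Here I took your first 17 entries and put them into the vector "x" by hand.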
x<-c("45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "<5CCBC:4B", "<5CCBC:4B", "<5CCBC:4B", "<<CCBC::B", "<<GGBG::E", "<<GGBG::E", "55CCBC41B", "55CCBC41B")
Better than my answer below: simply use diff(grint[-1], grint[-length(grint)]), as @Andrie suggested in the comments.
Here are two slightly different approaches that can handle strings of different lengths. If all strings have the same length, you don't need str_pad from stringr.
samplestrings <- c("apple", "apple", "banana", "banana", "apple", "apple","aslkd;fa")
library(stringr)
samplestrings <- str_pad(samplestrings, max(nchar(samplestrings)) , side="right")
X0 <- unlist(strsplit(samplestrings,split="")) ## Nasty but necessary!
Y0 <- unlist(strsplit(c(samplestrings[-1], rep(" ", max(nchar(samplestrings)))),split="")) ## ...
ix <- which(X0[-length(X0):-(length(X0)-max(nchar(samplestrings))+1)] !=
Y0[-length(X0):-(length(X0)-max(nchar(samplestrings))+1)])
cbind(ix,X0[ix],Y0[ix])
ix
[1,] "9" "a" "b"
[2,] "10" "p" "a"
[3,] "11" "p" "n"
[4,] "12" "l" "a"
[5,] "13" "e" "n"
[6,] "14" " " "a"
[7,] "25" "b" "a"
[8,] "26" "a" "p"
[9,] "27" "n" "p"
[10,] "28" "a" "l"
[11,] "29" "n" "e"
[12,] "30" "a" " "
[13,] "42" "p" "s"
[14,] "43" "p" "l"
[15,] "44" "l" "k"
[16,] "45" "e" "d"
[17,] "46" " " ";"
[18,] "47" " " "f"
[19,] "48" " " "a"
I think you need match, which returns the index of the first match. Remove the first element, as in m <- match(unique(x), x)[-1], then pair each first occurrence with the element before it using cbind(x[m-1], x[m]).
Try diff(grint[-1], grint[-length(grint)])
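I think it is safe to say that you are looking for the generated output.
Oh, I should have added the link. Thanks for the reminder, I'll remember next time!
Thanks for your answer! It is almost what I need. I'd like it not only to find all the differences between adjacent strings but also to compare every character within the strings, like: diff("apple", "banana")
ix
[1,] "1" "a" "b"
[2,] "2" "p" "a"
[3,] "3" "p" "n"
[4,] "4" "l" "a"
[5,] "5" "e" "n"
@Chenlu Sure. Feel free to upvote the answers you find helpful, and once you get the correct answer or the one you like best, feel free to accept it, as that is how the site works ;) Also, in the future it is best to provide the desired output so that it is clear what you want. Cheers.
Thanks for the advice :) I don't understand the rules very well yet, since I'm new to both programming and this site.
@Chenlu I've updated my answer with two slightly different approaches.
+1 I like this approach, but I think you are missing the last pair: "55CCBC11B" "55CCBC41B", because "55CCBC41B" occurs in two parts of the vector. Sometimes you may need more than the first match (see which(grint == "55CCBC41B")).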
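On the error in the question: match.fun(FUN) raises it when the FUN argument of lapply, sapply or mapply is a call such as diff(a, b) rather than a function. The asker's code is not shown, so that cause is an assumption. A minimal sketch of passing a function instead, where differs is a hypothetical helper and x is a shortened sample:

```r
# Assumption: the original call looked roughly like sapply(x, diff(a, b)),
# which passes the result of a call where a function is expected.
x <- c("45CCBC44B", "45CCBC44B", "<5CCBC:4B", "<<CCBC::B")

# Hypothetical helper: TRUE when two strings differ.
differs <- function(a, b) a != b

# Pass the function itself; mapply() walks over the adjacent pairs
# (element i of the first vector with element i + 1 of the original).
res <- mapply(differs, x[-length(x)], x[-1], USE.NAMES = FALSE)
res
```

For a plain inequality test the vectorised x[-length(x)] != x[-1] does the same without any apply call; mapply only becomes useful once the per-pair function is more involved.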
x<-c("45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "<5CCBC:4B", "<5CCBC:4B", "<5CCBC:4B", "<<CCBC::B", "<<GGBG::E", "<<GGBG::E", "55CCBC41B", "55CCBC41B")
lagged.x <- c(NA,head(x,-1))
x == lagged.x
[1] NA TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE
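The logical vector from the lagged comparison can be turned into positions and before/after pairs. A small sketch on a shortened sample vector; the names changed and pairs are illustrative only:

```r
x <- c("45CCBC44B", "45CCBC44B", "<5CCBC:4B", "<5CCBC:4B", "<<CCBC::B")
lagged.x <- c(NA, head(x, -1))

# Positions whose value differs from the previous element.
# which() drops the NA produced for the first element.
changed <- which(x != lagged.x)

# Each row: the string before the change and the string after it.
pairs <- cbind(before = x[changed - 1], after = x[changed])
pairs
```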
samplestrings <- c("apple", "apple", "banana", "banana", "apple", "apple","aslkd;fa")
library(stringr)
# use str_pad to make every string equal in number of characters
samplestrings <- str_pad(samplestrings, max(nchar(samplestrings)) , side="right")
findiffs <- rle(samplestrings)
# rle() stores the run lengths in $lengths; cumsum() gives the last index of each run
newdf <- data.frame(index = paste0(cumsum(findiffs$lengths),"-",cumsum(findiffs$lengths)+1),
firststring = samplestrings[cumsum(findiffs$lengths)],
secondstring = samplestrings[cumsum(findiffs$lengths)+1])
newdf <- newdf[-dim(newdf)[1],]
index firststring secondstring
1 2-3 apple banana
2 4-5 banana apple
3 6-7 apple aslkd;fa
X0 <- unlist(strsplit(as.character(newdf$firststring),split="")) ## Nasty but necessary!
Y0 <- unlist(strsplit(as.character(newdf$secondstring),split="")) ## ...
ix <- which(X0 != Y0)
cbind(ix,X0[ix],Y0[ix])
ix
[1,] "1" "a" "b"
[2,] "2" "p" "a"
[3,] "3" "p" "n"
[4,] "4" "l" "a"
[5,] "5" "e" "n"
[6,] "6" " " "a"
[7,] "9" "b" "a"
[8,] "10" "a" "p"
[9,] "11" "n" "p"
[10,] "12" "a" "l"
[11,] "13" "n" "e"
[12,] "14" "a" " "
[13,] "18" "p" "s"
[14,] "19" "p" "l"
[15,] "20" "l" "k"
[16,] "21" "e" "d"
[17,] "22" " " ";"
[18,] "23" " " "f"
[19,] "24" " " "a"
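The character-by-character comparison could also be wrapped in a function and applied to each adjacent pair, so that ix restarts at 1 for every pair, as in the output the asker describes. This is only a sketch in base R: char_diff is a hypothetical helper, not part of the answers above, and it pads with format() rather than str_pad, so a shorter string is compared against trailing spaces.

```r
# Hypothetical helper: character-level differences between two strings.
char_diff <- function(a, b) {
  padded <- format(c(a, b))            # right-pad with spaces to a common width
  chars <- strsplit(padded, "")
  ix <- which(chars[[1]] != chars[[2]])
  data.frame(ix = ix, first = chars[[1]][ix], second = chars[[2]][ix])
}

samplestrings <- c("apple", "apple", "banana", "apple")

# Compare each string with its successor, then keep only the pairs that differ.
n <- length(samplestrings)
diffs <- Map(char_diff, samplestrings[-n], samplestrings[-1])
diffs <- Filter(function(d) nrow(d) > 0, diffs)
diffs
```

Because of the padding, "apple" against "banana" reports a sixth row (a space against "a"); dropping rows where either side is a space would give the five-row form.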
> ( m <- match(unique(x), x)[-1] )
[1] 10 13 14 16 45 47 48 54 68 69 73 76 86
> cbind(x[m-1], x[m])
[,1] [,2]
[1,] "45CCBC44B" "<5CCBC:4B"
[2,] "<5CCBC:4B" "<<CCBC::B"
[3,] "<<CCBC::B" "<<GGBG::E"
[4,] "<<GGBG::E" "55CCBC41B"
[5,] "55CCBC41B" "CC11B1CCE"
[6,] "CC11B1CCE" "CC55B1CCE"
[7,] "CC55B1CCE" "55CCBC44B"
[8,] "55CCBC44B" "G1CCBC1GB"
[9,] "G1CCBC1GB" "91CCBC11B"
[10,] "91CCBC11B" "01CCBC11B"
[11,] "01CCBC11B" "11CCBC11B"
[12,] "11CCBC11B" "15CCBC11B"
[13,] "15CCBC11B" "55CCBC11B"
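As the comment about the missing last pair points out, match() only reports the first occurrence of each value, so a change back to an earlier string is skipped. One possible alternative, sketched on a made-up vector v, is to take the run ends from rle(), which reports every transition:

```r
v <- c("a", "a", "b", "b", "a", "c")

# match() sees only the first occurrence of each value,
# so the change from "b" back to "a" is not reported.
m <- match(unique(v), v)[-1]
cbind(v[m - 1], v[m])

# rle() gives the length of every run; cumsum() turns that into run ends.
ends <- cumsum(rle(v)$lengths)
ends <- ends[-length(ends)]            # the last run has no successor
transitions <- cbind(before = v[ends], after = v[ends + 1])
transitions
```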