R: How do I compare two adjacent strings in a column and loop through all the strings?

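The function I use to find the differences between two strings is diff():

Error:

Error in match.fun(FUN) : 
  'diff(a, b)' is not a function, character or symbol

So I am wondering how I should do this. Many thanks!

I'm not entirely sure I understand the question. If you are trying to find where the values within a column/variable differ, you could do it like this:

  • Convert the column to a character vector
Here I took your first 17 entries and put them into a vector "x" by hand: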

x<-c("45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B",     "45CCBC44B", "45CCBC44B", "45CCBC44B", "<5CCBC:4B", "<5CCBC:4B", "<5CCBC:4B", "<<CCBC::B", "<<GGBG::E", "<<GGBG::E", "55CCBC41B", "55CCBC41B")

Better than my answer below: just do as @Andrie suggested in the comments:
diff(grint[-1],grint[-length(grint)])
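
The error quoted in the question is what match.fun reports when it is given something other than a function, a character string or a symbol, for instance the call diff(a, b) rather than the function itself. As a minimal sketch, and assuming a hypothetical helper named char_diff (it is not the asker's diff, whose definition is not shown here), adjacent pairs can be compared by handing the helper to mapply together with two shifted copies of the vector. The short grint vector is made-up sample data, and the argument order (predecessor first) is a choice made for this sketch.

```r
# Hypothetical helper: positions at which two strings differ, together with
# the characters found there. The shorter string is padded with spaces on
# the right so both are compared over the same length.
char_diff <- function(a, b) {
  a_chars <- strsplit(a, "")[[1]]
  b_chars <- strsplit(b, "")[[1]]
  n <- max(length(a_chars), length(b_chars))
  a_chars <- c(a_chars, rep(" ", n - length(a_chars)))
  b_chars <- c(b_chars, rep(" ", n - length(b_chars)))
  ix <- which(a_chars != b_chars)
  data.frame(ix = ix, first = a_chars[ix], second = b_chars[ix],
             stringsAsFactors = FALSE)
}

# Made-up sample data
grint <- c("apple", "apple", "banana")

# Pass the function itself (char_diff), not a call such as char_diff(a, b).
# mapply walks the two shifted vectors in parallel, so every call receives
# one string and its successor.
diffs <- mapply(char_diff, grint[-length(grint)], grint[-1],
                SIMPLIFY = FALSE, USE.NAMES = FALSE)

diffs[[2]]  # differences between "apple" and "banana"
```

Each list element is a small data frame, so identical neighbours simply give zero rows.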

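Here are two slightly different approaches that can handle strings of different lengths. If all of your strings have the same length, you do not need str_pad from stringr.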

samplestrings <- c("apple", "apple", "banana", "banana", "apple", "apple","aslkd;fa")
library(stringr)
samplestrings <- str_pad(samplestrings, max(nchar(samplestrings)) , side="right")

  X0 <- unlist(strsplit(samplestrings,split=""))  ## Nasty but necessary!
  Y0 <- unlist(strsplit(c(samplestrings[-1], rep(" ", max(nchar(samplestrings)))),split="")) ## ...
  ix <- which(X0[-length(X0):-(length(X0)-max(nchar(samplestrings))+1)] != 
              Y0[-length(X0):-(length(X0)-max(nchar(samplestrings))+1)])
  cbind(ix,X0[ix],Y0[ix])

      ix          
 [1,] "9"  "a" "b"
 [2,] "10" "p" "a"
 [3,] "11" "p" "n"
 [4,] "12" "l" "a"
 [5,] "13" "e" "n"
 [6,] "14" " " "a"
 [7,] "25" "b" "a"
 [8,] "26" "a" "p"
 [9,] "27" "n" "p"
[10,] "28" "a" "l"
[11,] "29" "n" "e"
[12,] "30" "a" " "
[13,] "42" "p" "s"
[14,] "43" "p" "l"
[15,] "44" "l" "k"
[16,] "45" "e" "d"
[17,] "46" " " ";"
[18,] "47" " " "f"
[19,] "48" " " "a"

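I think you want match, which returns the index of the first match. Remove the first element, m <- match(unique(x), x)[-1], and then cbind(x[m-1], x[m]) pairs each new value with the one just before it.
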
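Try diff(grint[-1],grint[-length(grint)])

I think it's safe to say that you are looking for the output generated by [...].

Oh, I should have added the link. Thanks for the reminder, I'll remember next time!

Thanks for your answer! It's almost what I need. I would like it not only to find all the differences between adjacent strings, but also to compare every character in the strings. Like: diff("apple", "banana")
     ix
[1,] "1" "a" "b"
[2,] "2" "p" "a"
[3,] "3" "p" "n"
[4,] "4" "l" "a"
[5,] "5" "e" "n"

@Chenlu Sure. Feel free to upvote the answers you find useful, and once you get the right answer or the one you like best, feel free to accept it, since that is how the site works ;) Also, in the future it is best to provide the desired output so that it is clear what you want. Cheers.

Thanks for the advice :) I haven't understood the rules very well yet, as I'm new to both programming and this site.

@Chenlu I updated my answer with two slightly different approaches.

+1 I like this approach, but I think you are missing the last pair: "55CCBC11B" "55CCBC41B", because "55CCBC41B" occurs in two parts of the vector. Sometimes more than the first match may be needed (see which(grint == "55CCBC41B")).
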
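
The point made in the last comment can be seen on a small example. The sketch below uses a made-up vector (an assumption for illustration, not the asker's data) in which a value comes back after it has already appeared. match only reports first occurrences, so the return to the earlier value is not listed, whereas comparing two shifted copies of the vector reports every change.

```r
# Made-up example vector: "A" returns after "B"
x <- c("A", "A", "B", "B", "A", "C")

# First occurrences only: the switch back to "A" at position 5 is missed
m <- match(unique(x), x)[-1]

# Every position whose value differs from its predecessor
changes <- which(x[-1] != x[-length(x)]) + 1

# Pair each changed value with the one before it
pairs <- cbind(previous = x[changes - 1], current = x[changes])
pairs
```

Here m holds positions 3 and 6, while changes holds 3, 5 and 6.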
x <- c("45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "45CCBC44B", "<5CCBC:4B", "<5CCBC:4B", "<5CCBC:4B", "<<CCBC::B", "<<GGBG::E", "<<GGBG::E", "55CCBC41B", "55CCBC41B")
# Shift x by one position so that each element lines up with its predecessor
lagged.x <- c(NA, head(x, -1))
# TRUE where an element equals the one before it; the first element has no predecessor, hence NA
x == lagged.x

 [1]    NA  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE
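
Since the question is about a column, the same lagged comparison can be written against a data frame. This is only a sketch: the data frame df and its layout are assumptions, the column name grint is borrowed from the comments, and the five values are a shortened sample of the vector above.

```r
# Assumed layout: a data frame with the strings in a column named grint
df <- data.frame(grint = c("45CCBC44B", "45CCBC44B", "<5CCBC:4B",
                           "<5CCBC:4B", "<<CCBC::B"))

# as.character() guards against the column being stored as a factor
s <- as.character(df$grint)

# TRUE where a row differs from the row above it; the first row has no
# predecessor, so its value is NA
df$changed <- s != c(NA, head(s, -1))

# Rows at which a new value starts (which() drops the NA)
df[which(df$changed), ]
```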
samplestrings <- c("apple", "apple", "banana", "banana", "apple", "apple","aslkd;fa")
library(stringr) 
# use str_pad to make every string equal in number of characters
samplestrings <- str_pad(samplestrings, max(nchar(samplestrings)) , side="right")

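findiffs <- rle(samplestrings)
# Position of the last element of each run of identical strings
ends <- cumsum(findiffs$lengths)

# Pair the last string of each run with the first string of the next run
newdf <- data.frame(index = paste0(ends, "-", ends + 1),
          firststring = samplestrings[ends],
          secondstring = samplestrings[ends + 1])

# Drop the last row: the final run has no successor
newdf <- newdf[-nrow(newdf), ]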

  index firststring secondstring
1   2-3    apple        banana  
2   4-5    banana       apple   
3   6-7    apple        aslkd;fa

  X0 <- unlist(strsplit(as.character(newdf$firststring),split=""))  ## Nasty but necessary!
  Y0 <- unlist(strsplit(as.character(newdf$secondstring),split=""))  ## ...
  ix <- which(X0 != Y0)
  cbind(ix,X0[ix],Y0[ix]) 

     ix          
 [1,] "1"  "a" "b"
 [2,] "2"  "p" "a"
 [3,] "3"  "p" "n"
 [4,] "4"  "l" "a"
 [5,] "5"  "e" "n"
 [6,] "6"  " " "a"
 [7,] "9"  "b" "a"
 [8,] "10" "a" "p"
 [9,] "11" "n" "p"
[10,] "12" "a" "l"
[11,] "13" "n" "e"
[12,] "14" "a" " "
[13,] "18" "p" "s"
[14,] "19" "p" "l"
[15,] "20" "l" "k"
[16,] "21" "e" "d"
[17,] "22" " " ";"
[18,] "23" " " "f"
[19,] "24" " " "a"
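
The ix values in the output above index into the concatenated characters of all pairs, not into a single string. Because every string was padded to the same width (8 characters for this sample), the pair number and the position within the string can be recovered with integer arithmetic. A small sketch, using a handful of ix values copied from the output above; the names width, pair and position are introduced here for illustration only.

```r
width <- 8L                    # padded string width for the sample strings
ix <- c(1, 6, 9, 14, 18, 24)   # a few index values from the output above

pair     <- (ix - 1) %/% width + 1   # which row of newdf the character belongs to
position <- (ix - 1) %% width + 1    # position of the character within the string

cbind(ix, pair, position)
```

For example, ix 18 maps to pair 3, position 2, which is the "p" of "apple" against the "s" of "aslkd;fa".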
> ( m <- match(unique(x), x)[-1] )
 [1] 10 13 14 16 45 47 48 54 68 69 73 76 86
> cbind(x[m-1], x[m])
      [,1]        [,2]       
 [1,] "45CCBC44B" "<5CCBC:4B"
 [2,] "<5CCBC:4B" "<<CCBC::B"
 [3,] "<<CCBC::B" "<<GGBG::E"
 [4,] "<<GGBG::E" "55CCBC41B"
 [5,] "55CCBC41B" "CC11B1CCE"
 [6,] "CC11B1CCE" "CC55B1CCE"
 [7,] "CC55B1CCE" "55CCBC44B"
 [8,] "55CCBC44B" "G1CCBC1GB"
 [9,] "G1CCBC1GB" "91CCBC11B"
[10,] "91CCBC11B" "01CCBC11B"
[11,] "01CCBC11B" "11CCBC11B"
[12,] "11CCBC11B" "15CCBC11B"
[13,] "15CCBC11B" "55CCBC11B"