How to loop over strings in R in a vectorized way


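I'm learning R right now, and I'm having trouble looping efficiently in R. I can parse strings with for loops in a very convoluted way, but I'm confused about how to write string-parsing code in a vectorized way.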

For example:

#Social security numbers in the United States are represented by
# numbers conforming to the following format:
#
# a leading 0 followed by two digits
# followed by a dash
# followed by two digits
# followed by a dash
# finally followed by four digits
#
# For example 023-45-7890 would be a valid value,
# but 05-09-1995 and 059-2-27 would not be.
#
# Implement the body of the function 'extractSecuNum' below so that it
# returns a numeric vector whose elements are Social Security numbers
# extracted from a text, i.e., a vector of strings representing the text lines,
# passed to the function as its 'text' argument.
# (You can assume that each string in 'text' contains
# either zero or one Social Security numbers.)


extractSecuNum = function(text){
# Write your code here!

# seq_along() yields an empty sequence for empty input, unlike 1:length()
x = seq_along(text)
list_of_input = character(length(text))


for (ind in x){
  list_of_input[ind] = sub(' .*', '', sub('^[^0-9]*', '', text[ind]))
}

temp = c()

for (ind in x){
  if(list_of_input[ind] != ''){
    temp = c(temp, list_of_input[ind])
  }
}

temp2 = c()
for (ind in seq_along(temp)){
  temp3 = strsplit(temp[ind], '-')
  temp2 = c(temp2, temp3)
}

final = c()

for (ind in seq_along(temp2)){
  # require exactly three dash-separated parts before checking each one
  if (length(temp2[[ind]]) == 3 && sub('0[0-9][0-9]', '', temp2[[ind]][1]) == ''){
    if (sub('[0-9][0-9]', '', temp2[[ind]][2]) == ''){
      if (sub('[0-9]{4}', '', temp2[[ind]][3]) == ''){
        final = c(final, paste(temp2[[ind]][1], temp2[[ind]][2], temp2[[ind]][3], sep='-'))
      }
    }
  }
}

return(final)
}
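Both stages of the loop-based function above rely on functions that are already vectorized: sub() and grepl() accept a whole character vector and return one result per element, so the explicit loops are not needed. A minimal sketch that keeps the same "first token that starts with a digit" heuristic; the name extractSecuNumVec is hypothetical:

```r
extractSecuNumVec <- function(text) {
  # Drop everything before the first digit, then everything from the first space on;
  # both sub() calls operate on the whole vector at once
  tokens <- sub(" .*", "", sub("^[^0-9]*", "", text))
  # Keep only tokens that are exactly 0dd-dd-dddd (anchored at both ends)
  tokens[grepl("^0[0-9]{2}-[0-9]{2}-[0-9]{4}$", tokens)]
}

testInput <- c("example", "023-45-7890", "would be a valid value",
               "05-09-1995", "and 059-2-27 would not be", "011-99-2234 also works")
extractSecuNumVec(testInput)
# [1] "023-45-7890" "011-99-2234"
```

The anchored grepl() does the same job as the three nested `sub(...) == ''` checks, and the function returns character(0) when no line contains a match.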
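This is one of several similar problems. If you look closely, you'll see that the second part, my solution, is very convoluted and inelegant.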

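I believe the problem is that an atomic variable in R is an array (a vector), so I can't access the individual characters in a string.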
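For reference, a string in R is a single element of a character vector, so indexing with `[ ]` selects whole strings rather than characters. Individual characters are reached with substr() or by splitting with strsplit(), and substr() is itself vectorized. A short sketch:

```r
s <- "023-45-7890"

# s[1] is the whole string; split it to get one element per character
chars <- strsplit(s, "")[[1]]
chars[1]         # "0"
nchar(s)         # 11

# substr() extracts by character position...
substr(s, 5, 6)  # "45"

# ...and works element-wise on a whole vector
substr(c("023-45-7890", "011-99-2234"), 1, 3)  # "023" "011"
```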

Any suggestions would be greatly appreciated.

extractSecuNum = function(text){
  pattern <- "0\\d{2}-\\d{3}-\\d{4}"
  unlist(regmatches(text,gregexpr(pattern,text)))
}

text <- paste0("fdkmsal ",
               "0",sample(10:99,10),"-",
               sample(100:999,10),"-",
               sample(1000:9999,10), " vaklra")
result <- extractSecuNum(text)

head(text)
# [1] "fdkmsal 034-965-3362 vaklra" "fdkmsal 029-190-2488 vaklra"
# [3] "fdkmsal 055-785-3898 vaklra" "fdkmsal 033-950-5589 vaklra"
# [5] "fdkmsal 025-833-9312 vaklra" "fdkmsal 054-375-5596 vaklra"
result
# [1] "034-965-3362" "029-190-2488" "055-785-3898" "033-950-5589" "025-833-9312"
# [6] "054-375-5596" "057-680-3317" "020-951-1417" "031-996-4757" "068-402-8678"
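The pattern in this answer allows three digits in the middle group (matching the randomly generated text above), while the problem statement asks for two, as in 023-45-7890. A sketch adjusted to the statement; the name extractSecuNum2 is hypothetical. Because each line contains at most one number, regexpr() (first match per element) is enough, and regmatches() then returns a plain character vector without needing unlist():

```r
extractSecuNum2 <- function(text) {
  # 0 + two digits, dash, two digits, dash, four digits;
  # \\b stops the match from starting or ending inside a longer run of digits
  pattern <- "\\b0\\d{2}-\\d{2}-\\d{4}\\b"
  # regexpr() returns the first match (or -1) per element;
  # regmatches() keeps only the elements that matched
  regmatches(text, regexpr(pattern, text, perl = TRUE))
}

testInput <- c("example", "023-45-7890", "would be a valid value",
               "05-09-1995", "and 059-2-27 would not be", "011-99-2234 also works")
extractSecuNum2(testInput)
# [1] "023-45-7890" "011-99-2234"
```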

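Can you provide valid sample input for your extractSecuNum?

if(nchar(x … @extracSecuNum

testInput = c('example', '023-45-7890', 'would be a valid value', '05-09-1995', 'and 059-2-27 would not be', '011-99-2234 also works')
correctOutput = c('023-45-7890', '011-99-2234')

These are my test inputs, and they worked. @rawr I know my solution isn't rigorous; right now I'm just trying to learn the general approach to parsing in R. Parsing in R is completely different from parsing in Python, which is a bit confusing for me.

Is pattern …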