How to loop over strings in R in a vectorized way
I am currently learning R, and I am having trouble looping in R efficiently. I can do string parsing with for-loops in a very convoluted way, but I am confused about how to write string parsing code in a vectorized way. For example:
#Social security numbers in the United States are represented by
# numbers conforming to the following format:
#
# a leading 0 followed by two digits
# followed by a dash
# followed by two digits
# followed by a dash
# finally followed by four digits
#
# For example 023-45-7890 would be a valid value,
# but 05-09-1995 and 059-2-27 would not be.
#
# Implement the body of the function 'extractSecuNum' below so that it
# returns a numeric vector whose elements are Social Security numbers
# extracted from a text, i.e., a vector of strings representing the text lines,
# passed to the function as its 'text' argument.
# (You can assume that each string in 'text' contains
# either zero or one Social Security numbers.)
extractSecuNum = function(text){
  # Write your code here!
  x = 1:length(text)
  list_of_input = rep(0, length(text))
  for (ind in x){
    list_of_input[ind] = sub(' .*', '', sub('^[^0-9]*', '', text[ind]))
  }
  temp = c()
  for (ind in x){
    if (list_of_input[ind] != ''){
      temp = c(temp, list_of_input[ind])
    }
  }
  temp2 = c()
  for (ind in 1:length(temp)){
    temp3 = strsplit(temp[ind], '-')
    temp2 = c(temp2, temp3)
  }
  final = c()
  for (ind in 1:length(temp2)){
    if (sub('0[0-9][0-9]', '', temp2[[ind]][1]) == ''){
      if (sub('[0-9][0-9]', '', temp2[[ind]][2]) == ''){
        if (sub('[0-9]{4}', '', temp2[[ind]][3]) == ''){
          final = c(final, paste(temp2[[ind]][1], temp2[[ind]][2], temp2[[ind]][3], sep='-'))
        }
      }
    }
  }
  return(final)
}
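There are other problems of the same kind, and if you look closely you will see that my solutions to them are very convoluted and inelegant.
I believe the problem is that an atomic variable in R is a vector, and I cannot access the characters inside a string.
Any suggestions would be greatly appreciated.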
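On the point about reaching the characters inside a string: each element of a character vector is a whole string, but base R has functions that work on characters and are vectorized over the elements. A minimal sketch, with a made-up vector `x` used only for illustration:

```r
# Hypothetical sample data; any character vector would do.
x <- c("023-45-7890", "abc")

# substr() is vectorized: it takes the first character of every string.
first_char <- substr(x, 1, 1)

# strsplit() with an empty separator splits each string into single
# characters and returns a list with one character vector per input string.
chars <- strsplit(x, "")

# nchar() returns the length of every string in one call.
lens <- nchar(x)
```

Here `chars[[2]]` is the character vector for the second string, so individual characters can be indexed as `chars[[2]][1]` and so on.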
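extractSecuNum = function(text){
  # Note: this pattern expects three digits in the middle group, matching the
  # sample data generated below; the format in the question has two (\\d{2}).
  pattern <- "0\\d{2}-\\d{3}-\\d{4}"
  # gregexpr() locates every match in each string; regmatches() extracts them
  unlist(regmatches(text, gregexpr(pattern, text)))
}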
text <- paste0("fdkmsal ",
"0",sample(10:99,10),"-",
sample(100:999,10),"-",
sample(1000:9999,10), " vaklra")
result <- extractSecuNum(text)
head(text)
# [1] "fdkmsal 034-965-3362 vaklra" "fdkmsal 029-190-2488 vaklra"
# [3] "fdkmsal 055-785-3898 vaklra" "fdkmsal 033-950-5589 vaklra"
# [5] "fdkmsal 025-833-9312 vaklra" "fdkmsal 054-375-5596 vaklra"
result
# [1] "034-965-3362" "029-190-2488" "055-785-3898" "033-950-5589" "025-833-9312"
# [6] "054-375-5596" "057-680-3317" "020-951-1417" "031-996-4757" "068-402-8678"
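The format described in the question's comments is 0dd-dd-dddd, with two digits in the middle group. A sketch of a variant that follows that format is shown here; the name `extractSecuNumStrict` and the use of word boundaries are assumptions for illustration, not part of the original answer.

```r
# Hypothetical variant following the 0dd-dd-dddd format from the question.
extractSecuNumStrict <- function(text) {
  # \\b (word boundary) keeps the match from starting or ending inside a
  # longer run of digits; perl = TRUE selects the PCRE engine.
  pattern <- "\\b0\\d{2}-\\d{2}-\\d{4}\\b"
  # regexpr() finds at most one match per string, which fits the stated
  # assumption of zero or one number per line.
  m <- regexpr(pattern, text, perl = TRUE)
  # regmatches() keeps only the strings that matched.
  regmatches(text, m)
}

extractSecuNumStrict(c("id 023-45-7890 here", "05-09-1995", "059-2-27"))
# [1] "023-45-7890"
```

Because `regexpr()` and `regmatches()` both operate on the whole vector, no explicit loop is needed.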
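Can you provide valid sample input for your extractSecuNum?
testInput = c('example', '023-45-7890', 'would be a valid value', '05-09-1995', 'and 059-2-27 would not be', '011-99-2234 is also fine')
correctOutput = c('023-45-7890', '011-99-2234')
These are my test inputs, and they worked.
@rawr I know my solution is not rigorous. Right now I am just trying to learn the general approach to parsing in R. Parsing in R is completely different from parsing in Python, which is a bit confusing to me.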
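On the general approach: most base R string functions, such as `sub()` and `grepl()`, accept a whole character vector, so the loops in the question can usually be replaced by one call per step plus logical indexing. The following sketch keeps the question's strip-then-validate idea; the name `extractSecuNumVec` is hypothetical, and the input is the test vector quoted in the comments.

```r
# Hypothetical vectorized rewrite of the question's loop-based approach.
extractSecuNumVec <- function(text) {
  # Drop leading non-digits, then everything from the first space onward.
  # Both sub() calls run over every element of 'text' at once.
  candidates <- sub(" .*", "", sub("^[^0-9]*", "", text))
  # One anchored pattern replaces the split-and-check loops.
  valid <- grepl("^0[0-9]{2}-[0-9]{2}-[0-9]{4}$", candidates)
  # Logical indexing keeps only the valid candidates.
  candidates[valid]
}

testInput <- c("example", "023-45-7890", "would be a valid value",
               "05-09-1995", "and 059-2-27 would not be",
               "011-99-2234 is also fine")
extractSecuNumVec(testInput)
# [1] "023-45-7890" "011-99-2234"
```

Like the original loops, this only inspects the first digit-led token of each line, so it is a simplification rather than a full parser.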