Custom function in R with nested for loops for address parsing


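I am currently parsing address elements such as directions, street suffixes (rd), or unit types and unit numbers. I am trying to write the code below as a "function" so that I only have to define the list of valid strings (in this example, street types to search for):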

st.list <- c("St","st","stree","Stree","street","Street","dri","Dr",
             "Ave","Rd","Ln","ct","CT","Ct","blvd","blv","bl","Aly",
             "Blvd","Cir","Hls","Ln","Loop","Pl","Way","Vis","Road",
             "Drive","Blv","Blvd","Bl")
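I am new to functions and come from macros in SAS, where the process is a bit different, but I am trying to accomplish something similar: I need to re-run a block of code several times, changing only two variables each time.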

Any ideas?

Edit: this is the loop implementation. Because of the nature of the data, I am limited in what I can include here.


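In this demonstration, I comment out the "function" line and assign x and y manually, to show the task I can accomplish but cannot yet make reproducible (that is, run for different elements with a single function call).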

Comments:

Hi and welcome. What is StreetSuffixVector? Thanks!

StreetSuffixVector is a partial list of street suffixes, originally populated by the postmastr package. That package does not fully parse out all suffixes (road, rd, street, st, etc.), so this for loop is meant to scan the remaining address elements. It is not the name used in my code but an explicit name for this example, to show the function's inputs: the list of strings to scan for (as above) and the character vector that the remaining address elements will be populated into.

You need to include it so that the example is reproducible.

Now included are a sample vector of street addresses and the suffix vector already half-populated by postmastr.

So, as an update on the project: it made more sense to hard-code the loop for each address element to be parsed, because the if-then logic varies slightly between them. I am still very curious how to create R functions in general, how to include several processes in one reproducible block of code, and how to define a "macro variable" at the start.

    st <- c("101 Gilligans Isle","HWY 66","205 2nd st","530 Williams Stree",
        "301 Weenie Hut Way","400 Grand Ave")

The postmastr package would already have separated the following street suffixes, so the suffix vector populated from the addresses above would be:

    st.suf <- c("","","St","","","Ave")

It would have separated those suffix elements, leaving the remaining address vector looking like this:

    st <- c("101 Gilligans Isle","HWY 66","205 2nd","530 Williams Stree",
        "301 Weenie Hut Way","400 Grand")

The package is far from comprehensive in the strings it parses, hence the hard-coding of possible misspellings and the loops that follow to separate the suffix data. As you can see in the real example frequencies below, "Stree" will be separated in this example, where the postmastr package missed it.
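One way to shorten a hard-coded list like this is to compare tokens in lower case, so that "St", "st", "CT", and "Ct" do not each need their own entry. The sketch below assumes that case can safely be ignored when matching suffixes, which the question does not state; `st.base` is an illustrative name and an abbreviated list, not the full one.

```r
# Assumption: suffix matching may ignore case.
# `st.base` is a hypothetical, shortened lower-case list for illustration.
st.base <- c("st", "stree", "street", "dr", "ave", "rd", "ln", "ct", "blvd", "way")

# Split one sample address on runs of spaces.
tokens <- unlist(strsplit("530 Williams Stree", " +"))

# Compare the lower-cased tokens against the lower-case list.
is.suffix <- tolower(tokens) %in% st.base

tokens[is.suffix]   # "Stree"
```

The original spelling of the token is kept, since only the comparison is done in lower case.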

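    library(stringr)  # provides str_trim()

    #scan.replace <- function(x,y){

    x <- st.list  # valid strings to search for
    y <- st.suf   # suffix vector to fill in
    i <- 1        # index of the current address

    for (add in st){

      # split the address on runs of spaces
      diced <- unlist(strsplit(add, " +"))

      num.w <- length(diced)  # number of tokens (not used below)

      j <- 1      # index of the current token

      for (sub in diced){

        # move a matching token into the suffix vector and blank it out
        if (sub %in% x){
          y[i] <- sub
          diced[j] <- ""
        }
        j <- j + 1
      }

      # note: diced is not stored back into st
      diced <- str_trim(diced, side="both")
      i <- i + 1
    }

    #}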
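The loop above can be wrapped in a function so that the two things that change between runs (the list of valid strings and the vector to fill) become arguments, much like macro variables in SAS. This is only a sketch under some assumptions: the argument names `addresses`, `valid`, and `found` are illustrative, the shortened `st.list` is for the example only, and the function writes the trimmed address back, which the original loop does not do. Because R functions do not modify their inputs in place, both vectors are returned in a list.

```r
# Hypothetical helper: argument names are illustrative, not from the original code.
scan.replace <- function(addresses, valid, found = rep("", length(addresses))) {
  for (i in seq_along(addresses)) {
    # Split the address on runs of spaces.
    diced <- unlist(strsplit(addresses[i], " +"))
    hit <- diced %in% valid
    if (any(hit)) {
      # Keep the last matching token, mirroring the original loop,
      # where a later match overwrites an earlier one.
      found[i] <- diced[max(which(hit))]
      # Drop every matched token and rebuild the address.
      addresses[i] <- paste(diced[!hit], collapse = " ")
    }
  }
  # Return both vectors; the caller assigns them back.
  list(addresses = addresses, found = found)
}

# Shortened list for the example only.
st.list <- c("St", "st", "stree", "Stree", "street", "Street", "Way", "Ave")
st      <- c("101 Gilligans Isle", "HWY 66", "205 2nd", "530 Williams Stree",
             "301 Weenie Hut Way", "400 Grand")
st.suf  <- c("", "", "St", "", "", "Ave")

res <- scan.replace(st, st.list, st.suf)
res$found      # "" "" "St" "Stree" "Way" "Ave"
res$addresses  # "530 Williams Stree" becomes "530 Williams"
```

The same function could then be called again with a different list and target vector (for example a hypothetical `dir.list` of directions and an empty direction vector) instead of copying the loop.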