for loop and/or lapply in a function when creating a new variable


I have written a series of lapply statements (they extract postcodes from 5 large text fields) inside a function:

opm_naar_postc=function(kolom1,kolom2,kolom3,kolom4,kolom5) {
    postc=lapply(kolom1, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc1=lapply(kolom1, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc2=lapply(kolom2, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc3=lapply(kolom2, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc4=lapply(kolom3, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc5=lapply(kolom3, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc6=lapply(kolom4, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc7=lapply(kolom4, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc8=lapply(kolom5, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][' '][a-zA-Z][a-zA-Z](\\D))", x)))[1])
    postc9=lapply(kolom5, function(x) unlist(regmatches(x,gregexpr("((\\D)[1-4][0-9][0-9][0-9][a-zA-Z][a-zA-Z](\\D))", x)))[1])
Then I want to remove any spaces, dots, NAs, etc. from postc through postc9:

postcodes=c("postc","postc1","postc2","postc3","postc4","postc5","postc6","postc7","postc8","postc9")
for (i in postcodes) {
  i=gsub(" ","",i)
  i=gsub("NA|[[:punct:]]","",i)  }
Finally, I paste postc through postc9 together so that only one variable remains. That variable is my return value. I call the function like this:

df = df %>% mutate(postcode=opm_naar_postc(var1,var2,var3,var4,var5)) 
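First, the for loop does not work (no error, but it does nothing). When I do not use the for loop, it does work. Second, I would like to put all 10 lapply statements into one for loop. Is that possible? I have tried many things, but nothing seems to work.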

Can anyone help me?

Thanks

An example of my data frame df:

var1            var2          var3            var4           var5
blablaehdhde    blablatext    blabla 1983 rf  blablatext     blablatext
1982 rf blabla  text blala    blablbal        blaakakk text  hahahahah
blblatext       textte8743GH  sdkhflksfjf     kjsnhblabla    gagagagag
Expected result:

postcode
1983rf
1982rf
8743GH

Here is an idea using a regular expression:

gsub('^\\D*?(\\d+)\\s?(\\D{2}).*$', '\\1\\2', grep('\\d+', unlist(df), value = TRUE))

#   var12    var23    var31 
#"1982rf" "8743GH" "1983rf" 
You can try:

# your data
df <- structure(c("blablaehdhde", "1982 rf blabla", "blblatext", "blablatext", 
"text blala", "textte8743GH", "blabla 1983 rf", "blablbal", "sdkhflksfjf", 
"blablatext", "blaakakk text", "kjsnhblabla", "blablatext", "hahahahah", 
"gagagagag"), .Dim = c(3L, 5L), .Dimnames = list(NULL, c("var1", 
"var2", "var3", "var4", "var5")))


# pipeline
library(tidyverse)
library(stringi)
as_tibble(df) %>%   # as.tibble() is deprecated in current tibble releases
          gather() %>% 
          mutate(value=gsub(" ", "", value)) %>% 
          mutate(postcode=stri_extract_all_regex(value, "[0-9]+(.{2})", simplify =T)) %>% 
          filter(!is.na(postcode)) 
# A tibble: 3 x 3
    key        value postcode
  <chr>        <chr>    <chr>
1  var1 1982rfblabla   1982rf
2  var2 textte8743GH   8743GH
3  var3 blabla1983rf   1983rf
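
Both answers above list the postcodes in column order (1982rf, 8743GH, 1983rf), not in the row order of the expected result, so neither drops straight into mutate(). A minimal base R sketch of a row-wise variant follows. It assumes at most one postcode per row and a format of four digits, an optional space and two letters. The helper name extract_postcode is hypothetical, and the pattern is deliberately looser than the one in the question, which only accepts a first digit of 1-4 and would therefore miss 8743GH.

```r
# Sample data from the question.
df <- data.frame(
  var1 = c("blablaehdhde", "1982 rf blabla", "blblatext"),
  var2 = c("blablatext", "text blala", "textte8743GH"),
  var3 = c("blabla 1983 rf", "blablbal", "sdkhflksfjf"),
  var4 = c("blablatext", "blaakakk text", "kjsnhblabla"),
  var5 = c("blablatext", "hahahahah", "gagagagag"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one postcode per row, NA if none is found.
extract_postcode <- function(...) {
  # Glue the text columns together row by row; the " | " separator
  # keeps a match from spanning two columns.
  txt <- do.call(paste, c(list(...), sep = " | "))
  # Assumed format: four digits, an optional space, two letters.
  pos <- regexpr("[0-9]{4} ?[A-Za-z]{2}", txt)
  out <- rep(NA_character_, length(txt))
  # regmatches() returns only the rows that matched, so fill those in.
  out[pos != -1] <- regmatches(txt, pos)
  # Drop the optional space inside the postcode.
  gsub(" ", "", out, fixed = TRUE)
}

df$postcode <- extract_postcode(df$var1, df$var2, df$var3, df$var4, df$var5)
print(df$postcode)
```

Because the helper returns a vector with one element per row, it should also work in the call from the question, mutate(postcode = extract_postcode(var1, var2, var3, var4, var5)).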

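What is your expected output?

A variable "postcode" containing the postcode strings from data frame df, without spaces, NAs, etc.

Can you give a small part of the data frame?

The loop does not work because you are not changing the vector postcodes but the loop variable i, which is never returned. Making i an integer counter and replacing i inside the loop with postcodes[i] seems to do what you need:

postcodes=c("postc","postc1","postc2","postc3","postc4","postc5","postc6","postc7","postc8","postc9")
for (i in 1:length(postcodes)) {
  postcodes[i]=gsub(" ","",postcodes[i])
  postcodes[i]=gsub("NA|[[:punct:]]","",postcodes[i])
}

I tried this, but nothing changed. For example, if I return the variable "postc" (the first element of postcodes), the spaces are not replaced...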
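
That follow-up is what one would expect: postcodes holds the names of the variables as strings, so gsub() only ever cleans the text "postc", "postc1" and so on, never the objects those names refer to. One possible way around it is to keep the extraction results in a named list and overwrite each list element inside the loop. The sketch below assumes the results are plain character vectors. The values in results are made-up stand-ins for postc and postc1, not real output of the function in the question.

```r
# Made-up stand-ins for postc, postc1, ... (one element per data row).
results <- list(
  postc  = c(" 1983 rf ", NA, NA),
  postc1 = c(NA, " 1982rf,", NA)
)

# Loop over the list by name and write the cleaned vector back,
# so the stored data changes, not just a loop variable.
for (nm in names(results)) {
  x <- results[[nm]]
  x <- gsub(" ", "", x, fixed = TRUE)   # remove spaces
  x <- gsub("[[:punct:]]", "", x)       # remove dots, commas, etc.
  x[is.na(x)] <- ""                     # real NA values, not the text "NA"
  results[[nm]] <- x
}

# Paste the cleaned pieces together element-wise: one string per row.
postcode <- do.call(paste0, unname(results))
print(postcode)
```

Replacing real NA values with "" before pasting avoids having to strip the literal text "NA" afterwards, which would also damage any string that happens to contain those two letters.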