How to apply a function to a specific set of columns in a data frame in R to replace NAs

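I have a data set in which I want to replace the NAs in different columns in different ways. Below is a dummy data set and the code to reproduce it.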

test <- data.frame(ID = c(1:5),
               FirstName = c(NA,"Sid",NA,"Harsh","CJ"),
               LastName = c("Snow",NA,"Lapata","Khan",NA),
               BillNum = c(6:10),
               Phone = c(1213,3123,3123,NA,NA),
               Married = c("Yes","Yes",NA,"NO","Yes"),
               ZIP = c(1111,2222,333,444,555),
               Gender = c("M",NA,"F",NA,"M"),
               Address = c("A","B",NA,"C","D"))
> test
  ID FirstName LastName BillNum Phone Married  ZIP Gender Address
1  1      <NA>     Snow       6  1213     Yes 1111      M       A
2  2       Sid     <NA>       7  3123     Yes 2222   <NA>       B
3  3      <NA>   Lapata       8  3123    <NA>  333      F    <NA>
4  4     Harsh     Khan       9    NA      NO  444   <NA>       C
5  5        CJ     <NA>      10    NA     Yes  555      M       D

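My problem is that I don't want to repeat the function call for each column separately, since I have around 200 columns. I can't simply use the apply functions, because I would first have to subset the data, apply the function with lapply, and then cbind the result back onto the original data, which changes the order of the columns. Is there a way to supply the column names and the function, and get back the modified columns together with the other (unchanged) columns as one data set, or to modify the columns in place without returning anything (like DataFrame.fillna in Python, which takes the argument inplace=logical)?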

We can use tidyverse for this:

library(dplyr)
#specify the columns of interest 
#if there are any patterns, we can use `matches` or `grep`
nm1 <- names(test)[c(2, 3, 5, 9)]
nm2 <- names(test)[c(6, 8)]


#use `mutate_at` by specifying the arguments 'vars' and 'funs'
test %>% 
    mutate_at(vars(one_of(nm1)), funs(Availability_Indicator)) %>%
    mutate_at(vars(one_of(nm2)), funs(NotAvailable_Indicator))
#ID    FirstName     LastName BillNum        Phone      Married  ZIP       Gender      Address
#1  1 NotAvialable    Available       6    Available          Yes 1111            M    Available
#2  2    Available NotAvialable       7    Available          Yes 2222 NotAvailable    Available
#3  3 NotAvialable    Available       8    Available NotAvailable  333            F NotAvialable
#4  4    Available    Available       9 NotAvialable           NO  444 NotAvailable    Available
#5  5    Available NotAvialable      10 NotAvialable          Yes  555            M    Available
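
In newer dplyr releases, funs() is deprecated (since 0.8.0) and mutate_at() is superseded by across() (introduced in 1.0.0). A minimal sketch of the same idea with across() follows. Availability_Indicator is not defined anywhere in this thread, so the definition below is an assumption inferred from the output above, with the spelling normalized to "NotAvailable".

```r
library(dplyr)

# Assumed definition (not shown in the thread): flag every value as
# "Available" or "NotAvailable" depending on whether it is NA
Availability_Indicator <- function(x) {
  ifelse(is.na(x), "NotAvailable", "Available")
}

# Replaces only the NAs and keeps the other values
NotAvailable_Indicator <- function(x) {
  x[is.na(x)] <- "NotAvailable"
  x
}

test <- data.frame(ID = 1:5,
                   FirstName = c(NA, "Sid", NA, "Harsh", "CJ"),
                   LastName = c("Snow", NA, "Lapata", "Khan", NA),
                   BillNum = 6:10,
                   Phone = c(1213, 3123, 3123, NA, NA),
                   Married = c("Yes", "Yes", NA, "NO", "Yes"),
                   ZIP = c(1111, 2222, 333, 444, 555),
                   Gender = c("M", NA, "F", NA, "M"),
                   Address = c("A", "B", NA, "C", "D"),
                   stringsAsFactors = FALSE)

nm1 <- names(test)[c(2, 3, 5, 9)]
nm2 <- names(test)[c(6, 8)]

# all_of() selects exactly the columns named in the character vector;
# the remaining columns and the column order are left untouched
res <- test %>%
  mutate(across(all_of(nm1), Availability_Indicator)) %>%
  mutate(across(all_of(nm2), NotAvailable_Indicator))

res
```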

Data

It is easier to change values in character columns than in factor columns. So we pass stringsAsFactors=FALSE in the 'data.frame' call, which makes the non-numeric columns character.

test <- data.frame(ID = c(1:5),
           FirstName = c(NA,"Sid",NA,"Harsh","CJ"),
           LastName = c("Snow",NA,"Lapata","Khan",NA),
           BillNum = c(6:10),
           Phone = c(1213,3123,3123,NA,NA),
           Married = c("Yes","Yes",NA,"NO","Yes"),
           ZIP = c(1111,2222,333,444,555),
           Gender = c("M",NA,"F",NA,"M"),
           Address = c("A","B",NA,"C","D"), stringsAsFactors=FALSE)
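
The point about factor versus character columns can be illustrated with a small sketch on standalone vectors (the variable names here are illustrative only). Note that since R 4.0.0, data.frame() already defaults to stringsAsFactors = FALSE, so the explicit argument mainly matters on older versions.

```r
# A factor only accepts values that are already among its levels
f <- factor(c("Yes", NA, "NO"))
f_bad <- f
# This assignment raises an "invalid factor level" warning and leaves NA in place
suppressWarnings(f_bad[is.na(f_bad)] <- "NotAvailable")
anyNA(f_bad)

# A character vector accepts any string
ch <- c("Yes", NA, "NO")
ch[is.na(ch)] <- "NotAvailable"
ch

# If a column is already a factor, converting it to character first avoids the problem
f_ok <- as.character(f)
f_ok[is.na(f_ok)] <- "NotAvailable"
f_ok
```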

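We can also do this easily in base R. For comparison, the column-by-column approach that the question wants to avoid, followed by the desired output: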
NotAvailable_Indicator <- function(x){
  x[is.na(x)]<-"NotAvailable"
  return(x)
}
test$Married <- NotAvailable_Indicator(test$Married)
test$Gender <- NotAvailable_Indicator(test$Gender)
ID    FirstName     LastName BillNum        Phone      Married  ZIP       Gender      Address
 1 NotAvialable    Available       6    Available          Yes 1111            M    Available
 2    Available NotAvialable       7    Available          Yes 2222 NotAvailable    Available
 3 NotAvialable    Available       8    Available NotAvailable  333            F NotAvialable
 4    Available    Available       9 NotAvialable           NO  444 NotAvailable    Available
 5    Available NotAvialable      10 NotAvialable          Yes  555            M    Available
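# base R: assign the lapply() results back into the selected columns
# (nm1 and nm2 as defined above); the column order is unchanged
test[nm1] <- lapply(test[nm1], Availability_Indicator)
test[nm2] <- lapply(test[nm2], NotAvailable_Indicator)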
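
The question asks for a way to pass column names and a function and get the whole data set back with its column order intact. One possible way to wrap the lapply assignment is sketched below. replace_in_columns is a hypothetical helper name, not something from the thread or from any package. Because R copies data frames on modification, there is no direct counterpart to inplace=True here; the result is simply assigned to a name.

```r
# Hypothetical helper: apply `fun` to the columns named in `cols` and
# return the full data frame with its column order unchanged
replace_in_columns <- function(df, cols, fun) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Unknown columns: ", paste(missing_cols, collapse = ", "))
  }
  df[cols] <- lapply(df[cols], fun)
  df
}

NotAvailable_Indicator <- function(x) {
  x[is.na(x)] <- "NotAvailable"
  x
}

test <- data.frame(ID = 1:5,
                   FirstName = c(NA, "Sid", NA, "Harsh", "CJ"),
                   LastName = c("Snow", NA, "Lapata", "Khan", NA),
                   BillNum = 6:10,
                   Phone = c(1213, 3123, 3123, NA, NA),
                   Married = c("Yes", "Yes", NA, "NO", "Yes"),
                   ZIP = c(1111, 2222, 333, 444, 555),
                   Gender = c("M", NA, "F", NA, "M"),
                   Address = c("A", "B", NA, "C", "D"),
                   stringsAsFactors = FALSE)

# Columns can be listed explicitly ...
out <- replace_in_columns(test, c("Married", "Gender"), NotAvailable_Indicator)

# ... or picked by a name pattern, which scales better to ~200 columns
name_cols <- grep("Name$", names(test), value = TRUE)
out2 <- replace_in_columns(test, name_cols, NotAvailable_Indicator)
```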