R: Fill missing values with the factor value based on an ID variable

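I want to fill in the missing values with the correct factor value, based on the ID variable.

Here are the variables (Gender_NA is the column with gaps, Gender is the result I want):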

ID <- c(1,1,1,2,2,2,3,3,3)
Gender_NA <- c("m",NA,"m",NA,"f",NA,"m","m",NA)
Gender  <- c("m","m","m","f","f","f","m","m","m")

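The na.locf function from library(zoo) replaces each NA element with the nearest preceding non-NA element. With data.table, we convert the 'data.frame' to a 'data.table' and, grouped by 'ID', replace the NA elements with the preceding non-NA value. If the first element of a group is NA, that pass leaves it unchanged, so we wrap the result in a second na.locf call with the option fromLast=TRUE, which replaces the remaining NAs with the next non-NA element.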

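library(zoo)
library(data.table)
# Assumption: Data_have is built from the question's vectors
Data_have <- data.frame(ID, Gender_NA)
# Inner call fills forward and keeps leading NAs (na.rm=FALSE);
# outer call fills those leading NAs backward (fromLast=TRUE)
setDT(Data_have)[, Gender := na.locf(na.locf(Gender_NA, 
            na.rm=FALSE),fromLast=TRUE), by = ID][, Gender_NA := NULL]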
Data_have
#    ID Gender
#1:  1      m
#2:  1      m
#3:  1      m
#4:  2      f
#5:  2      f
#6:  2      f
#7:  3      m
#8:  3      m
#9:  3      m
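
To make the two passes easier to follow, here is a minimal base R sketch of the same idea. The helper locf() is a hypothetical stand-in written for illustration, not zoo's implementation, and ave() takes the place of the data.table grouping:

```r
ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
Gender_NA <- c("m", NA, "m", NA, "f", NA, "m", "m", NA)

# Hypothetical helper: carry the last non-NA value forward
# (or backward when fromLast = TRUE). Leading gaps stay NA.
locf <- function(x, fromLast = FALSE) {
  if (fromLast) return(rev(locf(rev(x))))
  idx <- cumsum(!is.na(x))        # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # slot 1 is NA, used for leading gaps
}

locf(c(NA, "f", NA))                          # forward pass only
locf(locf(c(NA, "f", NA)), fromLast = TRUE)   # forward, then backward

# Apply both passes within each ID
Gender_filled <- ave(Gender_NA, ID,
                     FUN = function(g) locf(locf(g), fromLast = TRUE))
Gender_filled
```

The forward pass alone leaves the first row of ID 2 as NA, which is why the backward pass is needed.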
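Alternatively, starting again from the original Data_have and grouping by ID, we can use na.omit() to drop all NAs and select the first remaining element: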

setDT(Data_have)[, Gender := na.omit(Gender_NA)[1L], by =  ID][, Gender_NA := NULL]
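
For readers without data.table, the same "first non-NA per group" idea can be sketched in base R. The helper name first_known is made up for this example. Note that it assumes each ID has a single true value, and that a group with no known value stays NA:

```r
ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
Gender_NA <- c("m", NA, "m", NA, "f", NA, "m", "m", NA)

# Hypothetical helper: first non-NA value of a vector,
# or NA when the vector has no non-NA value at all.
first_known <- function(x) na.omit(x)[1L]

# ave() calls the helper once per ID and recycles the single
# result across every row of that group.
Gender_filled <- ave(Gender_NA, ID, FUN = first_known)
Gender_filled

# A group made up only of NAs yields NA rather than an error
first_known(c(NA_character_, NA_character_))
```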

Or the same approach with dplyr:

library(dplyr)
Data_have %>% 
     group_by(ID) %>%
     transmute(Gender= first(na.omit(Gender_NA)))
#    ID Gender
#   (dbl) (fctr)
#1     1      m
#2     1      m
#3     1      m
#4     2      f
#5     2      f
#6     2      f
#7     3      m
#8     3      m
#9     3      m

Here is how I would do it using data.table:

require(data.table) # v1.9.6+
dt = data.table(ID, Gender_NA)
# Gender_NA is of character type
And the answer:

dt[is.na(Gender_NA), Gender_NA := na.omit(dt)[.SD, Gender_NA, mult="first", on="ID"]]
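
The join above looks up, for each row with a missing value, the first complete row that has the same ID. As a rough base R analogue (a sketch only, with match() standing in for the join and its mult="first" behaviour, and with variable names chosen for this example):

```r
ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
Gender_NA <- c("m", NA, "m", NA, "f", NA, "m", "m", NA)

known   <- !is.na(Gender_NA)   # rows that serve as the lookup table
missing <- is.na(Gender_NA)    # rows that need a value

filled <- Gender_NA
# match() returns the position of the FIRST known row with the same ID
filled[missing] <- Gender_NA[known][match(ID[missing], ID[known])]
filled
```

Only the rows with a missing value are touched, as in the data.table version.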