Reading a file with missing values in R


I have a file named 'fn', which I read as follows:

age CALCIUM CREATININE  GLUCOSE
64.3573     1.1 488
69.9043 8.1 1.1 472
65.6633 8.6 0.8 461
50.3693 8.1 1.3 418
57.0334 8.7 0.8 NEG
81.4939     1.1 NEG
56.954  9.8 1   
76.9298 9.1 0.8 NEG


> tmpData = read.table(fn, header = TRUE,  sep= "\t" , na.strings = c('', 'NA', '<NA>'),  blank.lines.skip = TRUE)
> tmpData
      age       CALCIUM CREATININE GLUCOSE
1 64.3573            NA        1.1     488
2 69.9043           8.1        1.1     472
3 65.6633           8.6        0.8     461
4 50.3693           8.1        1.3     418
5 57.0334           8.7        0.8     NEG
6 81.4939            NA        1.1     NEG
7 56.9540           9.8        1.0    <NA>
8 76.9298           9.1        0.8     NEG


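The file is read as shown above, with the missing values replaced by NA and <NA>. I guess the GLUCOSE column is being treated as a factor. Is there a simple way to have <NA> interpreted as a real NA, and to convert any non-numeric value to NA (in this example, NEG to NA)?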

You can take advantage of the fact that as.numeric coerces non-numeric values to NA. In other words, try the following.

Here is your data:
temp <- structure(list(age = c(64.3573, 69.9043, 65.6633, 50.3693, 57.0334, 
  81.4939, 56.954, 76.9298), CALCIUM = c(1.1, 8.1, 8.6, 8.1, 8.7, 
  1.1, 9.8, 9.1), CREATININE = c(NA, 1.1, 0.8, 1.3, 0.8, NA, 1, 
  0.8), GLUCOSE = structure(c(5L, 4L, 3L, 2L, 6L, 6L, 1L, 6L), .Label = c("", 
  "418", "461", "472", "488", "NEG"), class = "factor")), .Names = c("age", 
  "CALCIUM", "CREATININE", "GLUCOSE"), class = "data.frame", row.names = c(NA, 
  -8L))

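Convert the last column to numeric. Because it is a factor, we first need to convert it to character; calling as.numeric directly on a factor would return its internal integer level codes instead of the labels. Note the warning. In this case, the warning is exactly what we want.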
temp$GLUCOSE <- as.numeric(as.character(temp$GLUCOSE))
# Warning message:
# NAs introduced by coercion 
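
To see why the intermediate as.character step matters, here is a minimal sketch using a small made-up factor (the vector f is illustrative and not taken from the data above). It compares the direct conversion with the two-step conversion:

```r
# Hypothetical factor for illustration only
f <- factor(c("488", "418", "NEG", ""))

# Direct conversion returns the internal integer level codes, not the labels
codes <- as.numeric(f)

# Converting to character first recovers the labels; labels that are not
# numbers become NA. suppressWarnings hides the expected
# "NAs introduced by coercion" warning.
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)  # 488 418 NA NA
```

The codes depend only on the order of the factor levels, which is why they bear no relation to the glucose readings themselves.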


Just for fun, here is a small function I put together that offers an alternative approach:
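makemeNA <- function (mydf, NAStrings, fixed = TRUE) {
  # With fixed = FALSE, NAStrings is a regular expression: blank out every
  # match, then treat the resulting empty strings as NA
  if (!isTRUE(fixed)) {
    mydf[] <- lapply(mydf, function(x) gsub(NAStrings, "", x))
    NAStrings <- ""
  }
  # Re-detect each column's type, mapping the NA strings to NA;
  # as.is = TRUE keeps any remaining text columns as character
  mydf[] <- lapply(mydf, function(x) type.convert(
    as.character(x), na.strings = NAStrings, as.is = TRUE))
  mydf
}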

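What happens if you add "NEG" to na.strings?

It works if NEG is included. But for an arbitrary string, which can be any sequence of characters, there is no read method that handles this case automatically.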
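
As a sketch of the na.strings suggestion from the comment: when the non-numeric token is known in advance, it can be listed in na.strings at read time. The example below reads a small tab-separated sample inline through the text argument instead of the file fn; the sample lines are adapted from the question and the variable name lines is an assumption made for illustration.

```r
# Inline tab-separated sample adapted from the question (stands in for 'fn')
lines <- c(
  "age\tCALCIUM\tCREATININE\tGLUCOSE",
  "64.3573\t\t1.1\t488",
  "57.0334\t8.7\t0.8\tNEG",
  "56.954\t9.8\t1\t"
)

# Listing "NEG" in na.strings makes read.table treat it as missing,
# so GLUCOSE is read as a numeric column rather than a factor
tmpData <- read.table(text = lines, header = TRUE, sep = "\t",
                      na.strings = c("", "NA", "NEG"))

str(tmpData)
```

This only helps when every unwanted token can be enumerated up front, which is the limitation raised in the reply.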
# The data frame after converting GLUCOSE with as.numeric(as.character(...)):
temp
#       age CALCIUM CREATININE GLUCOSE
# 1 64.3573     1.1         NA     488
# 2 69.9043     8.1        1.1     472
# 3 65.6633     8.6        0.8     461
# 4 50.3693     8.1        1.3     418
# 5 57.0334     8.7        0.8      NA
# 6 81.4939     1.1         NA      NA
# 7 56.9540     9.8        1.0      NA
# 8 76.9298     9.1        0.8      NA


# Change anything that is just text to NA
makemeNA(temp, "[A-Za-z]", fixed = FALSE)
# Change any exact matches with "NEG" to NA
makemeNA(temp, "NEG")
# Change any matches with 3-digit integers to NA
makemeNA(temp, "^[0-9]{3}$", fixed = FALSE)
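
When the non-numeric tokens are not known in advance, the as.numeric coercion from the first part of the answer can be applied to every column at once. This is a minimal sketch that assumes all columns are meant to be numeric; the helper name coerce_all_numeric and the data frame raw are hypothetical and not part of the original answer.

```r
# Hypothetical helper: coerce every column to numeric, turning anything
# that does not parse as a number into NA
coerce_all_numeric <- function(df) {
  df[] <- lapply(df, function(x) {
    # as.character first, so factors yield their labels, not level codes
    suppressWarnings(as.numeric(as.character(x)))
  })
  df
}

# Small made-up example with mixed content, stored as factors
raw <- data.frame(age = c("64.3", "57.0"), GLUCOSE = c("488", "NEG"),
                  stringsAsFactors = TRUE)
clean <- coerce_all_numeric(raw)
str(clean)
```

Suppressing the warning is a design choice here: the NAs are intended, but it also hides coercion problems in columns where text was not expected.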