No change after median imputation in R (R, Imputation) - Fatal编程技术网



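Does anyone know what is going on here? I am trying to impute the NA values, but nothing changes. Here is my data frame. I included the whole structure because I thought it would be more helpful than just the first n rows: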

moneyball_training_data <- structure(list(INDEX = 1:6, TARGET_WINS = c(39L, 70L, 86L, 70L, 
82L, 75L), TEAM_BATTING_H = c(1445L, 1339L, 1377L, 1387L, 1297L, 
1279L), TEAM_BATTING_2B = c(194L, 219L, 232L, 209L, 186L, 200L
), TEAM_BATTING_3B = c(39L, 22L, 35L, 38L, 27L, 36L), TEAM_BATTING_HR = c(13L, 
190L, 137L, 96L, 102L, 92L), TEAM_BATTING_BB = c(143L, 685L, 
602L, 451L, 472L, 443L), TEAM_BATTING_SO = c(842, 1075, 917, 
922, 920, 973), TEAM_BASERUN_SB = c(NA, 37L, 46L, 43L, 49L, 107L
), TEAM_BASERUN_CS = c(NA, 28L, 27L, 30L, 39L, 59L), TEAM_BATTING_HBP = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), TEAM_PITCHING_H = c(9364L, 1347L, 1377L, 1396L, 1297L, 1279L
), TEAM_PITCHING_HR = c(84L, 191L, 137L, 97L, 102L, 92L), TEAM_PITCHING_BB = c(927L, 
689L, 602L, 454L, 472L, 443L), TEAM_PITCHING_SO = c(5456L, 1082L, 
917L, 928L, 920L, 973L), TEAM_FIELDING_E = c(1011L, 193L, 175L, 
164L, 138L, 123L), TEAM_FIELDING_DP = c(NA, 155L, 153L, 156L, 
168L, 149L)), row.names = c(NA, 6L), class = "data.frame")
I wanted to check whether there are any NA values:

any(is.na(moneyball_training_data)) # TRUE

I then located which columns contain the NA values:

library(dplyr)
moneyball_training_data %>% summarise(across(everything(), ~ any(is.na(.x))))

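For readers without dplyr, a base R way to see which columns contain NA values is colSums(is.na(df)), which counts the NAs in each column. The sketch below runs it on a small hypothetical data frame df, not on the poster's data:

```r
# Hypothetical toy data frame standing in for moneyball_training_data
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, 6),
  c = c(7, 8, 9)
)

# is.na() on a data frame returns a logical matrix;
# colSums() turns it into one NA count per column
na_counts <- colSums(is.na(df))
print(na_counts)

# Names of the columns that contain at least one NA
cols_with_na <- names(df)[na_counts > 0]
print(cols_with_na)
```

When only a single TRUE/FALSE answer is needed, base R's anyNA(df) is a shorter check than any(is.na(df)).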
I checked the class of one of the variables with NA values:

class(moneyball_training_data$TEAM_BATTING_SO) # numeric

I tried to impute the NA values with the median of the vector:

moneyball_training_data$TEAM_BATTING_SO[moneyball_training_data$TEAM_BATTING_SO == NA] <- median(moneyball_training_data$TEAM_BATTING_SO)

any(is.na(moneyball_training_data$TEAM_BATTING_SO)) # TRUE
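
The attempt above fails for two separate reasons, which this small sketch reproduces on a hypothetical vector x. First, x == NA evaluates to NA for every element. Second, median() returns NA when its input contains NA and na.rm is not set. When a logical index is NA in an assignment, R selects no element (this is permitted only because the replacement value has length one), so the assignment silently changes nothing:

```r
# Hypothetical vector standing in for TEAM_BATTING_SO
x <- c(842, NA, 917)

# Comparing with NA never gives TRUE/FALSE, only NA
idx <- x == NA
print(idx)

# Without na.rm = TRUE, median() propagates the NA
m <- median(x)
print(m)

# The original pattern: an all-NA index selects no element,
# so x is left exactly as it was
x[x == NA] <- median(x)
print(x)

# Even a valid replacement value changes nothing with this index
y <- x
y[y == NA] <- 900
print(identical(y, x))
```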

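When you build the logical vector for subsetting in moneyball_training_data$TEAM_BATTING_SO[moneyball_training_data$TEAM_BATTING_SO == NA], you should use is.na(), which you already used correctly before and after that line. You also need na.rm = TRUE in median(); otherwise the median of a vector that contains NA is itself NA.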

moneyball_training_data$TEAM_BATTING_SO[is.na(moneyball_training_data$TEAM_BATTING_SO)] <- median(moneyball_training_data$TEAM_BATTING_SO, na.rm = TRUE)

any(is.na(moneyball_training_data$TEAM_BATTING_SO))
# [1] FALSE
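
Combining both parts of the fix (is.na() for the index and na.rm = TRUE for the median), a minimal sketch of median imputation over every numeric column might look like this. The helper impute_median and the toy data frame df are hypothetical, introduced only for illustration:

```r
# Hypothetical helper: replace NA values in a numeric vector with its median
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Hypothetical toy data frame standing in for moneyball_training_data
df <- data.frame(
  so = c(842, NA, 917, 922),
  sb = c(NA, 37L, 46L, 43L),
  name = c("A", "B", "C", "D")
)

# Apply the helper to every numeric column; leave other columns alone
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], impute_median)

print(df)
print(anyNA(df))
```

Non-numeric columns are skipped because a median is not meaningful for them. With dplyr loaded, the same helper could be applied as df %>% mutate(across(where(is.numeric), impute_median)).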

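Try using is.na() instead of == NA, i.e. index with moneyball_training_data$TEAM_BATTING_SO[is.na(moneyball_training_data$TEAM_BATTING_SO)].

This works perfectly, thanks for the answer. I didn't think of putting is.na() inside the index like you suggested. Is there any particular reason why is.na() works but == NA or == "" doesn't?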
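== "" doesn't work simply because missing values in a data.frame() are coded as NA, not as "". Why == NA doesn't work is a bit trickier to answer. The R documentation for == (?`==`) states that missing values (NA) are regarded as non-comparable, so comparing anything with NA gives NA rather than TRUE or FALSE.

OK, I'll accept your explanation of base R.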
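
The comparison rules described in that answer can be checked directly. This short sketch, using a hypothetical character vector s, shows that any == comparison involving NA yields NA, that == "" matches only genuine empty strings, and that is.na() is the test that actually detects missing values:

```r
# Comparisons involving NA are non-comparable and return NA
print(NA == NA)
print(5 == NA)

# A missing value is NA, not the empty string, so == "" cannot find it
s <- c("a", NA, "")
print(s == "")
print(is.na(s))

# is.na() is designed for missingness and always returns TRUE or FALSE
print(is.na(NA))
```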