R code to convert the 'missing' character string to 0
Tags: r, type-conversion, dataset

I only know the basics of R and am trying to learn more, so please help me. Before this, I used the code below to randomly insert missing values. Now I need to run 1-NN on this data, but it fails because of non-numeric arguments.
>data5=data5 %>% select(1:5)
>df.new5 <- data10[-sample(NROW(data10), NROW(data10)*(1 - 0.05),),]<-'missing'
>df.new5 <- data10[sample(NROW(data10), NROW(data10)*(1 - 0.05),),]
If your columns are factors, first convert them to character, then replace all "missing" values with 0 and use type.convert to convert the classes.
df[] <- lapply(df, as.character)
df[-1][df[-1] == "missing"] <- 0
df <- type.convert(df, as.is = TRUE)  # as.is = TRUE keeps Beach.Name as character and avoids the R >= 4.0 warning
df
#      Beach.Name Water.Temperature Turbidity Wave.Height Wave.Period
#7   CalumetBeach              16.3      1.28       0.162           4
#72       missing               0.0      0.00       0.000           0
#18  CalumetBeach              17.0      1.82       0.194           4
#78       missing               0.0      0.00       0.000           0
#24 MontroseBeach              14.5      6.88       0.345           4
#41  CalumetBeach              15.9      1.74       0.148           4
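To illustrate why the conversion to character comes first, here is a minimal sketch on a single made-up factor column (the vector f and the other names are hypothetical, not taken from the question). Assigning 0 directly to a factor does not work, because 0 is not one of its levels.

```r
# Hypothetical factor column that contains a "missing" level
f <- factor(c("16.3", "missing", "17"))

# Direct assignment: 0 is not an existing level, so R stores NA
# and emits an "invalid factor level" warning
f_bad <- f
f_bad[f_bad == "missing"] <- 0

# Character first: the replacement is stored as "0",
# and type.convert() then yields a numeric vector
ch <- as.character(f)
ch[ch == "missing"] <- 0
num <- type.convert(ch, as.is = TRUE)
```

After this, f_bad holds NA in the second position, while num is a plain numeric vector with 0 in that position.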
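Using NAs instead of 0 (starting again from the original factor columns; "missing" is coerced to NA with a warning):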
df[-1] <- lapply(df[-1], function(x) as.numeric(as.character(x)))
sapply(df[-1], mean, na.rm = TRUE)
#Water.Temperature Turbidity Wave.Height Wave.Period
# 15.93 2.93 0.21 4.00
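The as.character() step inside as.numeric() matters. As a small sketch on a made-up factor (the names f, codes and values are hypothetical), calling as.numeric() directly on a factor returns the internal level codes rather than the numbers printed on screen.

```r
# Hypothetical factor; levels sort as "16.3", "17", "missing"
f <- factor(c("16.3", "missing", "17"))

# Directly on the factor: the integer level codes
codes <- as.numeric(f)                                    # 1 3 2

# Via character: the actual values, with "missing" coerced to NA
values <- suppressWarnings(as.numeric(as.character(f)))   # 16.3 NA 17
```

suppressWarnings() is used here only to silence the expected "NAs introduced by coercion" message.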
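Are you sure you want 0 rather than NA? NA lets you do all your calculations without introducing spurious data. If that works for you, just change the class of each column to numeric, and "missing" will be coerced to NA.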
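A rough sketch of what working with NA could look like, on a made-up three-row data frame (toy and its values are assumptions that only mirror the column names in the question). It converts the measurement columns, flags the complete rows, and compares an NA-aware mean with the mean of a zero-filled copy.

```r
# Hypothetical toy data with the same kind of columns as in the question
toy <- data.frame(
  Beach.Name        = c("CalumetBeach", "missing", "MontroseBeach"),
  Water.Temperature = c("16.3", "missing", "14.5"),
  Turbidity         = c("1.28", "missing", "6.88"),
  stringsAsFactors  = TRUE
)

# Convert the measurement columns; "missing" becomes NA
toy[-1] <- lapply(toy[-1], function(x) {
  suppressWarnings(as.numeric(as.character(x)))
})

# Rows in which every measurement is present
complete <- complete.cases(toy[-1])

# Mean that ignores the NA versus mean of a zero-filled copy
mean_na <- mean(toy$Water.Temperature, na.rm = TRUE)
zero_filled <- toy
zero_filled[-1][is.na(zero_filled[-1])] <- 0
mean_zero <- mean(zero_filled$Water.Temperature)
```

Keeping NA until the last moment means the zero fill stays an explicit, reversible choice; rows flagged FALSE in complete can also simply be dropped before a distance-based method is run.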
@SmruthiDabbiru Can you elaborate on "can't"? What doesn't work? Did you run the df[] step?
Yes, I did. Maybe I did something wrong; I'll try it again.
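# For comparison: column means when "missing" has been replaced by 0 (the zeros pull the means down)
sapply(df[-1], mean, na.rm = TRUE)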
#Water.Temperature Turbidity Wave.Height Wave.Period
# 10.62 1.95 0.14 2.67
# Data: the example data frame used above (every column is a factor)
df <- structure(list(Beach.Name = structure(c(1L, 2L, 1L, 2L, 3L, 1L
), .Label = c("CalumetBeach", "missing", "MontroseBeach"), class = "factor"),
Water.Temperature = structure(c(3L, 5L, 4L, 5L, 1L, 2L), .Label = c("14.5",
"15.9", "16.3", "17", "missing"), class = "factor"), Turbidity = structure(c(1L,
5L, 3L, 5L, 4L, 2L), .Label = c("1.28", "1.74", "1.82", "6.88",
"missing"), class = "factor"), Wave.Height = structure(c(2L,
5L, 3L, 5L, 4L, 1L), .Label = c("0.148", "0.162", "0.194",
"0.345", "missing"), class = "factor"), Wave.Period = structure(c(1L,
2L, 1L, 2L, 1L, 1L), .Label = c("4", "missing"),
class = "factor")), class = "data.frame",
row.names = c("7", "72", "18", "78", "24", "41"))