R: not all columns contain the same levels


I have 150k columns, 105 million entries in total, each of which is "none", "01", "12", or "2+". Unfortunately, not all columns contain all of the levels.

For example:

df <- ... %>%
  data.table::as.data.table()
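So if I do

df$x1 <- as.integer(as.factor(df$x1))

the integer codes depend on which levels happen to appear in that column, so they are not comparable across columns.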
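A minimal sketch of the problem and of the fix the answers rely on, using two small hypothetical vectors (`x1`, `x2`) rather than the real data. When `as.factor` infers the levels per column, the same value can get a different code in each column; passing the full level set to `factor` keeps the codes aligned.

```r
# Hypothetical toy columns: x2 never contains "01".
x1 <- c("none", "01", "12", "2+")
x2 <- c("12", "2+", "none", "12")

# Levels inferred separately for each column:
# "12" gets a different code in x1 than in x2.
inferred1 <- as.integer(as.factor(x1))
inferred2 <- as.integer(as.factor(x2))

# Levels stated explicitly: the coding is the same for every column.
levs <- c("none", "01", "12", "2+")
fixed1 <- as.integer(factor(x1, levels = levs))  # 1 2 3 4
fixed2 <- as.integer(factor(x2, levels = levs))  # 3 4 1 3
```

With explicit levels, "12" is coded as 3 in both vectors regardless of which other values are present.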

If we want to apply the conversion to multiple columns, use across with the levels given explicitly:

library(dplyr)
df1 <- df %>%
    mutate(across(everything(), ~
      as.integer(factor(., levels = c("none","01","12","2+")))))

Or, using base R:

df[] <-  lapply(df, function(x) as.integer(factor(x, levels = c("none","01","12","2+"))))
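One thing worth checking with this approach: any value that is not in the supplied levels becomes NA without a warning. The sketch below uses a hypothetical vector `x` containing two unexpected labels to show how such values could be counted after conversion.

```r
levs <- c("none", "01", "12", "2+")

# Hypothetical input: "None" and "3+" are not among the declared levels.
x <- c("none", "01", "None", "3+")

codes <- as.integer(factor(x, levels = levs))  # 1 2 NA NA

# Number of values that were present in x but did not match any level.
n_unmatched <- sum(is.na(codes) & !is.na(x))   # 2
```

A non-zero count usually points to a spelling or case mismatch between the data and the level vector.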

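This is a data.table solution. For large datasets, rather than calling names(df) twice, it is better to call it once, assigning the value before converting the columns of df, and then use that vector of 150K names.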

library(data.table)

levs <- c("none","01","12","2+")
df[, (names(df)) := lapply(.SD, factor, levels = levs), .SDcols = names(df)]

identical(levels(df$x1), levels(df$x2))
#[1] TRUE
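The advice about calling names(df) only once could be sketched as follows. The toy table and the variable name `cols` are assumptions for illustration; the conversion itself is the same `:=` call as above.

```r
library(data.table)

# Hypothetical toy table standing in for the real 150K-column data.
df <- data.table(x1 = c("none", "01"), x2 = c("12", "2+"))

levs <- c("none", "01", "12", "2+")

# Look up the column names once and reuse the vector.
cols <- names(df)
df[, (cols) := lapply(.SD, factor, levels = levs), .SDcols = cols]

same_levels <- identical(levels(df$x1), levels(df$x2))
```

Because the columns are only overwritten and none are added, the stored `cols` vector stays valid for the whole call.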

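Wow, lightning fast, thank you very much! One tiny caveat (I know a quick Google search would probably help): what if I don't want to convert all the columns? Say I want to skip the first one?

@HCAI Updated the answer.

Thanks. I just looked at dplyr; there is a good explanation of across, and it looks very useful! starts_with("seval") would probably work well for me.

@HCAI Those are just select helpers, which can be used when some of the column names follow a pattern, e.g. a prefix/suffix or some other regular expression with matches.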
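A short sketch of the select helpers mentioned in the comments, assuming dplyr 1.0 or later. The data frame and its column names (`id`, `seval_a`, `seval_b`) are hypothetical; only the "seval" prefix comes from the thread.

```r
library(dplyr)

levs <- c("none", "01", "12", "2+")

# Hypothetical data: an id column plus two columns sharing the "seval" prefix.
df <- data.frame(
  id      = 1:2,
  seval_a = c("none", "2+"),
  seval_b = c("01", "12")
)

# Convert only the columns whose names start with "seval".
df1 <- df %>%
  mutate(across(starts_with("seval"), ~ as.integer(factor(.x, levels = levs))))

# The same selection expressed as a regular expression with matches().
df2 <- df %>%
  mutate(across(matches("^seval_"), ~ as.integer(factor(.x, levels = levs))))
```

In both cases the `id` column is left untouched.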
To skip the first column, pass -1 to across:

library(dplyr)
df1 <- df %>%
    mutate(across(-1, ~
      as.integer(factor(., levels = c("none","01","12","2+")))))
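To store the integer codes directly with data.table, wrap the conversion in as.integer (levs as defined above):

df[, (names(df)) := lapply(.SD, function(x){
  as.integer(factor(x, levels = levs))
}), .SDcols = names(df)]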
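For the follow-up question about skipping the first column, the data.table version could be restricted in the same way. This is a sketch on a hypothetical three-column table; `cols` is an illustrative name.

```r
library(data.table)

levs <- c("none", "01", "12", "2+")

# Hypothetical toy table: the first column is an identifier to leave alone.
df <- data.table(id = c("a", "b"), x1 = c("none", "2+"), x2 = c("12", "01"))

# All column names except the first.
cols <- names(df)[-1]

df[, (cols) := lapply(.SD, function(x) {
  as.integer(factor(x, levels = levs))
}), .SDcols = cols]
```

Only `x1` and `x2` are converted; `id` keeps its original character values.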