R: Encoding multiple levels with a single label using factor

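I have language data covering more than 200 languages, in which missing values are coded as '' (a zero-length string).

I want to use factor to collapse it so that the main languages keep their own codes, every other language is coded as "other language", and the missing values appear at the end, coded as "(missing)".

My plan was: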

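lanfmt <- list(
  # levels: main languages first, then every other language, then the missing code ''
  lev = c(prime <- c('English', 'Russian', 'Urdu'), diff <- setdiff(levels(lan), c(prime, '')), ''),
  # labels, one per level (as posted, these simply repeat the levels)
  lab = c(prime, diff, '')
)

# the list elements are named lev and lab, so refer to them by those names
table(factor(lan, lanfmt$lev, lanfmt$lab))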
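One way the plan above could be made to collapse the levels is to give factor() one label per level and repeat the label for every level that should be merged. The sketch below assumes R 3.4.0 or later, where factor() accepts duplicated labels; the sample vector and the names other and lan2 are hypothetical stand-ins, not part of the original post.

```r
# Hypothetical sample data standing in for the real 200+ language vector
lan <- factor(c("English", "Russian", "Urdu", "", "Indonesian", "Hindi", "English"))

prime <- c("English", "Russian", "Urdu")
other <- setdiff(levels(lan), c(prime, ""))  # levels that are neither main nor ""

lanfmt <- list(
  lev = c(prime, other, ""),
  # one label per level; repeating a label merges the corresponding levels
  lab = c(prime, rep("other language", length(other)), "(missing)")
)

# Assumption: duplicated labels are supported by factor() (R >= 3.4.0)
lan2 <- factor(lan, levels = lanfmt$lev, labels = lanfmt$lab)
table(lan2)
```

The resulting levels follow the order of the labels, so "(missing)" ends up last without a separate reordering step.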

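I think you should convert your factor to character, edit the values, and then sort. Maybe something like this will help (lan is the language vector from your list/data frame):

lan <- c("English", "Russian", "Urdu", "", "Indonesian")
lan <- factor(lan)
prime <- c("English", "Russian", "Urdu", "missing")  # labels to keep as they are
missing_code <- ""                                   # how missing values are coded

# Work on the character values rather than on the factor
lan <- as.character(lan)
lan[lan %in% missing_code] <- "missing"

# Everything that is not a main language (or "missing") gets a single label
lan[!lan %in% prime] <- "other language"
lan <- factor(lan)
lan
#> [1] English        Russian        Urdu           missing       
#> [5] other language
#> Levels: English missing other language Russian Urdu

# Put the levels in the desired order, with "missing" last
level_order <- c("English", "Russian", "Urdu", "other language", "missing")
lan <- ordered(lan, level_order)
dt <- data.frame(lan, stuff = rnorm(5, 4, 1))  # stuff is random, so values vary
dt[with(dt, order(lan)), ]
#>              lan    stuff
#> 1        English 4.212460
#> 2        Russian 3.681616
#> 3           Urdu 3.409838
#> 5 other language 3.304108
#> 4        missing 3.938468

This isn't quite what I was after, but it did give me some inspiration. Thanks.

Also take a look at the related question. Although the OP there had to recode integer values to characters, many (all?) of the approaches apply to your case as well.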
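As a further option, base R lets you assign a named list to levels(): each name becomes a new level and each value lists the old levels it absorbs. This is a minimal sketch on the same five-element example; the names other and new_levels are hypothetical, and it is offered as an alternative rather than as the approach either poster used.

```r
lan <- factor(c("English", "Russian", "Urdu", "", "Indonesian"))
prime <- c("English", "Russian", "Urdu")
other <- setdiff(levels(lan), c(prime, ""))  # levels to fold into one label

# Named list: each name is a new level, each value the old level(s) it absorbs
new_levels <- c(
  setNames(as.list(prime), prime),
  list("other language" = other, "missing" = "")
)

# The new levels take the order of the list, so "missing" comes last
levels(lan) <- new_levels
lan
```

Any old level left out of the list becomes NA, so every level should be covered, which the setdiff() call is meant to ensure here.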