
Changing text to lowercase in R while keeping uppercase acronyms for text mining


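How can I use R to change a full text to lowercase while keeping acronyms in uppercase? I need this for text mining with the udpipe package. I could of course leave everything in uppercase, but is there any way to lowercase the text and still keep the uppercase acronyms?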

tolower("NASA IS A US COMPANY")

We can do the following, where test is the input:

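test <- "NASA IS A US COMPANY"
# Lowercase only the words that are English stopwords; every other word is left as is
paste(lapply(strsplit(test, " "), function(x) ifelse(x %in% toupper(tm::stopwords()),
                                              tolower(x), x))[[1]], collapse = " ")
[1] "NASA is a US COMPANY"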

I edited it just a little:

simpleCap <- function(x, abr) {
  # Split the input into words
  s <- strsplit(x, " ")[[1]]
  # Flag the abbreviations (they must already be upper case in x)
  is_abr <- s %in% abr

  # Capitalize the first letter and lowercase the rest of every other word;
  # abbreviations keep their original form and position
  result <- s
  result[!is_abr] <- paste(toupper(substring(s[!is_abr], 1, 1)),
                           tolower(substring(s[!is_abr], 2)), sep = "")
  paste(result, collapse = " ")
}

How about this:

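acronyms <- c('NASA','US')
test <- 'NASA IS A US COMPANY'

# Lowercase everything, then split into words
a <- tolower(test)
b <- strsplit(a, " ")[[1]]

# Restore upper case for every word that matches a known acronym
for (i in seq_along(b)) {
  if (toupper(b[i]) %in% acronyms) {
    b[i] <- toupper(b[i])
  }
}

# Named result rather than c, so the base function c() is not shadowed
result <- paste(b, collapse=" ")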
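
The answers above split on single spaces, so a word followed by punctuation (for example "NASA,") would not match the acronym list. A minimal sketch of one way around that, using only base R: the function name lower_keep_acronyms is hypothetical, and it assumes acronyms appear in the text exactly as they are spelled in the list (already upper case).

```r
# Hypothetical helper, not from the original answers: lowercase every word
# except those that exactly match an entry in `acronyms`.
lower_keep_acronyms <- function(x, acronyms) {
  # Locate every run of letters/digits, so punctuation is not part of a word
  m <- gregexpr("[[:alnum:]]+", x)
  words <- regmatches(x, m)

  # Lowercase each word unless it is an exact match for a listed acronym
  regmatches(x, m) <- lapply(words, function(w) {
    keep <- w %in% acronyms
    w[!keep] <- tolower(w[!keep])
    w
  })
  x
}

lower_keep_acronyms("NASA IS A US COMPANY", c("NASA", "US"))
# [1] "NASA is a US company"
```

Because the match is made against the original casing, a lowercase pronoun "us" in mixed-case text stays lowercase, while in all-caps input every "US" is treated as the acronym.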

Example call of the simpleCap function defined above:

abr <- c("NASA", "US")
simpleCap("NASA IS A US COMPANY", abr = abr)
[1] "NASA Is A US Company"