Extracting numbers from text: R


r regex


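Hello, I have a dataset made up of text, integers and decimals. The text is a paragraph containing all of these mixed together, and I am trying to strip everything except the integers and decimals out of the text. There are about 30k rows.

Data input format:

This. Is a good 13 part. of 135.67 code
how to strip 66.8 in the content 6879
get the numbers 3475.5 from. The data. 879 in this 369426

Output:

13 135.67
66.8 6879
3475.5 879 369426

I tried replacing all the letters one by one, but 26 + 26 replace-all calls make the code long, and replacing "." also removes it from the numbers.

Thanks, Praveen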

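You can try:

library(stringr)
# Extract runs of digits and dots, convert them to numeric, and drop
# tokens (such as a lone ".") that do not parse as a number
lapply(str_extract_all(a, "[0-9.]+"), function(x) {
  x <- suppressWarnings(as.numeric(x))
  x[!is.na(x)]
})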
[[1]]
[1]  13.00 135.67

[[2]]
[1]   66.8 6879.0

[[3]]
[1]   3475.5    879.0 369426.0

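The basic idea comes from an existing answer, but here the pattern also includes "."; the lapply call converts the matches to numeric and excludes the NA values.

Data: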

a <- c("This. Is a good 13 part. of 135.67 code",
       "how to strip 66.8 in the content 6879",
       "get the numbers 3475.5 from. The data. 879 in this 369426")

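To see why the NA filter is needed, it can help to look at the raw matches before conversion. The following is a minimal sketch, not part of the original answer; it uses base R's gregexpr and regmatches in place of str_extract_all so that it has no dependencies, and the names tokens and nums are made up for illustration.

```r
a <- c("This. Is a good 13 part. of 135.67 code",
       "how to strip 66.8 in the content 6879",
       "get the numbers 3475.5 from. The data. 879 in this 369426")

# Raw matches of the permissive pattern: sentence-ending dots match too
tokens <- regmatches(a, gregexpr("[0-9.]+", a))
tokens[[1]]
# [1] "."      "13"     "."      "135.67"

# A lone "." becomes NA under as.numeric(), so filtering on !is.na() removes it
nums <- lapply(tokens, function(x) {
  x <- suppressWarnings(as.numeric(x))
  x[!is.na(x)]
})
nums[[1]]
```

One consequence of this design is that any token that is not a valid number, for example "1.2.3", is dropped silently rather than reported.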

Don't forget that R already has regular expression functions built in:

input <- c('This. Is a good 13 part. of 135.67 code', 'how to strip 66.8 in the content 6879',
           'get the numbers 3475.5 from. The data. 879 in this 369426')

m <- gregexpr('\\b\\d+(?:\\.\\d+)?\\b', input)
(output <- lapply(regmatches(input, m), as.numeric))
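
If the goal is the space-separated text shown in the question rather than numeric vectors, the matches can be pasted back together per row. This is a rough sketch under that assumption; numbers_only is a name invented here, and perl = TRUE is passed only to make the PCRE syntax (\b, \d, (?:...)) explicit.

```r
input <- c('This. Is a good 13 part. of 135.67 code',
           'how to strip 66.8 in the content 6879',
           'get the numbers 3475.5 from. The data. 879 in this 369426')

# Locate integers and decimals delimited by word boundaries
m <- gregexpr('\\b\\d+(?:\\.\\d+)?\\b', input, perl = TRUE)

# Keep the matched text as-is and join the matches of each row with a space;
# a row without any match yields an empty string
numbers_only <- vapply(regmatches(input, m), paste, character(1), collapse = " ")
numbers_only
# [1] "13 135.67"         "66.8 6879"         "3475.5 879 369426"
```

Because the result is one string per input row, it can be assigned straight back to a data frame column of the same length.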


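Use strsplit to split the text into separate lines, then use gsub to remove the characters matched by [:alpha:] (written as [[:alpha:]] inside a pattern).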

A solution without regular expressions or external packages:

sapply(
  strsplit(input, " "),
  function(x) {
    x <- suppressWarnings(as.numeric(x))
    paste(x[!is.na(x)], collapse = " ")
  }
)
[1] "13 135.67"         "66.8 6879"         "3475.5 879 369426"
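
One thing to be aware of with this approach is that the numbers are converted to numeric and then printed again, so a token such as "007" comes back as "7" and large values may be printed in scientific notation. A possible variant, sketched here with the hypothetical helper name keep_numeric_tokens, uses as.numeric only as a test and keeps the original text of each token.

```r
keep_numeric_tokens <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), function(tok) {
    # TRUE for tokens that parse as a number; the token text itself is kept
    ok <- !is.na(suppressWarnings(as.numeric(tok)))
    paste(tok[ok], collapse = " ")
  }, character(1))
}

input <- c('This. Is a good 13 part. of 135.67 code',
           'how to strip 66.8 in the content 6879',
           'get the numbers 3475.5 from. The data. 879 in this 369426')

keep_numeric_tokens(input)
# [1] "13 135.67"         "66.8 6879"         "3475.5 879 369426"
keep_numeric_tokens("id 007 total 100000")
# [1] "007 100000"
```

Like the original, this only recognises numbers that are separated from the surrounding text by spaces, so a token such as "879," is dropped.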


Another approach with gsub:
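
The code for this answer is not present here, so what follows is only a minimal sketch of how a gsub-based approach could look, not the answerer's own code. The function name strip_to_numbers is invented for illustration.

```r
strip_to_numbers <- function(x) {
  # 1. Turn every character that is not a digit or a dot into a space
  x <- gsub("[^0-9.]", " ", x)
  # 2. Drop dots that do not sit between two digits (e.g. sentence-ending dots)
  x <- gsub("(?<![0-9])\\.|\\.(?![0-9])", "", x, perl = TRUE)
  # 3. Collapse runs of spaces and trim the ends
  trimws(gsub(" +", " ", x))
}

input <- c('This. Is a good 13 part. of 135.67 code',
           'how to strip 66.8 in the content 6879',
           'get the numbers 3475.5 from. The data. 879 in this 369426')

strip_to_numbers(input)
# [1] "13 135.67"         "66.8 6879"         "3475.5 879 369426"
```

This avoids the 26 + 26 letter-by-letter replacements mentioned in the question and keeps the decimal points. It does not validate the result, so text such as "1.2.3" passes through unchanged.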
