Parsing (possibly) missing numbers with regexp in R
I am trying to extract numbers from strings in R with the stringr package. Sometimes a number is missing. Here are some example strings:
str <- c(
"cash dividends per share $ - $ - $ - $ 0.08 $ 0.16 cash",
"cash dividends per share $ 0.01 $ 12.10 $ 0.01 $ 0.08 $ 0.16 hello",
"cash dividends per share $ - $ - $ 0.91 $ - $ 0.16 world",
"cash dividends per share - - 0.12 - 0.16 hsac",
"cash dividends per share $ - $ - $ - $ - $ 0.16 afterwards",
"cash dividends per share $0.12 $ - $0.1 $ - $ - comes",
"cash dividends per share 0.12 - 0.12 - - text",
"cash dividends per share... 0.12 - 0.12 - - random",
"cash dividends per share...0.123 0.321 - - 0.12 blu",
"cash dividends per share ..... $ 0.12 $ - $ 0.12 $ - $ - foo",
"cash dividends per share ..... $0.42 $0.42 $- $- $- bar")
If you want to capture both the numbers and the hyphens:
library(stringr)
# Match either a number (digits with an optional decimal part) or a lone hyphen
rgxp <- "([0-9]+\\.?[0-9]*)|(-)"
str_extract_all(str, rgxp)
[[1]]
[1] "-" "-" "-" "0.08" "0.16"
[[2]]
[1] "0.01" "12.10" "0.01" "0.08" "0.16"
[[3]]
[1] "-" "-" "0.91" "-" "0.16"
[[4]]
[1] "-" "-" "0.12" "-" "0.16"
[[5]]
[1] "-" "-" "-" "-" "0.16"
[[6]]
[1] "0.12" "-" "0.1" "-" "-"
[[7]]
[1] "0.12" "-" "0.12" "-" "-"
[[8]]
[1] "0.12" "-" "0.12" "-" "-"
[[9]]
[1] "0.123" "0.321" "-" "-" "0.12"
[[10]]
[1] "0.12" "-" "0.12" "-" "-"
[[11]]
[1] "0.42" "0.42" "-" "-" "-"
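If numeric values are needed rather than character tokens, the list above can be turned into a matrix. The following is a minimal sketch in base R, using regmatches() and gregexpr() instead of stringr. The helper name to_num and the assumption that every row yields exactly five tokens are illustrative and not part of the original answer.

```r
# Two sample rows copied from the question
x <- c(
  "cash dividends per share $ - $ - $ - $ 0.08 $ 0.16 cash",
  "cash dividends per share $0.12 $ - $0.1 $ - $ - comes"
)
rgxp <- "([0-9]+\\.?[0-9]*)|(-)"
tokens <- regmatches(x, gregexpr(rgxp, x))

# Hypothetical helper: "-" becomes NA, anything else is parsed as a number
to_num <- function(tok) {
  out <- rep(NA_real_, length(tok))
  out[tok != "-"] <- as.numeric(tok[tok != "-"])
  out
}

# Assumes every row yields exactly 5 tokens; vapply() stops with an error otherwise
m <- t(vapply(tokens, to_num, numeric(5)))
m
```

Using vapply() with a fixed template makes a row with an unexpected number of tokens fail loudly instead of silently producing a ragged result.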
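Using gsub, replace every run of two or more consecutive dots, as well as every run of characters that are not a minus sign, a digit, or a dot, with a space, then read the result in with read.table. If you want minus characters to appear where the NAs are, omit na.strings = "-". No packages are used.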
DF <- read.table(text = gsub("[^-0-9.]+|\\.{2,}", " ", str), fill = TRUE, na.strings = "-")
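To see what the gsub step hands to read.table, it may help to look at the intermediate strings. This is a small sketch on two rows copied from the question. The variable names cleaned, DF_na and DF_minus are illustrative only.

```r
# Two sample rows copied from the question
x <- c(
  "cash dividends per share...0.123 0.321 - - 0.12 blu",
  "cash dividends per share ..... $0.42 $0.42 $- $- $- bar"
)

# Runs of dots and runs of anything that is not "-", a digit or "." become one space
cleaned <- gsub("[^-0-9.]+|\\.{2,}", " ", x)
print(cleaned)

# With na.strings = "-", hyphens are read as NA
DF_na <- read.table(text = cleaned, fill = TRUE, na.strings = "-")

# Without na.strings, columns that contain "-" are no longer numeric
DF_minus <- read.table(text = cleaned, fill = TRUE)

str(DF_na)
str(DF_minus)
```

Keeping the hyphens comes at a cost: any column that contains one is read as text, so it has to be converted again before doing arithmetic.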
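Note: if you want to replace the NAs with zero, use: DF[is.na(DF)] <- 0

What is your desired output?
Thanks for your reply. This is a good solution, but I would like every result vector to have the same length. Please see my edit to the original question.
@Michael Just change it to rgxp <- "([0-9]+\\.?[0-9]*)|(-)" or use regmatches(str, gregexpr("([0-9]+\\.?[0-9]*)|(-)", str))
Thanks, great solution!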
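For the NA-to-zero replacement mentioned in the note, here is a minimal sketch on a made-up two-row input (the values are illustrative, not taken from the question). Note that the function is is.na, in lower case, because R is case-sensitive.

```r
# Tiny made-up input: "-" marks a missing value
DF <- read.table(text = c("- 0.08", "0.12 -"), na.strings = "-")

# Logical-matrix indexing: every NA cell in the data frame is set to 0
DF[is.na(DF)] <- 0
DF
```

Whether zero is an appropriate stand-in for a missing dividend depends on the analysis; keeping NA preserves the difference between "no value reported" and "a value of zero".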
# Numbers only: hyphens are skipped, so the result vectors differ in length
rgxp <- "[0-9]+\\.?[0-9]*"
str_extract_all(str, rgxp)
[[1]]
[1] "0.08" "0.16"
[[2]]
[1] "0.01" "12.10" "0.01" "0.08" "0.16"
[[3]]
[1] "0.91" "0.16"
[[4]]
[1] "0.12" "0.16"
[[5]]
[1] "0.16"
[[6]]
[1] "0.12" "0.1"
[[7]]
[1] "0.12" "0.12"
[[8]]
[1] "0.12" "0.12"
[[9]]
[1] "0.123" "0.321" "0.12"
[[10]]
[1] "0.12" "0.12"
[[11]]
[1] "0.42" "0.42"
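The difference between the two patterns shows up directly in the lengths of the results. The sketch below uses base R (regmatches() with gregexpr()) on two rows copied from the question. The variable names nums_only and with_dash are illustrative.

```r
# Two sample rows copied from the question
x <- c(
  "cash dividends per share $ - $ - $ - $ 0.08 $ 0.16 cash",
  "cash dividends per share $ 0.01 $ 12.10 $ 0.01 $ 0.08 $ 0.16 hello"
)

# Numbers only: rows with hyphens return fewer matches
nums_only <- regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))

# Numbers or hyphens: one match per column position
with_dash <- regmatches(x, gregexpr("([0-9]+\\.?[0-9]*)|(-)", x))

lengths(nums_only)
lengths(with_dash)
```

With the numbers-only pattern the position of each value within its row is lost, which is why the hyphen-capturing pattern is needed when every result vector should have the same length.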
DF <- read.table(text = gsub("[^-0-9.]+|\\.{2,}", " ", str), fill = TRUE, na.strings = "-")
> DF
      V1     V2   V3   V4   V5
1     NA     NA   NA 0.08 0.16
2  0.010 12.100 0.01 0.08 0.16
3     NA     NA 0.91   NA 0.16
4     NA     NA 0.12   NA 0.16
5     NA     NA   NA   NA 0.16
6  0.120     NA 0.10   NA   NA
7  0.120     NA 0.12   NA   NA
8  0.120     NA 0.12   NA   NA
9  0.123  0.321   NA   NA 0.12
10 0.120     NA 0.12   NA   NA
11 0.420  0.420   NA   NA   NA