Non-alphanumeric characters in R


For uppercase letters, lowercase letters and the 10 digits, I can generate a vector containing all the letters or all 10 digits like this:

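A <- LETTERS   # "A" to "Z" (LETTERS[0:26] also works, because index 0 is dropped)
B <- letters   # "a" to "z"
C <- 0:9       # the digits 0 to 9, same as seq(0, 9)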
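As far as I know base R has no punctuation counterpart to LETTERS. A minimal sketch, assuming you only care about printable ASCII (codes 32 to 126), is to build that range and drop the alphanumerics with setdiff(). The variable names are just illustrative.

```r
# Printable ASCII runs from code 32 (space) to 126 (~)
printable <- intToUtf8(32:126, multiple = TRUE)

# Remove upper-case letters, lower-case letters and the digits "0".."9"
non_alnum <- setdiff(printable, c(LETTERS, letters, 0:9))

# Drop the space as well if you only want visible characters
non_alnum <- setdiff(non_alnum, " ")
non_alnum
```

setdiff() keeps the order of its first argument, so the result comes out in ASCII code order.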

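This might be useful: the ASCII character set is arranged so that characters of similar types (letters, etc.) are grouped together.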
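Because of that grouping, the printable non-alphanumeric characters (excluding the space) sit in four contiguous code ranges: 33 to 47, 58 to 64, 91 to 96 and 123 to 126. A minimal sketch that builds the vector straight from those ranges, assuming plain ASCII is all you need:

```r
# ASCII code ranges holding punctuation and symbols
# (the space, code 32, is deliberately left out)
codes <- c(33:47, 58:64, 91:96, 123:126)

# Convert each code point into a one-character string
punct <- intToUtf8(codes, multiple = TRUE)
punct
```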


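This is a bit verbose, and there are probably better sites (and better ways to get the same result), but it scrapes the characters from an online table with the XML and RCurl packages (full code further down).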

Just for fun: list the characters you want, then use strsplit to generate your vector:

> D <- strsplit('!"#$%&\'()*+,-./\\:;<=>?@[]^_`{|}~', '(?=.)', perl=T)[[1]]
##  [1] "!"  "\"" "#"  "$"  "%"  "&"  "'"  "("  ")"  "*"  "+"  ","  "-"  "."  "/" 
## [16] "\\" ":"  ";"  "<"  "="  ">"  "?"  "@"  "["  "]"  "^"  "_"  "`"  "{"  "|" 
## [31] "}"  "~" 
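If the goal is only to break the string into single characters, splitting on the empty string does the same job without the Perl look-ahead. A minimal sketch using the same string:

```r
# Same characters as above in one string
# (\' and \\ are escapes, so each counts as one character)
s <- '!"#$%&\'()*+,-./\\:;<=>?@[]^_`{|}~'

# split = "" splits the string into individual characters
D <- strsplit(s, "")[[1]]
length(D)
```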

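Here is another option: generate all the ASCII characters, then use a regular expression to filter out everything that is not punctuation.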

ascii <- rawToChar(as.raw(0:127), multiple=TRUE)
ascii[grepl('[[:punct:]]', ascii)]

# [1] "!"  "\"" "#"  "$"  "%"  "&"  "'"  "("  ")"  "*"  "+"  ","  "-"  "."  "/"  ":"  ";"  "<"  "="  ">"  "?"  "@" 
# [23] "["  "\\" "]"  "^"  "_"  "`"  "{"  "|"  "}"  "~" 

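Hi @RichardScriven, sorry, I don't really follow. If you need all the ASCII characters,
rawToChar(as.raw(1:127), multiple = TRUE)
should work. It's still unclear exactly how you chose your list. Many characters are non-printable, and the result also depends on your encoding: extended code pages can contain more characters, and encodings such as UTF-8 define many more character codes. What exactly are you trying to do? If you want to store several of these characters in a vector, you will need to escape some of them (with "\\"). What I want is rawToChar(as.raw(c(32:47, 58:64, 91, 93:96, 123:126)), multiple = TRUE).
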
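The list in that last reply is the ASCII punctuation plus the space, minus the backslash (code 92). A quick sketch to check that, and to show that "\\" in R source is a single backslash character; the names wanted, ascii and punct are only illustrative.

```r
# The requested code points: space and punctuation, but no backslash (92)
wanted <- rawToChar(as.raw(c(32:47, 58:64, 91, 93:96, 123:126)), multiple = TRUE)

# All printable ASCII punctuation, in code order
ascii <- rawToChar(as.raw(32:126), multiple = TRUE)
punct <- ascii[grepl("[[:punct:]]", ascii)]

# Add the space, remove the backslash: same vector as `wanted`
identical(wanted, setdiff(c(" ", punct), "\\"))

# "\\" is an escaped backslash, so it is one character long
nchar("\\")
```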
library(XML); library(RCurl)
doc <- htmlParse(getURL("https://wci.llnl.gov/codes/basis/manual/node161.html"))
xp <- xpathSApply(doc, "//tr/td", xmlValue, trim = TRUE) 
xp[nzchar(xp) & nchar(xp) == 1]
#  [1] "!" "[" "%" "," "]" "&" "-" "|" "'" "." "=" "~" "("
# [14] "/" ")" "*" "=" "{" "?" "`" "}" "@" ":" ";" "^" " "

> URL <- "http://datadebrief.blogspot.com/2011/03/ascii-code-table-in-r.html"
> r <- readLines(URL, warn = FALSE)[780:874]
> s <- sapply(strsplit(r, "\\s+"), "[", 1) 
> s[!s %in% c(letters, LETTERS, 0:9)]
#  [1] ""     "!"    "\""   "#"    "$"    "%"    "&"    "'"    "("   
# [10] ")"    "*"    "+"    ","    "-"    "."    "/"    ":"    ";"   
# [19] "<"    "="    ">"    "?"    "@"    "["    "\\\\" "]"    "^"   
# [28] "_"    "`"    "{"    "|"    "}"    "~" 
> D <- gsub('[^\\pP\\pS]', '', rawToChar(as.raw(1:127), multiple=T), perl=T)
> D[D != ""]
##  [1] "!"  "\"" "#"  "$"  "%"  "&"  "'"  "("  ")"  "*"  "+"  ","  "-"  "."  "/" 
## [16] ":"  ";"  "<"  "="  ">"  "?"  "@"  "["  "\\" "]"  "^"  "_"  "`"  "{"  "|" 
## [31] "}"  "~" 
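A variant of the same idea filters with grepl() in one step instead of blanking non-matches with gsub() and then dropping the empty strings. \\pP matches the Unicode punctuation category and \\pS the symbol category, which requires perl = TRUE. A minimal sketch:

```r
# All ASCII characters except NUL
ascii <- rawToChar(as.raw(1:127), multiple = TRUE)

# Keep characters in the Unicode Punctuation (P) or Symbol (S) categories
D <- ascii[grepl("[\\pP\\pS]", ascii, perl = TRUE)]
D
```

For ASCII input this gives the same 32 characters as the [[:punct:]] filter.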