Split a string by its last two characters in R? (/negative string indexing)


My data frame looks like this:

b <- data.frame(height = c(190,165,174,176), name = c('John Smith 34','Mr.Turner 54', 'Antonio P. 23', 'John Brown 31'))

#   height          name
# 1    190 John Smith 34
# 2    165  Mr.Turner 54
# 3    174 Antonio P. 23
# 4    176 John Brown 31

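I would like to split the `name` column into the name and the age held in its last two characters. How can I do this?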

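There are many options here using regular expressions. I would use `substr`, since you know the number of characters to extract, in `data.table` (for the syntax).

Note that `name` should be of type character: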

  b <- data.frame(
  height = c(190,165,174,176), 
  name = c('John Smith 34','Mr.Turner 54', 'Antonio P. 23', 'John Brown 31'),
  stringsAsFactors = FALSE)
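The `substr` call itself is not shown above, so here is a minimal base R sketch of the idea rather than the `data.table` code the answer refers to. It assumes the age is always the last two characters and is preceded by exactly one space; the name `b_split` is only illustrative.

```r
# Same example data as in the question, with name kept as character.
b_split <- data.frame(
  height = c(190, 165, 174, 176),
  name = c('John Smith 34', 'Mr.Turner 54', 'Antonio P. 23', 'John Brown 31'),
  stringsAsFactors = FALSE)

# R has no negative string indices, so count back from nchar() instead.
n <- nchar(b_split$name)

# Last two characters are the age (assumption: always two digits).
b_split$age <- as.integer(substr(b_split$name, n - 1, n))

# Everything before the separating space is the name.
b_split$name <- substr(b_split$name, 1, n - 3)

b_split
```

If the age can have a different number of digits, the fixed offsets no longer hold and one of the regex-based approaches is the safer choice.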

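`tidyr::separate` makes separating columns simple by letting you pass an integer index of the split position, including a negative index counted from the end of the string (here, `sep = -3`). (Regex is also an option, of course.)
Or separate by the final space: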
library(tidyr)
b %>% separate(name, into = c('name', 'age'), sep = '\\s(?=\\S*?$)', convert = TRUE)

which returns the same thing.
In base R, it takes a bit more work:
b$name <- as.character(b$name)
split_name <- strsplit(b$name, '\\s(?=\\S*?$)', perl = TRUE)
split_name <- do.call(rbind, split_name)
colnames(split_name) <- c('name', 'age')
b <- data.frame(b[-2], split_name, stringsAsFactors = FALSE)
b$age <- type.convert(b$age)

b
##   height       name age
## 1    190 John Smith  34
## 2    165  Mr.Turner  54
## 3    174 Antonio P.  23
## 4    176 John Brown  31

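With base R (same data as used in @agstudy's answer), split each row's name on spaces with `apply` and `strsplit`, keeping the last token as the age (the complete code is listed further down).

Personally, I find the following regex the most useful: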
library(stringr)
b$age <- str_extract(b$name, "\\d{1,3}$")
b$name <- str_replace(b$name, "\\d{1,3}$", "")

We can use `sub` to create a delimiter (`,`) in place of the space before the age, read the result with `read.table`, and `cbind` it with the first column, all in base R:

cbind(b[1],read.table(text=sub("\\s+(\\d+)$", ", \\1", b$name), 
                 col.names = c("name", "age"), header=FALSE, sep=","))
#  height       name age
#1    190 John Smith  34
#2    165  Mr.Turner  54
#3    174 Antonio P.  23
#4    176 John Brown  31
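To see why `read.table` can parse the result, it may help to look at the intermediate strings that `sub` produces. The sketch below reuses the names from the question; the variable names `delimited` and `parsed` are only illustrative.

```r
name <- c('John Smith 34', 'Mr.Turner 54', 'Antonio P. 23', 'John Brown 31')

# Replace the whitespace before the trailing digits with ", " so that
# every string has exactly one comma, placed right before the age.
delimited <- sub("\\s+(\\d+)$", ", \\1", name)
print(delimited)

# The comma is now a usable field separator for read.table.
parsed <- read.table(text = delimited, sep = ",", header = FALSE,
                     col.names = c("name", "age"), stringsAsFactors = FALSE)
str(parsed)
```

This relies on the names themselves containing no commas or quote characters; if they might, a different delimiter or a regex-based split would be safer.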


Or use `extract` from `tidyr`:

library(tidyr)
extract(b, name, into = c("name", "age"), "(.*)\\s+(\\S+)$")
#  height       name age
#1    190 John Smith  34
#2    165  Mr.Turner  54
#3    174 Antonio P.  23
#4    176 John Brown  31

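I would split at the last space, since ages can sometimes have three digits. For comparison, the index-based call is `library(tidyr); b %>% separate(name, into = c('name', 'age'), sep = -3, convert = TRUE)`.
@alistaire, thank you very much! Another option: `cbind(b[1], read.csv(text = gsub('(…)$', ',\\1', b$name), header = FALSE))`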
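The point about three-digit ages can be checked with a small base R sketch. The second input string is made up for illustration, and the two `sub` patterns are just one way to split at the last space, not the code from any of the answers.

```r
# "Old Tom 102" is a hypothetical entry with a three-digit age.
x <- c('John Smith 34', 'Old Tom 102')

# Fixed-width approach: always take the last two characters.
fixed_width_age <- substr(x, nchar(x) - 1, nchar(x))

# Last-space approach: drop everything up to the final run of whitespace
# to get the age, or drop the final whitespace-plus-token to get the name.
last_space_age  <- sub("^.*\\s+", "", x)
last_space_name <- sub("\\s+\\S+$", "", x)

print(fixed_width_age)   # the three-digit age loses its first digit
print(last_space_age)
print(last_space_name)
```

Splitting at the last space keeps the age intact regardless of its width, at the cost of assuming the age never contains whitespace.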
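The complete row-wise base R code (using `apply`) mentioned above:

# For each row, split the name on spaces; the last token is the age and
# the remaining tokens are pasted back together as the name.
data.frame(t(apply(b, 1, function(x) {
  s <- unlist(strsplit(trimws(x[2]), " "))
  c(x[1], paste0(head(s, -1), collapse = " "), tail(s, 1))
})))

#    X1         X2 X3
# 1 190 John Smith 34
# 2 165  Mr.Turner 54
# 3 174 Antonio P. 23
# 4 176 John Brown 31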
