R: Split a string and add a delimiter before each number


I have a vector containing a bibliography:

bibliography <- c("1. Cohen, A. C. (1955). Restriction and selection in samples from bivariate normal distributions. Journal
of the American Statistical Association, 50, 884–893.  2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.
Biometrika, 75, 11–20.  3.Arismendi, J. C. (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41–75")
But this is time-consuming, because I have to manually add a double space before each number (i.e. 1 and 2) for the code to work.

I have also looked here


This should get you what you want:

library(stringr)
library(dplyr)  # loaded for the %>% pipe

# Insert "~" before each entry number: a digit 1-9, a dot, optional spaces, then a capital letter
str_split(gsub("([1-9]\\.[ ]*[A-Z])", "~\\1", bibliography), "~") %>%
  unlist() %>%
  str_trim(side = "both")  # trim whitespace on both sides of each piece

I tried a regex-based approach:

bibliography <- c("1. Cohen, A. C. (1955). Restriction and selection in samples from bivariate normal distributions. Journal of the American Statistical Association, 50, 884–893.  2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.
                  Biometrika, 75, 11–20.  3.Arismendi, J. C. (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41–75")

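# Insert a tab before each match: a single digit plus a dot, at the start of the string or after a non-digit
out <- gsub("([^0-9][0-9]{1}\\.|^[0-9]{1}\\.)", "\t\\1", bibliography)
# Split the string at the tabs
out <- unlist(strsplit(out, "\t"))
# Trim leading and trailing whitespace from each piece
out <- gsub("^\\s+|\\s+$", "", out)
# Drop the empty first element created by the tab before entry 1
out <- out[-1]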
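
Both answers match a single-digit entry number, so they would likely break once the list reaches entry 10. Here is a minimal base R sketch of the same marker-and-split idea that accepts multi-digit numbers. The helper name `split_entries` is hypothetical (it is not from either answer), and the sample vector is a shortened version of the one in the question. It assumes every entry starts with a number, a dot, optional whitespace, and a capital letter, so a title that contains something like "Vol 2. Part" would be split in the wrong place.

```r
# Hypothetical helper (not from the original answers): split a run-together
# bibliography into one string per numbered entry.
split_entries <- function(x) {
  # Put a tab before every entry number: one or more digits and a dot that
  # start the string or follow a non-digit, then optional whitespace and a
  # capital letter.
  marked <- gsub("(^|[^0-9])([0-9]+\\.\\s*[A-Z])", "\\1\t\\2", x)
  parts <- unlist(strsplit(marked, "\t", fixed = TRUE))
  parts <- trimws(parts)   # drop leading and trailing whitespace
  parts[nzchar(parts)]     # drop empty pieces, such as the one before entry 1
}

# Shortened version of the question's vector (journal and page details omitted)
bibliography <- c("1. Cohen, A. C. (1955). Restriction and selection in samples from bivariate normal distributions.  2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.  3.Arismendi, J. C. (2013). Multivariate truncated moments.")

entries <- split_entries(bibliography)
print(entries)
```

Filtering with `nzchar()` rather than dropping the first element by position keeps the result correct even if the input does not begin with an entry number.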