Regex R strsplit: split on a character unless it is followed by a specific character


Suppose I have a vector of strings such as

split_these = c("File Location:C:\\Documents","File Location:Pete's Computer","File Location:") 
I want to split each element of this vector on ":", except where the ":" is followed by a "\". What I would like returned is something like

#preferred solution
"File Location" "C:\\Documents"
"File Location" "Pete's Computer"
"File Location" ""

I have tried the following:

strsplit(split_these, ":")
[[1]]
[1] "File Location" "C"             "\\Documents"  

[[2]]
[1] "File Location"   "Pete's Computer"

[[3]]
[1] "File Location"

strsplit(split_these, ":[^\\]")
[[1]]
[1] "File Location" ":\\Documents" 

[[2]]
[1] "File Location"  "ete's Computer"

[[3]]
[1] "File Location:"

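I suggest using PCRE with a negative lookahead assertion. Also note that the backslash has to be double-escaped, because it acts as a metacharacter in both R string syntax and regex syntax: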

strsplit(split_these, ':(?!\\\\)', perl = TRUE)
## [[1]]
## [1] "File Location" "C:\\Documents"
##
## [[2]]
## [1] "File Location"   "Pete's Computer"
##
## [[3]]
## [1] "File Location"
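
To make the two escaping layers described above visible, it can help to print the pattern the way the regex engine receives it. This is only a minimal sketch. The variable names pattern, x and parts are illustrative and not part of the original answer.

```r
# In source code the literal ':(?!\\\\)' contains four backslash characters;
# the R parser turns them into two actual backslashes in the string.
pattern <- ':(?!\\\\)'
writeLines(pattern)   # prints the regex as PCRE sees it
nchar(pattern)        # 7 characters: : ( ? ! \ \ )

# PCRE reads the remaining \\ as one literal backslash, so the pattern means
# "a colon that is not immediately followed by a backslash".
x <- "File Location:C:\\Documents"
writeLines(x)         # the string really contains a single backslash
parts <- strsplit(x, pattern, perl = TRUE)[[1]]
parts
```

Here writeLines() is used instead of print() because print() re-escapes backslashes, which hides the difference between the two layers.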
If you want to simplify the list into a single character vector:

do.call(c, strsplit(split_these, ':(?!\\\\)', perl = TRUE))
## [1] "File Location" "C:\\Documents" "File Location" "Pete's Computer" "File Location"

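I came up with a way to get the trailing empty-string field. Because strsplit() always omits a final empty field, we can simply append the delimiter to the end of every input string. If the original string has no trailing delimiter, the newly created empty field is dropped and the result is unchanged. If the original string does have a trailing delimiter, we get the desired empty field: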

do.call(c, strsplit(paste0(split_these, ':'), ':(?!\\\\)', perl = TRUE))
## [1] "File Location" "C:\\Documents" "File Location" "Pete's Computer" "File Location" ""
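
The trailing-delimiter trick can be wrapped in a small function if it is needed in more than one place. The sketch below is one possible way to do that. The name split_keep_trailing and its sep argument are hypothetical, and it assumes sep is a single character with no special meaning in a regex (such as ":").

```r
# Hypothetical helper, not from the original answer: split on `sep` unless it is
# immediately followed by a backslash, and keep a trailing empty field.
# Assumption: `sep` is a literal character with no regex meaning.
split_keep_trailing <- function(x, sep = ":") {
  pattern <- paste0(sep, "(?!\\\\)")
  # Appending one extra separator turns a trailing separator in the input into
  # an interior empty field; strsplit() then drops only the newly added one.
  strsplit(paste0(x, sep), pattern, perl = TRUE)
}

split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")
res <- split_keep_trailing(split_these)
res
```

The result stays a list with one element per input string, so the pairing between field name and value is preserved for each input.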

Iterating read.dcf over the elements of split_these gives a named character vector, which can then be reshaped into a data.frame:

v <- drop(do.call("cbind", lapply(split_these, function(x) read.dcf(textConnection(x)))))

giving:

> v
    File Location     File Location     File Location 
  "C:\\Documents" "Pete's Computer"                "" 
> stack(v)[2:1]
            ind          values
1 File Location   C:\\Documents
2 File Location Pete's Computer
3 File Location
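
For comparison, the same two-column layout can probably also be built from the lookahead-based split. The following is a rough sketch rather than part of either answer. It assumes every input string yields exactly two fields, and the names fields, m and df are illustrative.

```r
split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")

# Append the delimiter so a trailing empty field survives, then split on ":"
# only where it is not followed by a backslash.
fields <- strsplit(paste0(split_these, ":"), ":(?!\\\\)", perl = TRUE)

# Assumption: each element of `fields` has length 2, so rbind() gives a 3 x 2 matrix.
m <- do.call(rbind, fields)
df <- data.frame(ind = m[, 1], values = m[, 2], stringsAsFactors = FALSE)
df
```

If some strings could produce a different number of fields, rbind() would recycle values with a warning, so a length check on fields would be needed first.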