Regex: extract a string through the first colon

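I have a data set of strings and want to extract a substring up to and including the first colon. Earlier I posted a question here asking how to extract the portion after the first colon. Below I list some of my attempts at solving the current problem.

I know that ^[^:]+: matches the portion I want to keep, but I cannot figure out how to extract that portion.

Here is an example data set and the desired result.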

my.data <- "here is: some text
here is some more.
even: more text
still more text
this text keeps: going."

my.data2 <- readLines(textConnection(my.data))

desired.result <- "here is:
0
even:
0
this text keeps:"

desired.result2 <- readLines(textConnection(desired.result))

# Here are some of my attempts

# discards line 2 and 4 but does not extract portion from lines 1,3, and 5.
ifelse( my.data2 == gsub("^[^:]+:", "", my.data2), '', my.data2)

# returns the portion I do not want rather than the portion I do want
sub("^[^:]+:", "\\1", my.data2, perl=TRUE)

# returns an entire line if it contains a colon
grep("^[^:]+:", my.data2, value=TRUE)

# identifies which rows contain a match
regexpr("^[^:]+:", my.data2)

# my attempt at anchoring the right end instead of the left end
regexpr("[^:]+:$", my.data2)

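The earlier question dealt with returning the opposite of the match. Even starting from the solution to that earlier question, I have not figured out how to implement this in R.

I recently obtained RegexBuddy to study regular expressions. That is how I know ^[^:]+: matches what I want. I just cannot use that information to extract the match.

I am aware of the stringr package. Perhaps it would help, but I would prefer a solution in base R.

Thank you for any advice.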

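"I know that ^[^:]+: matches the portion I want to keep, but I cannot figure out how to extract that portion."

So just wrap it in parentheses, add .+$ at the end, and use sub with a backreference: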

sub("(^[^:]+:).+$", "\\1", vec)

 step1 <- sub("^([^:]+:).+$", "\\1", my.data2)
 step2 <- ifelse(grepl(":", step1), step1, 0)
 step2
#[1] "here is:"         "0"                "even:"            "0"               
#[5] "this text keeps:"

It is not clear whether you want these as separate vector elements or pasted together with newlines:

> step3 <- paste0(step2, collapse="\n")
> step3
[1] "here is:\n0\neven:\n0\nthis text keeps:"
> cat(step3)
here is:
0
even:
0
this text keeps:
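
For comparison, base R can also pull out the matched text directly instead of deleting the rest of the line. The following is a minimal sketch, assuming the same sample lines as the question; the names has_match and result are made up for illustration. It combines regexpr() and regmatches(), and fills in "0" for lines without a colon.

```r
# Sample lines from the question
my.data2 <- c("here is: some text",
              "here is some more.",
              "even: more text",
              "still more text",
              "this text keeps: going.")

pattern <- "^[^:]+:"

# Which elements contain a match at all
has_match <- grepl(pattern, my.data2)

# regexpr() locates the first match in each element; regmatches() then
# returns the matched substrings, but only for the elements that matched
m <- regexpr(pattern, my.data2)

# Start with "0" everywhere and overwrite the matching positions
result <- rep("0", length(my.data2))
result[has_match] <- regmatches(my.data2, m)
result
```

Because regmatches() silently drops non-matching elements, the logical index has_match is what keeps the output aligned with the input.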

This seems to produce what you are looking for, although it only returns the relevant bits of the lines that contain a colon:

grep(":",gsub("(^[^:]+:).*$","\\1",my.data2 ),value=TRUE)
[1] "here is:"         "even:"            "this text keeps:"

While I was typing this I saw @DWin's answer, which also suggests the parentheses and adds an ifelse that gives you the "0" entries as well.
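
The reason a filter such as grep or ifelse is needed at all is that sub and gsub return an element unchanged when the pattern does not match it. A small sketch of that behaviour, using two of the sample lines (the names x and out are illustrative only):

```r
x <- c("even: more text", "still more text")

# The second element has no colon, so the pattern fails to match
# and sub() hands the original string back untouched
out <- sub("^([^:]+:).*$", "\\1", x)
out

# Replace the untouched, colon-free elements with "0"
cleaned <- ifelse(grepl(":", x, fixed = TRUE), out, "0")
cleaned
```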

Another, less elegant approach using strsplit:
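
One way a strsplit-based version might look is sketched below. This is an assumption about the general idea, not the answerer's own code, and the names first_piece, has_colon and result are hypothetical. It splits each line at the colons, keeps the first piece, and puts the colon back on lines that had one.

```r
my.data2 <- c("here is: some text",
              "here is some more.",
              "even: more text",
              "still more text",
              "this text keeps: going.")

# Split every line at literal colons; the first piece is the text
# before the first colon (or the whole line if there is no colon)
pieces <- strsplit(my.data2, ":", fixed = TRUE)
first_piece <- vapply(pieces, `[`, character(1), 1)

# Re-attach the colon where the line had one, otherwise return "0"
has_colon <- grepl(":", my.data2, fixed = TRUE)
result <- ifelse(has_colon, paste0(first_piece, ":"), "0")
result
```

Using fixed = TRUE avoids regular expressions entirely, which is the main appeal of this route.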

I think you are just missing the capturing parentheses, ( and ). Your expression including them would be ^([^:]+):

I think what you are looking for is regex groups. Maybe this helps?

Avoiding regex whenever possible is not inelegant.