Capitalize the first word of a sentence (regex, gsub, gregexpr)
Suppose I have the following text:
txt <- as.character("this is just a test! i'm not sure if this is O.K. or if it will work? who knows. regex is sorta new to me.. There are certain cases that I may not figure out?? sad! ^_^")
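With gregexpr I can get the correct substring indices of the matches.
But how do I use them to properly capitalize the required characters? I assume I have to strsplit and then ...?

Your regex did not seem to work for your example, so I borrowed one from another question.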
Using the rex package may make this kind of task a little simpler. The following implements the same regular expression that merlin2011 used.
library(rex)

txt <- as.character("this is just a test! i'm not sure if this is O.K. or if it will work? who knows. regex is sorta new to me.. There are certain cases that I may not figure out?? sad! ^_^")
re <- rex(
  # first alphanumeric character of the sentence
  capture(name = 'first_letter', alnum),
  # the rest of the sentence, up to and including its closing punctuation
  capture(name = 'sentence',
    any_non_puncts,
    zero_or_more(
      group(
        punct %if_next_isnt% space,
        any_non_puncts
      )
    ),
    maybe(punct)
  )
)
re_substitutes(txt, re, "\\U\\1\\E\\2", global = TRUE)
#> [1] "This is just a test! I'm not sure if this is O.K. Or if it will work? Who knows. Regex is sorta new to me.. There are certain cases that I may not figure out?? Sad! ^_^"
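The replacement string "\\U\\1\\E\\2" relies on the Perl-style case conversion that base R's gsub supports when perl = TRUE: \\U uppercases what follows and \\E ends the conversion. As a minimal sketch of that mechanism alone, the example below uses a much simpler pattern than the ones in this thread. The pattern and the sample string are assumptions made for illustration, and the pattern ignores abbreviations, quotes, and other edge cases.

```r
# Sample text chosen for illustration only.
x <- "who knows. sad! ok"

# Group 1: start of string, or sentence-ending punctuation plus whitespace.
# Group 2: the lowercase letter that should be capitalized.
# \\U ... \\E in the replacement is only honored when perl = TRUE.
result <- gsub("(^|[.!?]\\s+)([a-z])", "\\1\\U\\2\\E", x, perl = TRUE)
print(result)
```

Without perl = TRUE the case-conversion escapes are not interpreted, so the argument is required for this approach.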
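I know nothing about R, sorry, but in general you would take the first character, uppercase it, and concatenate it with [1:] (the substring containing the rest of the string)... The first related question will give you R-specific information.
This is really helpful!! I did not know this existed.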
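The generic approach from the first comment (uppercase the first character, then append the rest of the string) could look roughly like this in base R. The helper name capitalize_first is hypothetical and not part of the thread, and it only handles the first character of each string, not every sentence inside a longer text.

```r
# Hypothetical helper (not from the thread): uppercase the first character
# of each string and paste the remainder back on.
capitalize_first <- function(x) {
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

capitalize_first(c("who knows.", "sad!"))
```

To apply it to a whole paragraph, the text would first have to be split into sentences, which is the hard part that the regex answers address.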
txt <- as.character("this is just a test! i'm not sure if this is O.K. or if it will work? who knows. regex is sorta new to me.. There are certain cases that I may not figure out?? sad! ^_^")
print(txt)
# The \\U ... \\E case conversion in the replacement requires perl = TRUE
gsub("([^.!?\\s])([^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?)(?=\\s|$)", "\\U\\1\\E\\2", txt, perl = TRUE, useBytes = FALSE)
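The question asks how to use the match positions returned by gregexpr. One possible route, sketched below, is to pair gregexpr with the replacement form of regmatches, which avoids strsplit entirely. The pattern here is a simplified assumption, not the one from the question or the answers: it matches a lowercase letter at the start of the string or directly after sentence-ending punctuation and a single whitespace character.

```r
# Shortened sample text, chosen for illustration only.
txt <- "this is just a test! i'm not sure. who knows?? sad!"

# Assumed pattern: a lowercase letter at the very start, or one preceded by
# [.!?] and exactly one whitespace character (lookbehind needs perl = TRUE).
pattern <- "(?:^|(?<=[.!?]\\s))[a-z]"

# gregexpr returns the positions and lengths of all matches.
m <- gregexpr(pattern, txt, perl = TRUE)

# regmatches<- writes replacement values back at exactly those positions.
regmatches(txt, m) <- lapply(regmatches(txt, m), toupper)
print(txt)
```

Because only the single matched letter is replaced, the rest of each sentence is left untouched. Sentences separated by more than one space would need a more permissive pattern.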