R: How to replace/rename rows if a string contains a certain pattern?


I want to convert some links that correspond to unique user IDs:

    df <- data.frame(
      employeeId = c(1, 2, 3, 4, 5, 6),
      linkToEmployee = c("http://intranet.homepageEmploye.com/herSalary",
                         "http://intranet.homepageEmploye.org/herSalary/Details",
                         "http://local.com/qa/for",
                         "here the homepage is missing",
                         "http://local.org/",
                         "here the homepage is missing"))


      employeeId                                        linkToEmployee
    1          1         http://intranet.homepageEmploye.com/herSalary
    2          2 http://intranet.homepageEmploye.org/herSalary/Details
    3          3                               http://local.com/qa/for
    4          4                          here the homepage is missing
    5          5                                     http://local.org/
    6          6                          here the homepage is missing

However, this did not work as expected.

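One way to do this is to use the package `urltools`, which has some very useful URL-parsing functions. First, you need to find out which strings actually are URLs. To do that, I searched for strings that contain a TLD: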

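    library(urltools)

    # Treat a string as a URL if a known suffix (TLD) can be extracted from its domain
    ind <- !is.na(suffix_extract(domain(df$linkToEmployee))$suffix)

    # URLs: split the host on dots/whitespace and keep only the first and last parts,
    # e.g. "intranet.homepageEmploye.com" -> "intranet.com"
    df$linkToEmployee[ind] <- sapply(strsplit(domain(df$linkToEmployee[ind]), '\\.|\\s+'),
                                     function(i) paste(head(i, 1), tail(i, 1), sep = '.'))

    # Everything else: keep only the first word
    df$linkToEmployee[!ind] <- gsub('\\s+.*', '', df$linkToEmployee[!ind])

    df
    #  employeeId linkToEmployee
    #1          1   intranet.com
    #2          2   intranet.org
    #3          3      local.com
    #4          4           here
    #5          5      local.org
    #6          6           here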
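If installing `urltools` is not an option, the same idea can be sketched in base R. This is a swapped-in technique, not the answer's method: it assumes every URL starts with a scheme such as `http://` instead of checking for a known TLD, and `shorten_link` is a hypothetical helper name.

```r
# Base-R sketch (no urltools). Assumption: URLs are recognized by a leading scheme
# like "http://", not by a TLD lookup.
df <- data.frame(
  employeeId = c(1, 2, 3, 4, 5, 6),
  linkToEmployee = c("http://intranet.homepageEmploye.com/herSalary",
                     "http://intranet.homepageEmploye.org/herSalary/Details",
                     "http://local.com/qa/for",
                     "here the homepage is missing",
                     "http://local.org/",
                     "here the homepage is missing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: shorten URLs to "first.last" host labels,
# and reduce any other string to its first word
shorten_link <- function(x) {
  is_url <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", x)
  # Host = everything between "://" and the next "/" (or the end of the string)
  host <- sub("^[A-Za-z][A-Za-z0-9+.-]*://([^/]*).*$", "\\1", x[is_url])
  labels <- strsplit(host, ".", fixed = TRUE)
  # Keep the first and last dot-separated labels, e.g. "intranet" + "com"
  x[is_url] <- vapply(labels,
                      function(l) paste(l[1], l[length(l)], sep = "."),
                      character(1))
  # Non-URLs: drop everything from the first whitespace character onwards
  x[!is_url] <- sub("\\s.*$", "", x[!is_url])
  x
}

df$linkToEmployee <- shorten_link(df$linkToEmployee)
print(df)
```

With the example data this gives the same column as the `urltools` version above. Note that a host without any dot (for example `http://localhost/`) would become `localhost.localhost`, so the helper would need an extra check for such inputs.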

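The special character `.` will mess up your `gsub`: in a regular expression it matches any single character, not only a literal dot (`/` has no special meaning). If you add the option `fixed = TRUE` to stop the pattern from being interpreted as a regular expression, it should work. What is your desired result? Your example doesn't really show it, because it doesn't generalize. In any case, you should probably use a proper URI parser rather than ad-hoc regular expressions.

    df$linkToEmployee <- gsub("http://intranet.homepageEmploye.com/", "intranet.com.", df$linkToEmployee, fixed = TRUE)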
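A minimal sketch of the difference between regular-expression and fixed matching. The second input string is a hypothetical value, made up only to show that an unescaped `.` matches any character.

```r
x <- c("http://intranet.homepageEmploye.com/herSalary",
       "http://intranetXhomepageEmployeXcom/herSalary")  # hypothetical look-alike
pattern <- "http://intranet.homepageEmploye.com/"

# Regex mode: each "." in the pattern matches any character, so both strings match
regex_result <- sub(pattern, "intranet.com.", x)

# Fixed mode: the pattern is a literal string, so only the real URL matches
fixed_result <- sub(pattern, "intranet.com.", x, fixed = TRUE)

print(regex_result)
print(fixed_result)
```

An alternative to `fixed = TRUE` is to escape the dots (`"intranet\\.homepageEmploye\\.com/"`), which keeps regex features such as anchors available.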
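If `linkToEmployee` is a factor (the default for `data.frame()` before R 4.0.0, when `stringsAsFactors` defaulted to `TRUE`), convert it to character first:

    df$linkToEmployee <- as.character(df$linkToEmployee)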
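To see why the conversion matters, here is a minimal sketch using a hypothetical two-value factor: assigning a string that is not an existing level turns the element into `NA` (R also emits an "invalid factor level" warning, suppressed here), whereas a character vector accepts any string.

```r
# Hypothetical factor, as data.frame() would create with stringsAsFactors = TRUE
f <- factor(c("http://local.org/", "here the homepage is missing"))

# Assigning a value that is not an existing level yields NA (warning suppressed)
f_copy <- f
suppressWarnings(f_copy[1] <- "local.org")
print(f_copy)

# After as.character(), any string can be assigned
ch <- as.character(f)
ch[1] <- "local.org"
print(ch)
```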