R: How do I replace/rename rows if the string contains a certain pattern?
I want to convert some links that correspond to unique user IDs:
df <- data.frame(
  employeeId = c(1, 2, 3, 4, 5, 6),
  linkToEmployee = c("http://intranet.homepageEmploye.com/herSalary",
                     "http://intranet.homepageEmploye.org/herSalary/Details",
                     "http://local.com/qa/for",
                     "here the homepage is missing",
                     "http://local.org/",
                     "here the homepage is missing"))
employeeId linkToEmployee
1 1 http://intranet.homepageEmploye.com/herSalary
2 2 http://intranet.homepageEmploye.org/herSalary/Details
3 3 http://local.com/qa/for
4 4 here the homepage is missing
5 5 http://local.org/
6 6 here the homepage is missing
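However, my attempt with gsub did not work as expected.

One way to do this is to use the urltools package, which has some very useful URL-parsing functions. First, you need to find out which entries really are URLs. To do that, I looked for strings that contain a TLD (top-level domain).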
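library(urltools)
# TRUE where a public suffix (TLD) is found, i.e. the entry looks like a URL
ind <- !is.na(suffix_extract(domain(df$linkToEmployee))$suffix)
# URLs: keep only the first and last labels of the host name
df$linkToEmployee[ind] <- sapply(strsplit(domain(df$linkToEmployee[ind]), '\\.|\\s+'),
                                 function(i) paste(head(i, 1), tail(i, 1), sep = '.'))
# Non-URLs: keep only the first word
df$linkToEmployee[!ind] <- gsub('\\s+.*', '', df$linkToEmployee[!ind])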
df
# employeeId linkToEmployee
#1 1 intranet.com
#2 2 intranet.org
#3 3 local.com
#4 4 here
#5 5 local.org
#6 6 here
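If installing urltools is not an option, the same idea can be sketched in base R. This is only an illustration, not the approach from the answer above: shorten_link is a hypothetical helper name, and it assumes that every real link starts with http:// or https://, which is a much weaker test than a proper TLD lookup.

```r
# Hypothetical helper (not part of urltools): reduce a link to
# "<first host label>.<last host label>", or to its first word when the
# string does not look like an http(s) URL.
shorten_link <- function(x) {
  x <- as.character(x)
  # Assumption: anything that starts with http:// or https:// is a URL.
  is_url <- grepl("^https?://", x)
  # Drop the scheme and everything after the host name.
  host <- sub("^https?://([^/]+).*$", "\\1", x[is_url])
  # Split the host on literal dots and keep the first and last labels.
  parts <- strsplit(host, ".", fixed = TRUE)
  x[is_url] <- vapply(parts,
                      function(p) paste(p[1], p[length(p)], sep = "."),
                      character(1))
  # Non-URLs: keep only the first word.
  x[!is_url] <- sub("\\s+.*$", "", x[!is_url])
  x
}

links <- c("http://intranet.homepageEmploye.com/herSalary",
           "http://local.org/",
           "here the homepage is missing")
shorten_link(links)
# [1] "intranet.com" "local.org"    "here"
```

Unlike suffix_extract, this sketch knows nothing about multi-part suffixes such as co.uk, so a dedicated parser remains the safer choice for real data.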
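Special characters such as . and / will mess up your gsub. If you add the option fixed=TRUE to stop it from interpreting the pattern as a regular expression, it should work.

What is your desired result? Your example does not really make that clear, because it does not generalize. In any case, you should probably use a proper URI parser rather than ad hoc regular expressions.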
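A minimal sketch of what fixed = TRUE changes, using one link from the question. The variable name link and the test string "homepageEmployeXcom" are made up for illustration.

```r
link <- "http://intranet.homepageEmploye.com/herSalary"

# As a regular expression, "." matches any single character, so the pattern
# also matches a string where the dot has been replaced by another character.
grepl("homepageEmploye.com", "homepageEmployeXcom")                # TRUE
grepl("homepageEmploye.com", "homepageEmployeXcom", fixed = TRUE)  # FALSE

# With fixed = TRUE the pattern is matched literally, character by character.
res <- gsub("http://intranet.homepageEmploye.com/", "intranet.com.", link,
            fixed = TRUE)
res
# [1] "intranet.com.herSalary"
```

Note that a literal replacement only handles the one prefix that is spelled out, so each distinct domain would need its own call.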
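# Attempt from the question (the pattern is treated as a regular expression unless fixed = TRUE is set):
df$linkToEmployee <- gsub("http://intranet.homepageEmploye.com/", "intranet.com.", df$linkToEmployee)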
# Convert the column to character first (data.frame() may have created a factor):
df$linkToEmployee <- as.character(df$linkToEmployee)