How to search for string parts contained in a list of strings and return the matching strings in R
The following data frame has a "Campaign" column whose values carry information about season, name, and position, but the order of these pieces differs from row to row. Fortunately, each piece comes from a fixed list, so we can create vectors to match against the strings in the "Campaign" column.
Date Campaign
1 Jan-15 Summer|Peter|Up
2 Feb-15 David|Winter|Down
3 Mar-15 Up|Peter|Spring
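Here is what I want to do: create 3 columns, Name, Season, and Position. Each column should search the string in the Campaign column and return the matching value from the lists below.
Name <- c("Peter", "David")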
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")
You can do the following:
Date = c("Jan-15","Feb-15","Mar-15")
Campaign = c("Summer|Peter|Up","David|Winter|Down","Up|Peter|Spring")
df = data.frame(Date,Campaign)
Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")
for(k in Name){
df$Name[grepl(pattern = k, x = df$Campaign)] <- k
}
for(k in Season){
df$Season[grepl(pattern = k, x = df$Campaign)] <- k
}
for(k in Position){
df$Position[grepl(pattern = k, x = df$Campaign)] <- k
}
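One caveat with the loops above: grepl() matches substrings, so a lookup value such as "Up" would also match a longer token like "Upton". A minimal sketch of a vectorized alternative is shown here, assuming the lookup values contain no regex metacharacters; the helper name extract_first is hypothetical and not part of the original answer.

```r
# Hypothetical helper: return the first whole-word match from `choices`
# found in each element of `x`, or NA when nothing matches.
extract_first <- function(x, choices) {
  # Build an alternation such as \b(Peter|David)\b
  pattern <- paste0("\\b(", paste(choices, collapse = "|"), ")\\b")
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- as.integer(m) > 0          # regexpr() returns -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)      # regmatches() drops the non-matching elements
  out
}

campaign <- c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring")
Name     <- c("Peter", "David")
Season   <- c("Summer", "Spring", "Autumn", "Winter")
Position <- c("Up", "Down")

res <- data.frame(
  Campaign = campaign,
  Name     = extract_first(campaign, Name),
  Season   = extract_first(campaign, Season),
  Position = extract_first(campaign, Position),
  stringsAsFactors = FALSE
)
```

Because the result vector is pre-filled with NA, rows without any match stay NA instead of being skipped.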
Another way:
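# as.character() guards against Campaign being a factor (the data.frame() default before R 4.0)
L <- strsplit(as.character(df$Campaign),split = '\\|')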
df$Name <- sapply(L,intersect,Name)
df$Season <- sapply(L,intersect,Season)
df$Position <- sapply(L,intersect,Position)
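Note that sapply() with intersect only simplifies to a character vector when every row contains exactly one value from the lookup list; a row with zero or several matches makes it return a list. A small sketch of a stricter variant follows, assuming you want NA in those cases; the helper name pick_one and the incomplete third row are made up for illustration.

```r
# Hypothetical helper: keep the match only when it is unambiguous.
pick_one <- function(parts, choices) {
  hit <- intersect(parts, choices)
  if (length(hit) == 1L) hit else NA_character_
}

# The third row deliberately lacks a name to show the NA behaviour.
campaign <- c("Summer|Peter|Up", "David|Winter|Down", "Up|Spring")
parts <- strsplit(campaign, "|", fixed = TRUE)

# vapply() enforces one string per row, so the result is always a character vector.
name_col     <- vapply(parts, pick_one, character(1), choices = c("Peter", "David"))
position_col <- vapply(parts, pick_one, character(1), choices = c("Up", "Down"))
```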
I had the same idea as Marat Talipov; here is a data.table option:
library(data.table)
Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")
dat <- data.table(Date=c("Jan-15", "Feb-15", "Mar-15"),
Campaign=c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"))
Then process it:
dat[ , `:=`(Name = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Name),
Season = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Season),
Position = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Position))
]
Result:
> dat
Date Campaign Name Season Position
1: Jan-15 Summer|Peter|Up Peter Summer Up
2: Feb-15 David|Winter|Down David Winter Down
3: Mar-15 Up|Peter|Spring Peter Spring Up
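This may offer some benefit if you are doing this for many columns or need to modify the table in place (by reference).
I would be interested if anyone could show me how to update all three columns at once.
Edit: never mind, this does it: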
for (icol in c("Name", "Season", "Position"))
dat[, (icol):=sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, get(icol))]
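The loop above relies on get(icol) finding the global lookup vectors. Once the columns Name, Season, and Position exist in dat, get() inside dat[...] may resolve to the column instead of the vector. A sketch that sidesteps this by keeping the lookups in a named list is given below; the names lookups and parts are assumptions introduced for illustration, and the data.table package is assumed to be installed.

```r
library(data.table)

# Lookup values stored in a named list, so no get() is needed.
lookups <- list(
  Name     = c("Peter", "David"),
  Season   = c("Summer", "Spring", "Autumn", "Winter"),
  Position = c("Up", "Down")
)

dat <- data.table(
  Date     = c("Jan-15", "Feb-15", "Mar-15"),
  Campaign = c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring")
)

# Split once and reuse the pieces for every new column.
parts <- strsplit(dat$Campaign, "|", fixed = TRUE)

for (col in names(lookups)) {
  # := adds or overwrites the column by reference
  dat[, (col) := sapply(parts, intersect, lookups[[col]])]
}
```

Splitting once outside the loop also avoids repeating strsplit() for each column.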
See ?tstrsplit in the data.table package.
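For context, here is a brief sketch of what tstrsplit() returns on this data, assuming data.table is installed. It splits by position, so on its own it does not sort the pieces into Name, Season, and Position when their order varies; a lookup step such as intersect is still needed afterwards.

```r
library(data.table)

campaign <- c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring")

# tstrsplit() splits each string and transposes the result:
# one list element per position rather than one per row.
pieces <- tstrsplit(campaign, "|", fixed = TRUE)

first_tokens <- pieces[[1]]   # first token of every row, categories mixed together
```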