How to search for parts of a string contained in a list of strings and return the matched string in R

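The following data frame has a "Campaign" column whose values hold information about season, name, and position; however, the order of these pieces differs from row to row. Fortunately, each piece comes from a fixed list, so we can create vectors to match against the strings in the "Campaign" column.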

   Date           Campaign
1 Jan-15   Summer|Peter|Up
2 Feb-15 David|Winter|Down
3 Mar-15   Up|Peter|Spring
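Here is what I want to do: create three columns, Name, Season, and Position, each of which searches the string in the Campaign column and returns the matching value from the lists below.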

Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")
You can do the following:

Date = c("Jan-15","Feb-15","Mar-15")
Campaign = c("Summer|Peter|Up","David|Winter|Down","Up|Peter|Spring")
df = data.frame(Date,Campaign)

Name <- c("Peter", "David")
Season <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

for(k in Name){
    df$Name[grepl(pattern = k, x = df$Campaign)] <- k
}

for(k in Season){
    df$Season[grepl(pattern = k, x = df$Campaign)] <- k
}

for(k in Position){
    df$Position[grepl(pattern = k, x = df$Campaign)] <- k
}
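One caveat with the loops above is that grepl() matches substrings, so a short token such as "Up" would also match inside a longer token. The following is a minimal sketch of a delimiter-anchored match; the value "Upper" and the helper token_pattern() are hypothetical and not part of the original question, and the sketch assumes the tokens contain no regex metacharacters.

```r
# Hypothetical data: "Upper" is not in the original question; it is added here
# only to show how a plain substring match can pick the wrong row.
df <- data.frame(
  Date     = c("Jan-15", "Feb-15", "Mar-15"),
  Campaign = c("Summer|Peter|Up", "David|Winter|Upper", "Up|Peter|Spring"),
  stringsAsFactors = FALSE
)
Position <- c("Up", "Down")

# Plain substring match: "Up" also matches inside "Upper".
loose <- grepl("Up", df$Campaign, fixed = TRUE)

# Delimiter-anchored match: the token must be bounded by "|" or by the
# start/end of the string. Assumes tokens have no regex metacharacters.
token_pattern <- function(token) paste0("(^|\\|)", token, "(\\||$)")
strict <- grepl(token_pattern("Up"), df$Campaign)

# Same loop as above, but with the anchored pattern; rows without any
# matching token keep NA.
df$Position <- NA_character_
for (k in Position) {
  df$Position[grepl(token_pattern(k), df$Campaign)] <- k
}
```

With the data from the question the two approaches give the same result; the anchored pattern only matters when one token can occur inside another.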
Another way:

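# as.character() guards against Campaign being a factor (the data.frame() default before R 4.0)
L <- strsplit(as.character(df$Campaign), split = '\\|')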

df$Name <- sapply(L,intersect,Name)
df$Season <- sapply(L,intersect,Season)
df$Position <- sapply(L,intersect,Position)
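If a row might contain no token from one of the lists, sapply() with intersect() returns a list instead of a character vector. A minimal sketch of a variant that always yields one value per row, using NA when nothing matches, could look like this; the names lookups and first_match are illustrative and not from the original answer.

```r
# Same sample data as in the question; stringsAsFactors = FALSE keeps
# Campaign as character so strsplit() accepts it on any R version.
df <- data.frame(
  Date     = c("Jan-15", "Feb-15", "Mar-15"),
  Campaign = c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"),
  stringsAsFactors = FALSE
)

# Named list of lookup vectors, one entry per column to create.
lookups <- list(
  Name     = c("Peter", "David"),
  Season   = c("Summer", "Spring", "Autumn", "Winter"),
  Position = c("Up", "Down")
)

# Split each campaign string on the literal "|".
tokens <- strsplit(df$Campaign, "|", fixed = TRUE)

# Return the first token found in `choices`, or NA when none is present.
first_match <- function(parts, choices) {
  hit <- intersect(parts, choices)
  if (length(hit) > 0) hit[1] else NA_character_
}

# vapply() enforces exactly one character value per row.
for (col in names(lookups)) {
  df[[col]] <- vapply(tokens, first_match, character(1), choices = lookups[[col]])
}
```

Keeping the lookup vectors in one named list also means that adding a fourth attribute only requires a new list entry.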

I had the same idea as Marat Talipov; here is a data.table option:

library(data.table)

Name     <- c("Peter", "David")
Season   <- c("Summer","Spring","Autumn", "Winter")
Position <- c("Up","Down")

dat <- data.table(Date=c("Jan-15", "Feb-15", "Mar-15"),
                  Campaign=c("Summer|Peter|Up", "David|Winter|Down", "Up|Peter|Spring"))
Then do the processing:

dat[ , `:=`(Name     = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Name),
            Season   = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Season),
            Position = sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, Position))
    ]
Result:

> dat
     Date          Campaign  Name Season Position
1: Jan-15   Summer|Peter|Up Peter Summer       Up
2: Feb-15 David|Winter|Down David Winter     Down
3: Mar-15   Up|Peter|Spring Peter Spring       Up
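This may have some benefit if you are doing this for many columns or need to modify in place (by reference).

I'd be interested if anyone can show me how to update all three columns in one go.

Edit: never mind, a loop over the column names does it: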

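# A named list avoids get(icol) resolving to a same-named column of dat.
lookups <- list(Name = Name, Season = Season, Position = Position)
for (icol in names(lookups))
    dat[, (icol) := sapply(strsplit(Campaign, "|", fixed=TRUE), intersect, lookups[[icol]])]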

Have a look at ?tstrsplit in the data.table package.