Is there a way in R to exactly match a string in one column against the comma-separated strings in another column?


I want to match the strings in one column in R against the strings separated by commas (",") in another column.

I have two data frames in R:

General_df
Main_cat   gen_cat
Fruits     apple
Fruits     mango
Fruits     strawberry
Vegetable  potato
Vegetable  lettuce
Vegetable  onion
Liquids    water
Liquids    milk
Liquids    juice
Tech       app
Object     straw


My_dataframe

Days      cat
Day 1     apple, potato, milk
Day 2     onion, water
Day 3     strawberry, potato
Day 4     straw, mango
I want to get the Main_cat for each entry in My_dataframe, and I managed to get this:

Days      cat                    Match_string Main_cat
Day 1     apple, potato, milk    apple        Fruits
Day 1     apple, potato, milk    potato       Vegetable
Day 1     apple, potato, milk    app          Tech
Day 1     apple, potato, milk    milk         Liquids
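However, it also matches the substring "app" (inside "apple"), and many rows in my data frame get substring matches like this.

I only want exact matches against the whole strings separated by "," in the cat column.

Is there a way to find exact string matches in this scenario? Thanks.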
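The question does not show the code that produced the substring hits. Assuming it used grepl() or a similar pattern test, this short base R sketch shows why "app" is found inside "apple, potato, milk" and two ways to require a whole-token match instead. The anchored regular expression is only an illustration and assumes the category names contain no regex metacharacters.

```r
# One value from the question's cat column
x <- "apple, potato, milk"

# Substring test: "app" is found inside "apple", which causes the unwanted Tech hit
substring_hit <- grepl("app", x, fixed = TRUE)   # TRUE

# Exact test: split on commas first, then compare whole tokens
tokens <- strsplit(x, ",\\s*")[[1]]              # "apple" "potato" "milk"
exact_app   <- "app" %in% tokens                 # FALSE
exact_apple <- "apple" %in% tokens               # TRUE

# Regex alternative: the term must sit between (start or comma) and (comma or end)
anchored_app    <- grepl("(^|,\\s*)app(\\s*,|$)", x)     # FALSE
anchored_potato <- grepl("(^|,\\s*)potato(\\s*,|$)", x)  # TRUE
```

Splitting first turns the problem into plain equality, which is what the answers rely on.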

General_df <- read.table(text='
Main_cat   gen_cat
Fruits     apple
Fruits     mango
Fruits     strawberry
Vegetable  potato
Vegetable  lettuce
Vegetable  onion
Liquids    water
Liquids    milk
Liquids    juice
Tech       app
Object     straw', header=TRUE, stringsAsFactors = FALSE)


My_dataframe <- read.table(text='
Days;    cat
Day 1;    apple, potato, milk
Day 2;    onion, water
Day 3;    strawberry, potato
Day 4 ;   straw, mango', sep=';', header=TRUE, stringsAsFactors = FALSE)

My_dataframe[] <- lapply(My_dataframe, trimws)

I think this is what you want:

library(dplyr); library(tidyr)

My_dataframe %>%
    ## Split cat variable up into individual strings as a list column
    mutate(Match_string = strsplit(cat, ',\\s+')) %>%
    ## unnest the list into a long/tall data frame
    unnest(Match_string) %>%
    ## Join the lookup/key onto the tall/long data on the split column
    left_join(General_df, by = c('Match_string' = 'gen_cat'))


##   Days  cat                 Match_string Main_cat 
##   <chr> <chr>               <chr>        <chr>    
## 1 Day 1 apple, potato, milk apple        Fruits   
## 2 Day 1 apple, potato, milk potato       Vegetable
## 3 Day 1 apple, potato, milk milk         Liquids  
## 4 Day 2 onion, water        onion        Vegetable
## 5 Day 2 onion, water        water        Liquids  
## 6 Day 3 strawberry, potato  strawberry   Fruits   
## 7 Day 3 strawberry, potato  potato       Vegetable
## 8 Day 4 straw, mango        straw        Object   
## 9 Day 4 straw, mango        mango        Fruits   
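The split pattern ',\\s+' requires at least one space after every comma. This holds for the question's data. If a cell were ever written without a space (the value below is hypothetical), the glued tokens would not be split, and left_join() would return NA for them. This sketch compares ',\\s+' with the more lenient ',\\s*':

```r
x <- "apple,potato, milk"   # hypothetical cell with a missing space

strict  <- strsplit(x, ",\\s+")[[1]]  # "apple,potato" "milk": the first token would not join
lenient <- strsplit(x, ",\\s*")[[1]]  # "apple" "potato" "milk"
```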

See also tidyr::separate_rows(), which performs the first two steps (splitting and unnesting) in a single step.
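Here is a sketch of the separate_rows() variant, assuming a recent tidyr and dplyr. Match_string is copied from cat first because separate_rows() splits the named column in place, and the original cat string should stay visible. The data frames are rebuilt with data.frame() so the block runs on its own.

```r
library(dplyr)
library(tidyr)

General_df <- data.frame(
  Main_cat = c("Fruits", "Fruits", "Fruits", "Vegetable", "Vegetable", "Vegetable",
               "Liquids", "Liquids", "Liquids", "Tech", "Object"),
  gen_cat  = c("apple", "mango", "strawberry", "potato", "lettuce", "onion",
               "water", "milk", "juice", "app", "straw"),
  stringsAsFactors = FALSE
)
My_dataframe <- data.frame(
  Days = paste("Day", 1:4),
  cat  = c("apple, potato, milk", "onion, water", "strawberry, potato", "straw, mango"),
  stringsAsFactors = FALSE
)

result <- My_dataframe %>%
  # Copy cat so the original comma-separated string survives the split
  mutate(Match_string = cat) %>%
  # One row per token: split and unnest in a single step
  separate_rows(Match_string, sep = ",\\s*") %>%
  # Whole-string lookup of each token's category
  left_join(General_df, by = c("Match_string" = "gen_cat"))
```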
fuzzyjoin::regex_inner_join() would also do it in one step, but it is less efficient and less robust than the accepted answer.
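This is a sketch of the one-step fuzzyjoin approach, assuming the fuzzyjoin and dplyr packages are installed. regex_inner_join() treats the y column as the regular expression. Here, a hypothetical pattern column wraps each gen_cat in word boundaries (\\b); without them, "app" would again match "apple". This assumes the gen_cat values are plain words with no regex metacharacters. Each pattern is tested against each cat value, which is why this scales worse than splitting and joining.

```r
library(dplyr)
library(fuzzyjoin)

General_df <- data.frame(
  Main_cat = c("Fruits", "Fruits", "Fruits", "Vegetable", "Vegetable", "Vegetable",
               "Liquids", "Liquids", "Liquids", "Tech", "Object"),
  gen_cat  = c("apple", "mango", "strawberry", "potato", "lettuce", "onion",
               "water", "milk", "juice", "app", "straw"),
  stringsAsFactors = FALSE
)
My_dataframe <- data.frame(
  Days = paste("Day", 1:4),
  cat  = c("apple, potato, milk", "onion, water", "strawberry, potato", "straw, mango"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: each category word wrapped in word boundaries
patterns <- General_df %>% mutate(pattern = paste0("\\b", gen_cat, "\\b"))

result <- My_dataframe %>%
  # Keep rows where the y regex (pattern) matches the x string (cat)
  regex_inner_join(patterns, by = c(cat = "pattern")) %>%
  select(Days, cat, Match_string = gen_cat, Main_cat)
```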
Match_string <- strsplit(My_dataframe$cat, ',\\s+')
data.frame(
    My_dataframe[rep(seq_len(nrow(My_dataframe)), lengths(Match_string)),],
    Match_string = unlist(Match_string), 
    Main_cat = General_df$Main_cat[match(unlist(Match_string), General_df$gen_cat)],
    stringsAsFactors = FALSE,
    row.names = NULL
)

##    Days                 cat Match_string  Main_cat
## 1 Day 1 apple, potato, milk        apple    Fruits
## 2 Day 1 apple, potato, milk       potato Vegetable
## 3 Day 1 apple, potato, milk         milk   Liquids
## 4 Day 2        onion, water        onion Vegetable
## 5 Day 2        onion, water        water   Liquids
## 6 Day 3  strawberry, potato   strawberry    Fruits
## 7 Day 3  strawberry, potato       potato Vegetable
## 8 Day 4        straw, mango        straw    Object
## 9 Day 4        straw, mango        mango    Fruits
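The base R version relies on match(), which only counts whole-string equality. A token either equals a gen_cat value exactly or yields NA, so a partial hit such as "app" inside "apple" cannot occur. Here is a small sketch, where "appl" and "banana" are hypothetical tokens absent from the lookup:

```r
General_df <- data.frame(
  Main_cat = c("Fruits", "Fruits", "Fruits", "Vegetable", "Vegetable", "Vegetable",
               "Liquids", "Liquids", "Liquids", "Tech", "Object"),
  gen_cat  = c("apple", "mango", "strawberry", "potato", "lettuce", "onion",
               "water", "milk", "juice", "app", "straw"),
  stringsAsFactors = FALSE
)

tokens <- c("apple", "app", "appl", "banana")
# match() returns the index of the first exact match, or NA if there is none
cats <- General_df$Main_cat[match(tokens, General_df$gen_cat)]
# "Fruits" "Tech" NA NA
```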
library(data.table)
merge(
    data.table(My_dataframe)[, Match_string := strsplit(cat, ',\\s+')][, 
        .(Match_string =unlist(Match_string)), by = c('Days', 'cat')], 
    General_df, by.x = 'Match_string', by.y = 'gen_cat',
    all.x = TRUE
)[order(Days), .(Days, cat, Match_string, Main_cat)]

##     Days                 cat Match_string  Main_cat
## 1: Day 1 apple, potato, milk        apple    Fruits
## 2: Day 1 apple, potato, milk         milk   Liquids
## 3: Day 1 apple, potato, milk       potato Vegetable
## 4: Day 2        onion, water        onion Vegetable
## 5: Day 2        onion, water        water   Liquids
## 6: Day 3  strawberry, potato       potato Vegetable
## 7: Day 3  strawberry, potato   strawberry    Fruits
## 8: Day 4        straw, mango        mango    Fruits
## 9: Day 4        straw, mango        straw    Object
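merge() on data.tables sorts the result by the join column by default. That is why the answer reorders by Days, and why the tokens within each day appear alphabetically in the output. A data.table update join is a sketch of an alternative that keeps the original token order. It adds Main_cat by reference, and unmatched tokens get NA. The names General_dt, My_dt and long are illustrative.

```r
library(data.table)

General_dt <- data.table(
  Main_cat = c("Fruits", "Fruits", "Fruits", "Vegetable", "Vegetable", "Vegetable",
               "Liquids", "Liquids", "Liquids", "Tech", "Object"),
  gen_cat  = c("apple", "mango", "strawberry", "potato", "lettuce", "onion",
               "water", "milk", "juice", "app", "straw")
)
My_dt <- data.table(
  Days = paste("Day", 1:4),
  cat  = c("apple, potato, milk", "onion, water", "strawberry, potato", "straw, mango")
)

# One row per token, keeping Days and cat
long <- My_dt[, .(Match_string = unlist(strsplit(cat, ",\\s+"))), by = .(Days, cat)]

# Update join: exact match of Match_string against gen_cat, row order preserved
long[General_dt, Main_cat := i.Main_cat, on = .(Match_string = gen_cat)]
long
```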