Matching products from a list in R


I have to classify the following list of products:

product_list<-data.frame(product=c('banana from ecuador 1 unit', 'argentinian meat (1 kg) cow','chicken breast','noodles','salad','chicken salad with egg'))

Maybe this will help (`product_to_match` holds the class names to match against):

q <- outer(
  strsplit(product_to_match, "\\s+"),
  strsplit(product_list$product, "\\s+"),
  FUN = Vectorize(function(x, y) all(x %in% y))
)
product_list$class <- product_to_match[replace(colSums(q * row(q)), colSums(q) == 0, NA)]
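To see why this works, here is a self-contained sketch of the same steps. The definition of `product_to_match` is not shown here, so the three-element vector below is an assumption inferred from the class values in the printed result; the intermediate names `targets` and `products` are only illustrative.

```r
product_list <- data.frame(
  product = c('banana from ecuador 1 unit', 'argentinian meat (1 kg) cow',
              'chicken breast', 'noodles', 'salad', 'chicken salad with egg'),
  stringsAsFactors = FALSE
)

# ASSUMPTION: not the original vector, only the classes visible in the result
product_to_match <- c('cow meat', 'chicken breast', 'chicken egg salad')

# Split every string into whitespace-separated tokens (lists of character vectors)
targets  <- strsplit(product_to_match, "\\s+")
products <- strsplit(product_list$product, "\\s+")

# q[i, j] is TRUE when every token of target i occurs in product j
q <- outer(targets, products, FUN = Vectorize(function(x, y) all(x %in% y)))

# row(q) holds the row index of each cell, so q * row(q) keeps the index of
# each matching target and 0 elsewhere; the column sum recovers that index
idx <- colSums(q * row(q))

# Products without any matching target get NA instead of index 0
idx <- replace(idx, colSums(q) == 0, NA)

product_list$class <- product_to_match[idx]
print(product_list)
```

Note that `colSums(q * row(q))` adds up the row indices of all matching targets, so it only yields a valid index when each product matches at most one target.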

With `stringdist`, some matches can also be obtained:

library(fuzzyjoin)
library(tibble)  # provides tibble()

# Phonetic (Soundex) matching on the shared `product` column
stringdist_left_join(product_list, tibble(product = product_to_match),
        method = 'soundex')

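`outer` is a good option, since it also lets you check whether you got multiple matches. Also, a variation on the theme:
product_to_match[max.col(cbind(outer(strsplit(product_list$product, "\\s+"), strsplit(product_to_match, "\\s+"), FUN = Vectorize(function(x, y) all(y %in% x))), TRUE), "first")]
@thelatemail Thanks, this compact variant is cool.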
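The two points from the comments (checking for multiple matches, and the compact `max.col` variant) can be unpacked in a short sketch. The vector `product_to_match` below is hypothetical: it uses the classes visible in the printed result plus an extra `'chicken'` entry added purely to provoke multiple matches. The names `m`, `n_matches` and `first_match` are illustrative too.

```r
product_list <- data.frame(
  product = c('banana from ecuador 1 unit', 'argentinian meat (1 kg) cow',
              'chicken breast', 'noodles', 'salad', 'chicken salad with egg'),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL targets; 'chicken' is added only to create multiple matches
product_to_match <- c('cow meat', 'chicken breast', 'chicken egg salad', 'chicken')

# Rows are products, columns are targets:
# m[i, j] is TRUE when every token of target j occurs in product i
m <- outer(
  strsplit(product_list$product, "\\s+"),
  strsplit(product_to_match, "\\s+"),
  FUN = Vectorize(function(x, y) all(y %in% x))
)

# Number of targets matched by each product; values above 1 flag ambiguity
n_matches <- rowSums(m)

# cbind(m, TRUE) appends a fallback column that is always TRUE, so a product
# without any match selects an index past the end of product_to_match,
# which subsets to NA; "first" keeps the earliest matching target otherwise
first_match <- product_to_match[max.col(cbind(m, TRUE), "first")]

print(data.frame(product = product_list$product, n_matches, first_match))
```

With `"first"`, the order of `product_to_match` decides which class wins when several match, so more specific targets would need to be listed before more general ones.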
Result of the `outer` approach:

> product_list
                      product             class
1  banana from ecuador 1 unit              <NA>
2 argentinian meat (1 kg) cow          cow meat
3              chicken breast    chicken breast
4                     noodles              <NA>
5                       salad              <NA>
6      chicken salad with egg chicken egg salad