R: How do I perform a join that uses multiple columns as different string conditions?
I want to perform a complex join that treats several columns as different kinds of conditions. I want to assign each fruit a category based on the strings it contains, the strings it may contain, and the strings it does not contain. I have a vector of fruits:
head(fruit)
[1] "apple" "apricot" "avocado" "banana" "bell pepper" "bilberry"
The assignment criteria for each fruit are as follows:
fruitAssignment <- data.frame(assignment = c('Apple','Berry','Black','Melon','Melon','Melon','Currant'),
                              contains = c('apple','berry','black','honeydew','melon','cantaloupe','currant'),
                              mayContain = c(NA,'black',NA,NA,NA,NA,NA),
                              doesNotContain = c(NA,NA,'berry',NA,NA,NA,NA))
  assignment   contains mayContain doesNotContain
1      Apple      apple       <NA>           <NA>
2      Berry      berry      black           <NA>
3      Black      black       <NA>          berry
4      Melon   honeydew       <NA>           <NA>
5      Melon      melon       <NA>           <NA>
6      Melon cantaloupe       <NA>           <NA>
7    Currant    currant       <NA>           <NA>
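Any package is fine for accomplishing this.

I don't think a join is the right tool here; this is more of a classification task. Use regular expressions to find matches between the search terms and the classification table: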
fruit <- c("redcurrant", "blackcurrant", "pineapple", "blackberry", "coconut")
fruitAssignment <- data.frame(assignment = c('Apple','Berry','Black','Melon','Melon','Melon','Currant'),
                              contains = c('apple','berry','black','honeydew','melon','cantaloupe','currant'),
                              mayContain = c(NA,'black',NA,NA,NA,NA,NA),
                              doesNotContain = c(NA,NA,'berry',NA,NA,NA,NA),
                              stringsAsFactors = FALSE)
library(dplyr)
library(tibble)

# Classify one fruit; fall back to "Fruit" unless exactly one rule matches.
fun <- function(fruit, fruitAssignment) {
  # Turn each criteria column into a logical: does the pattern occur in `fruit`?
  fruitAssignment[, 2:4] <- apply(fruitAssignment[, 2:4],
                                  2,
                                  function(x, fruit) sapply(x, grepl, fruit, ignore.case = TRUE),
                                  fruit = fruit)
  # A missing (NA) criterion counts as "no match".
  fruitAssignment[is.na(fruitAssignment)] <- FALSE
  # Keep rules whose excluded string is absent and whose
  # `contains` or `mayContain` string is present.
  x <- fruitAssignment %>%
    filter(!doesNotContain, contains | mayContain)
  if (nrow(x) == 1)
    return(x$assignment)
  "Fruit"
}

# Apply to every fruit and collect the result as a two-column tibble.
sapply(fruit, fun, fruitAssignment) %>%
  enframe() %>%
  setNames(c("fruit", "assignment"))
# A tibble: 5 x 2
  fruit        assignment
  <chr>        <chr>
1 redcurrant   Currant
2 blackcurrant Fruit
3 pineapple    Apple
4 blackberry   Berry
5 coconut      Fruit
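
To see why "blackcurrant" falls back to "Fruit" in the output above, it can help to list every rule a fruit satisfies rather than only the final label. The following is a minimal base-R sketch under the same matching logic as the answer (the excluded string is absent, and the `contains` or `mayContain` string is present). The helper names `hits` and `matched_rules` are hypothetical and are not part of the original answer.

```r
fruitAssignment <- data.frame(
  assignment     = c("Apple", "Berry", "Black", "Melon", "Melon", "Melon", "Currant"),
  contains       = c("apple", "berry", "black", "honeydew", "melon", "cantaloupe", "currant"),
  mayContain     = c(NA, "black", NA, NA, NA, NA, NA),
  doesNotContain = c(NA, NA, "berry", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a non-missing pattern occurs in `fruit`.
hits <- function(patterns, fruit) {
  vapply(patterns,
         function(p) !is.na(p) && grepl(p, fruit, ignore.case = TRUE),
         logical(1), USE.NAMES = FALSE)
}

# Hypothetical helper: the assignments of all rules that `fruit` satisfies.
matched_rules <- function(fruit, rules) {
  keep <- !hits(rules$doesNotContain, fruit) &
    (hits(rules$contains, fruit) | hits(rules$mayContain, fruit))
  rules$assignment[keep]
}

matched_rules("blackcurrant", fruitAssignment)  # "Berry" "Black" "Currant"
matched_rules("blackberry", fruitAssignment)    # "Berry"
```

Under this reading, "blackcurrant" satisfies three rules, so no single category applies and the fallback is used. "blackberry" matches only Berry because the Black rule excludes anything containing "berry".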
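"Fruit" and "apple" do not appear in fruitAssignment, and the values in the assignment column start with a capital letter. Please specify the output you want, including a correct sample output.

I only need each fruit to be assigned according to the criteria, matching case-insensitively. Let me know if you need further clarification.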
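
As a small illustration of the case-insensitive matching mentioned here, the sketch below shows two common base-R ways to test whether a criterion occurs in a fruit name regardless of capitalization. The sample vector `fruit_names` is made up for illustration and is not from the question.

```r
# Made-up sample names with mixed capitalization.
fruit_names <- c("Pineapple", "BLACKBERRY", "coconut")

# Option 1: regular-expression match that ignores case.
by_regex <- grepl("apple", fruit_names, ignore.case = TRUE)

# Option 2: lower-case both sides and match the criterion as a literal substring.
by_literal <- grepl(tolower("Apple"), tolower(fruit_names), fixed = TRUE)

by_regex    # TRUE FALSE FALSE
by_literal  # TRUE FALSE FALSE
```

The second form treats the criterion literally, which may be preferable if a criterion could ever contain characters that are special in regular expressions.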