R: Match observations from one table to another table's variable made up of strings
I have two data sets, A and B:
library(data.table)
Farm.Type <- c("Fruits","Vegetables","Livestock")
Produce.All <- c("Apple, Orange, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken")
Store <- c("Convenience","Wholesale","Grocery","Market")
Produce <- c("Oranges","Watermelon","Cabbage","Pig")
Farm <- c("Fruits","","Vegetables","Livestock")
A <- data.table(Farm.Type, Produce.All)
B <- data.table(Store, Produce)
Otherwise (you would still need to add the result to data frame B), you can reshape A into long form and join it, with:
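library(stringi)  # provides stri_split_regex()
library(dplyr)
library(tidyr)
# Split each comma-separated string into a list column, then give each item its own row
mutate(A, Produce.All = stri_split_regex(Produce.All, ", ")) %>%
  unnest(cols = Produce.All) -> A_long
# Keep every row of B and attach the matching Farm.Type (NA when there is no match)
left_join(B, A_long, by = c("Produce" = "Produce.All"))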
I certainly hope this isn't homework.
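As an aside not in the original answer: tidyr also provides separate_rows(), which splits a delimited column into one row per item in a single step. That means the stri_split_regex() + unnest() pair can be collapsed into one call. The block below is a minimal sketch that rebuilds A and B as plain data frames (an assumption, so it runs on its own):

```r
library(dplyr)
library(tidyr)

A <- data.frame(
  Farm.Type = c("Fruits", "Vegetables", "Livestock"),
  Produce.All = c("Apple, Orange, Pears, Strawberries", "Broccoli, Cabbage, Spinach", "Cow, Pig, Chicken"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Store = c("Convenience", "Wholesale", "Grocery", "Market"),
  Produce = c("Oranges", "Watermelon", "Cabbage", "Pig"),
  stringsAsFactors = FALSE
)

# One row per produce item; Farm.Type is repeated for every split piece
A_long <- separate_rows(A, Produce.All, sep = ", ")
# Keep all rows of B; produce with no match in A gets NA
result <- left_join(B, A_long, by = c("Produce" = "Produce.All"))
print(result)
```

As in the answers here, "Oranges" is not matched because A lists "Orange".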
Echoing hrbrmstr's answer, but sticking to data.table and some base R:
longA <-
stack(
setNames(
strsplit(A[, Produce.All], ", "),
A[, Farm.Type]
)
)
merge(longA, B, by.x = "values", by.y = "Produce", all.y = TRUE)
# values ind Store
#1 Cabbage Vegetables Grocery
#2 Oranges <NA> Convenience
#3 Pig Livestock Market
#4 Watermelon <NA> Wholesale
# Or using a data.table merge, if you like
setDT(longA)[B, on = c(values = "Produce")]
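The block below is a data.table-only variant of the same reshape-then-join idea. It splits the strings inside `j`, grouped by `Farm.Type`, instead of going through stack() and setNames(). The name `longA2` is hypothetical, and A and B are rebuilt so the block is self-contained:

```r
library(data.table)

A <- data.table(
  Farm.Type = c("Fruits", "Vegetables", "Livestock"),
  Produce.All = c("Apple, Orange, Pears, Strawberries", "Broccoli, Cabbage, Spinach", "Cow, Pig, Chicken")
)
B <- data.table(
  Store = c("Convenience", "Wholesale", "Grocery", "Market"),
  Produce = c("Oranges", "Watermelon", "Cabbage", "Pig")
)

# Within each Farm.Type group, split its single string into one row per item
longA2 <- A[, .(Produce = unlist(strsplit(Produce.All, ", ", fixed = TRUE))), by = Farm.Type]

# Join with B as i: every row of B is kept, in B's order; unmatched items get NA
result <- longA2[B, on = "Produce"]
print(result)
```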
Why are you unwilling to change the format of table A?
Hi, I'm not really opposed to changing table A. I was just curious whether a solution exists that skips the extra step of transforming it.
(Not base R if you use data.table :-) You should post that as an answer, @Jota.
Thanks for your help. Yes, I can see it gets complicated when working with strings. I had considered some kind of for or while loop combined with grep, but I found that simply transforming the data was much easier.
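The loop-plus-grep idea from the last comment can be sketched without reshaping A. The block below is a minimal base R sketch. It assumes that each Produce value contains no regex metacharacters and that it must match a whole comma-separated item. A and B are rebuilt as plain data frames so the block runs on its own:

```r
A <- data.frame(
  Farm.Type = c("Fruits", "Vegetables", "Livestock"),
  Produce.All = c("Apple, Orange, Pears, Strawberries", "Broccoli, Cabbage, Spinach", "Cow, Pig, Chicken"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Store = c("Convenience", "Wholesale", "Grocery", "Market"),
  Produce = c("Oranges", "Watermelon", "Cabbage", "Pig"),
  stringsAsFactors = FALSE
)

# Start with "no match" for every row of B
B$Farm.Type <- NA_character_
for (i in seq_len(nrow(B))) {
  # Match only a whole list entry: preceded by start-of-string or ", " and
  # followed by "," or end-of-string, so e.g. "Pig" would not match "Piglet"
  pattern <- paste0("(^|, )", B$Produce[i], "(,|$)")
  hit <- grep(pattern, A$Produce.All, perl = TRUE)
  # Use the first matching row of A, if any
  if (length(hit) > 0) B$Farm.Type[i] <- A$Farm.Type[hit[1]]
}
print(B)
```

This loop rescans every string in A for each row of B. For larger tables, the reshape-and-join answers are likely simpler and faster.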