R: Extracting data from one data frame based on a mapping in another data frame

I am trying to split data into a list, based on some category data in R.

I have the following data:

# A tibble: 5 x 2
  category to_split  
  <chr>    <chr>     
1 cat12    c(1, 5)   
2 cat22    c(2, 5, 1)
3 cat33    3         
4 cat43    4         
5 cat51    c(5, 2)
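I want to create a new list so that c(1, 5) extracts the cat12 and cat51 data from the large data frame. Likewise, c(2, 5, 1) should extract the cat22, cat51 and cat12 data, and that data should be stored in a data frame (inside the list).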

I would like to get a list structure like this:

list(
     c(1, 5)  - a data frame containing the two corresponding categories of data
     c(2, 5, 1) - a data frame containing the three corresponding categories of data
     3
     4
     c(5, 2)
)
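(I don't mind that the list names are c(1, 5), etc.) I will name each data frame after the category it came from (in the small data frame), i.e. cat12, cat22, cat33, cat43 and cat51.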

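I am trying to extract the relevant data from the larger data frame using the mapping in the smaller data frame.

Data (the dput output for data_join and full_data is listed at the end):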


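It is not clear why the categories and the splits are defined in the same data frame. If each split is a subset of the five categories, there are 31 = 2^5 - 1 possible splits that contain at least one category.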
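The count of 31 can be checked directly. This is a minimal base R sketch; the variable names n_categories and n_splits are illustrative and do not come from the original answer.

```r
# Number of non-empty subsets of 5 categories:
# choose 1 of them, or 2 of them, ..., or all 5
n_categories <- 5
n_splits <- sum(choose(n_categories, 1:n_categories))

# Same count via the closed form 2^n - 1 (all subsets minus the empty one)
n_splits == 2^n_categories - 1
#> [1] TRUE
```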

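library(dplyr)
library(tidyr)
library(purrr)

# Give each category an id
cats <- data_join %>%
  select(category) %>%
  mutate(
    .id = row_number()
  )
head(cats)
#> # A tibble: 5 x 2
#>   category   .id
#>   <chr>    <int>
#> 1 cat12        1
#> 2 cat22        2
#> 3 cat33        3
#> 4 cat43        4
#> 5 cat51        5

# Find the categories in each split/subset (there may be more or fewer than 5?)
subsets <- data_join %>%
  select(to_split) %>%
  mutate(.id = map(to_split, ~ eval(parse(text = .x)))) %>%
  unnest(.id) %>%
  inner_join(cats, by = ".id")
head(subsets)
#> # A tibble: 6 x 3
#>   to_split     .id category
#>   <chr>      <dbl> <chr>
#> 1 c(1, 5)        1 cat12
#> 2 c(1, 5)        5 cat51
#> 3 c(2, 5, 1)     2 cat22
#> 4 c(2, 5, 1)     5 cat51
#> 5 c(2, 5, 1)     1 cat12
#> 6 3              3 cat33

# Now you can join the subsets with the data
# (full_data is assumed to be the large data frame from the question)
df <- full_data %>%
  inner_join(
    subsets, by = "category"
  )

# And then split by group
# (the right-hand side was lost in extraction; split() on to_split is assumed)
splits <- split(df, df$to_split)
names(splits)
#> [1] "3"          "4"          "c(1, 5)"    "c(2, 5, 1)" "c(5, 2)"

Created on 2019-11-01 by the reprex package (v0.3.0)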
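For comparison, the same mapping can be done in base R without eval(parse()), which avoids evaluating strings as code. This is only a sketch under stated assumptions: mapping and big are small hypothetical stand-ins for data_join and full_data (not the data from the question), and it assumes every to_split entry contains only non-negative integer row positions.

```r
# Hypothetical toy stand-ins for data_join and full_data
mapping <- data.frame(
  category = c("cat12", "cat22", "cat33"),
  to_split = c("c(1, 3)", "c(2, 1)", "3"),
  stringsAsFactors = FALSE
)
big <- data.frame(
  category = rep(c("cat12", "cat22", "cat33"), each = 2),
  var1 = 1:6,
  stringsAsFactors = FALSE
)

# Pull the integers out of each string, e.g. "c(1, 3)" becomes c(1L, 3L)
ids <- lapply(
  regmatches(mapping$to_split, gregexpr("[0-9]+", mapping$to_split)),
  as.integer
)

# For each split, keep the rows of `big` whose category sits at one of
# those row positions in `mapping`
result <- lapply(ids, function(i) big[big$category %in% mapping$category[i], ])

# Name each list element after the category its split belongs to
names(result) <- mapping$category
```

Here result$cat12 holds the cat12 and cat33 rows of big, because the toy to_split for cat12 is c(1, 3). Rows keep the order they have in big rather than the order given in to_split.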

Desired list, named by category:
list(
     cat12  - data frame containing the two corresponding categories
     cat22
     cat33
     cat43
     cat51
)
data_join <- structure(list(category = c("cat12", "cat22", "cat33", "cat43", 
"cat51"), to_split = c("c(1, 5)", "c(2, 5, 1)", "3", "4", "c(5, 2)"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L))


full_data <- structure(list(category = c("cat12", "cat12", "cat12", "cat12", 
"cat12", "cat12", "cat12", "cat12", "cat12", "cat12", "cat12", 
"cat12", "cat12", "cat12", "cat12", "cat12", "cat12", "cat12", 
"cat12", "cat12", "cat22", "cat22", "cat22", "cat22", "cat22", 
"cat22", "cat22", "cat22", "cat22", "cat22", "cat22", "cat22", 
"cat22", "cat22", "cat22", "cat22", "cat22", "cat22", "cat22", 
"cat22", "cat33", "cat33", "cat33", "cat33", "cat33", "cat33", 
"cat33", "cat33", "cat33", "cat33", "cat33", "cat33", "cat33", 
"cat33", "cat33", "cat33", "cat33", "cat33", "cat33", "cat33", 
"cat43", "cat43", "cat43", "cat43", "cat43", "cat43", "cat43", 
"cat43", "cat43", "cat43", "cat43", "cat43", "cat43", "cat43", 
"cat43", "cat43", "cat43", "cat43", "cat43", "cat43", "cat51", 
"cat51", "cat51", "cat51", "cat51", "cat51", "cat51", "cat51", 
"cat51", "cat51", "cat51", "cat51", "cat51", "cat51", "cat51", 
"cat51", "cat51", "cat51", "cat51", "cat51"), var1 = c(7, 20, 
12, 4, 13, 7, 9, 3, 8, 32, 5, 2, 14, 7, 11, 9, 25, 5, 6, 18, 
14, 12, 11, 11, 5, 7, 12, 2, 7, 7, 5, 28, 6, 8, 4, 9, 4, 11, 
6, 5, NA, NA, 24, 6, 6, 29, NA, 11, NA, NA, NA, 9, NA, 8, 7, 
NA, 17, 6, NA, 6, NA, NA, NA, NA, NA, NA, NA, NA, 13, NA, NA, 
NA, NA, 16, 7, 8, NA, NA, 10, 19, 6, 10, 3, 12, 2, 2, 7, 11, 
5, 5, 6, 3, 6, 9, 11, 11, 12, 5, 14, 5), var2 = c(0.4, 1.1, 0.4, 
0.3, 0.4, 0.3, 0.4, 0.3, 0.5, 2.6, 0.6, 0.3, 0.5, 0.4, 0.4, 0.7, 
0.5, 0.3, 0.4, 0.6, 0.5, 0.3, 0.4, 0.2, 0.4, 0.5, 0.5, 0.3, 0.4, 
0.3, 0.4, 1.1, 0.4, 0.5, 0.2, 0.5, 0.4, 0.5, 0.6, 0.6, NA, NA, 
0.7, 0.1, 0.3, 0.5, NA, 0.7, NA, NA, NA, 0.2, NA, 0.3, 0.2, NA, 
0.3, 0.3, NA, 0.1, 0.2, 0.2, 0.5, 0.4, 0.3, 0.4, 0.2, 0.4, 0.3, 
0.3, 0.2, 0.3, 0.2, 0.4, 0.2, 0.2, 0.3, 0.3, 0.5, 0.5, 0.4, 0.2, 
0.3, 0.7, 0.3, 0.1, 0.3, 0.3, 0.4, 0.6, 0.3, 0.2, 0.4, 0.6, 0.2, 
0.7, 0.6, 0.4, 0.6, 0.5), var3 = c(10, 155, 3, 38, 40, 17, 45, 
17, 84, 378, 44, 14, 36, 20, 17, 76, 25, 4, 22, 63, 42, 23, 12, 
10, 15, 29, 26, 7, 18, 5, 23, 204, 24, 56, 7, 35, 23, 55, 28, 
65, 10, 13, 54, 13, 22, 45, 29, 58, 49, 14, 2, 9, 15, 38, 41, 
63, 11, 9, 7, 20, 3, 5, 52, 7, 18, 25, 2, 30, 10, 3, 3, 13, 1, 
12, 7, 5, 5, 9, 13, 4, 14, 9, 8, 147, 5, 7, 2, 10, 6, 66, 2, 
8, 6, 3, 8, 5, 45, 6, 20, 27)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -100L), groups = structure(list(
    station_location = c("cat12", "cat22", "cat33", "cat43", 
    "cat51"), .rows = list(1:20, 21:40, 41:60, 61:80, 81:100)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))