R: How to add a column to a data table and expand its rows based on the table's own values

Tags: r, data.table, tidyverse

I have the following data table:

RowID| Col1   | Col2 |
----------------------
1    | apple  | cow  |
2    | orange | dog  |
3    | apple  | cat  |
4    | cherry | fish |
5    | cherry | ant  |
6    | apple  | rat  |

I want to get to this table:

RowID| Col1   | Col2 | newCol
------------------------------
1    | apple  | cow  | cat
2    | apple  | cow  | rat   
3    | orange | dog  | na        
4    | apple  | cat  | cow
5    | apple  | cat  | rat   
6    | cherry | fish | ant       
7    | cherry | ant  | fish      
8    | apple  | rat  | cow
9    | apple  | rat  | cat

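To help visualize the logic behind the table above: it is essentially the same as the table below, except that the list column is split into one row per value. Rows are matched on their Col1 value. For example, rows 1, 3 and 6 of the original table all have "apple" in the first column, so the new "list" column for each of those rows contains the Col2 values of the other matching rows. A new row is then created for each list element. The second table above is the result I want; the third table below is only there to show where the values come from.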

RowID| Col1   | Col2 | newCol
------------------------------
1    | apple  | cow  | cat,rat   (Row 3 & 6 match col1 values)
2    | orange | dog  | na        (No rows match this col1 value)
3    | apple  | cat  | cow,rat   (Row 1 & 6 match col1 values)
4    | cherry | fish | ant       (Row 5 matches col1 values)
5    | cherry | ant  | fish      (Row 4 matches col1 values)
6    | apple  | rat  | cow,cat   (Row 1 & 3 match col1 values)
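
The matching logic described above can be sketched in base R as follows. This is only an illustration of the idea, not one of the answers: the data frame `dat` is re-typed from the question's first table, and the helper names `idx`, `others` and `expanded` are made up for this example.

```r
# Input table from the question (re-typed here as an assumption).
dat <- data.frame(
  RowID = 1:6,
  Col1 = c("apple", "orange", "apple", "cherry", "cherry", "apple"),
  Col2 = c("cow", "dog", "cat", "fish", "ant", "rat"),
  stringsAsFactors = FALSE
)

idx <- seq_len(nrow(dat))

# For each row, collect the Col2 values of the OTHER rows with the same Col1.
others <- lapply(idx, function(i) dat$Col2[dat$Col1 == dat$Col1[i] & idx != i])

# Intermediate (third) table: the matches collapsed into one string per row.
dat$newCol <- vapply(
  others,
  function(x) if (length(x) == 0) NA_character_ else paste(x, collapse = ","),
  character(1)
)

# Target (second) table: one output row per match, or a single NA row.
expanded <- dat[rep(idx, pmax(lengths(others), 1L)), c("Col1", "Col2")]
expanded$newCol <- unlist(
  lapply(others, function(x) if (length(x) == 0) NA_character_ else x)
)
rownames(expanded) <- NULL
expanded$RowID <- seq_len(nrow(expanded))
```

Matching on row position (`idx != i`) rather than on the Col2 value means that two rows sharing both Col1 and Col2 would still be listed as matches of each other.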

Using the dplyr and tidyr packages:

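library(dplyr)
library(tidyr)

# dat is the data frame from the question (columns RowID, Col1, Col2)
dat %>% 
  group_by(Col1) %>% 
  # collect every Col2 value that shares this row's Col1
  mutate(newCol = paste0(Col2, collapse = ",")) %>% 
  # one output row per collected value
  separate_rows(newCol, sep = ",") %>% 
  group_by(RowID) %>% 
  # a row that only matches itself gets NA, as in the desired output
  mutate(newCol = if (n() == 1) NA_character_ else newCol) %>% 
  # drop the self-matches, keep the NA rows
  filter(is.na(newCol) | Col2 != newCol) %>% 
  ungroup()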

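Self-join the table on the first column, then drop the rows where NewCol equals Col2. The tricky part is keeping the rows whose Col1 value occurs only once in the data.table.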

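library(data.table)
library(magrittr)  # provides the %>% pipe

dt_foo = data.table(Col1 = c("apple", "orange", "apple", "cherry",
                             "cherry", "apple"),
                    Col2 = c("cow", "dog", "cat", "fish",
                             "ant", "rat"))

# Col1 values that occur only once; needed later to set NewCol to NA
single_occ = dt_foo[, .N, by = Col1] %>% 
  .[N == 1, Col1]

dt_foo2 = dt_foo %>% 
  # self-join on Col1: each row is paired with every row sharing its Col1
  .[., on = "Col1", allow.cartesian = TRUE] %>% 
  setnames("i.Col2", "NewCol") %>% 
  # rows with a unique Col1 only match themselves, so set NewCol to NA
  .[Col1 %in% single_occ, NewCol := NA] %>% 
  # drop the self-matches, keep the NA rows
  .[Col2 != NewCol | is.na(NewCol)]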
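
A variant of the same self-join idea, sketched here without magrittr, removes self-matches by row position instead of by comparing Col2 values and renumbers RowID as in the desired table. It is a sketch under assumptions rather than the answer's own code: the helper columns `pos`, `own`, `other` and `n_match` are hypothetical names introduced for this example, and it relies on data.table's `i.` and `x.` prefixes to tell the two sides of the join apart.

```r
library(data.table)

dt <- data.table(
  Col1 = c("apple", "orange", "apple", "cherry", "cherry", "apple"),
  Col2 = c("cow", "dog", "cat", "fish", "ant", "rat")
)
dt[, pos := .I]  # hypothetical helper column: original row position

# Self-join on Col1. The i. prefix is the "own" row, x. is the matched row.
pairs <- dt[dt, on = "Col1", allow.cartesian = TRUE,
            .(Col1, Col2 = i.Col2, newCol = x.Col2, own = i.pos, other = x.pos)]

# Count matches per original row; a count of 1 means only the self-match.
pairs[, n_match := .N, by = own]
pairs[n_match == 1L, newCol := NA_character_]

# Drop self-matches by position, keeping rows that matched nothing else.
result <- pairs[own != other | n_match == 1L, .(Col1, Col2, newCol)]
result[, RowID := .I]
setcolorder(result, "RowID")
```

Comparing positions (`own != other`) instead of values means a genuine duplicate, such as two "apple" rows that both have "cow", would not be mistaken for a self-match.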
