R: How to add to and expand a data table based on table information
I have the following data table:
RowID| Col1 | Col2 |
----------------------
1 | apple | cow |
2 | orange | dog |
3 | apple | cat |
4 | cherry | fish |
5 | cherry | ant |
6 | apple | rat |
I want to get to this table:
RowID| Col1 | Col2 | newCol
------------------------------
1 | apple | cow | cat
2 | apple | cow | rat
3 | orange | dog | na
4 | apple | cat | cow
5 | apple | cat | rat
6 | cherry | fish | ant
7 | cherry | ant | fish
8 | apple | rat | cow
9 | apple | rat | cat
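To help visualize the logic of the table above: it is essentially the same as the table below, except that the list column is split into rows. Rows are matched on the value in Col1, so, for example, rows 1, 3 and 6 of the original table all have "apple" in the first column. The new "list" column therefore holds the Col2 values of all the other matching rows, and a new row is then expanded for each list element. The second table above is the result I want; the third table is only there to help visualize where the values come from.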
RowID| Col1 | Col2 | newCol
------------------------------
1 | apple | cow | cat,rat (Row 3 & 6 match col1 values)
2 | orange | dog | na (No rows match this col1 value)
3 | apple | cat | cow,rat (Row 1 & 6 match col1 values)
4 | cherry | fish | ant (Row 5 matches col1 values)
5 | cherry | ant | fish (Row 4 matches col1 values)
6 | apple | rat | cow,cat (Row 1 & 3 match col1 values)
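The matching logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the method of any answer: the names `dat`, `others` and `expanded` are introduced here, and the sample table is rebuilt by hand from the question.

```r
# Sample data from the question; the object name `dat` is an assumption.
dat <- data.frame(
  RowID = 1:6,
  Col1  = c("apple", "orange", "apple", "cherry", "cherry", "apple"),
  Col2  = c("cow", "dog", "cat", "fish", "ant", "rat"),
  stringsAsFactors = FALSE
)

# For each row, collect the Col2 values of the *other* rows with the same Col1.
others <- lapply(seq_len(nrow(dat)), function(i) {
  same <- dat$Col1 == dat$Col1[i]
  same[i] <- FALSE                          # exclude the row itself
  vals <- dat$Col2[same]
  if (length(vals) == 0) NA_character_ else vals
})

# Intermediate view (third table): matches collapsed into one string per row.
dat$newCol <- vapply(others, function(v) {
  if (all(is.na(v))) NA_character_ else paste(v, collapse = ",")
}, character(1))

# Expanded view (second table): one output row per matched value.
n_rep <- lengths(others)
expanded <- data.frame(
  RowID  = seq_len(sum(n_rep)),
  Col1   = rep(dat$Col1, n_rep),
  Col2   = rep(dat$Col2, n_rep),
  newCol = unlist(others),
  stringsAsFactors = FALSE
)
```

Here a row whose Col1 value is unique (orange) keeps a single output row with `NA` in `newCol`, which is what the "na" entry in the tables is taken to mean.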
Using the dplyr and tidyr packages:
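library(dplyr)
library(tidyr)

# `dat` is assumed to hold the question's table (columns RowID, Col1, Col2)
dat %>%
  group_by(Col1) %>%
  mutate(newCol = paste0(Col2, collapse = ",")) %>%   # all Col2 values per Col1
  separate_rows(newCol) %>%                           # one row per value
  group_by(RowID) %>%
  filter(Col2 != newCol | n() == 1) %>%               # drop self-matches, keep single rows
  # added step: a single row still carries its own Col2 here, so set it to NA
  mutate(newCol = if_else(Col2 == newCol, NA_character_, newCol))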
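Self-join the table on the first column, then drop the rows where NewCol equals Col2. The tricky part is keeping the rows whose Col1 value occurs only once in the data.table.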
library(data.table)
library(magrittr)

dt_foo = data.table(Col1 = c("apple", "orange", "apple", "cherry",
                             "cherry", "apple"),
                    Col2 = c("cow", "dog", "cat", "fish",
                             "ant", "rat"))

# Col1 values that occur only once; required to later set NA values
single_occ = dt_foo[, .N, Col1] %>%
  .[N == 1, Col1]

dt_foo2 = dt_foo %>%
  .[., on = "Col1", allow.cartesian = TRUE] %>%          # self-join on Col1
  setnames("i.Col2", "NewCol") %>%                       # rename the joined-in column
  .[Col1 %in% single_occ, NewCol := NA_character_] %>%   # single rows get NA
  .[Col2 != NewCol | is.na(NewCol)]                      # drop self-matches, keep NA rows