R: append two data frames with different columns
I want to append dfToAdd to df, where the former is missing some columns. The important detail is that df has two types of columns. The columns of the first type are tied to each other: e.g. group = A implies name = Group A and color = Blue; a combination such as group A with Red cannot occur. The columns of the second type are likewise tied to each other: animal = Dog implies action = Bark. I want to add a second data frame that lacks the columns of the first type. Those columns should be filled with every existing combination of the first-type columns, as in dfResult below (row order does not matter):
df = data.frame(group = c("A", "A", "A", "B", "B", "B"),
name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B"),
color = c("Blue", "Blue", "Blue", "Red", "Red", "Red"),
animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse"),
action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak")
)
dfToAdd = data.frame(animal = c("Lion", "Bird"),
action = c("Roar", "Chirp"))
dfResult = data.frame(group = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B", "Group A", "Group A", "Group B", "Group B"),
color = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Blue", "Blue", "Red", "Red"),
animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse", "Lion", "Bird", "Lion", "Bird"),
action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak", "Roar", "Chirp", "Roar", "Chirp"))
> df
group name color animal action
1 A Group A Blue Dog Bark
2 A Group A Blue Cat Meow
3 A Group A Blue Mouse Squeak
4 B Group B Red Dog Bark
5 B Group B Red Cat Meow
6 B Group B Red Mouse Squeak
> dfToAdd
animal action
1 Lion Roar
2 Bird Chirp
> dfResult
group name color animal action
1 A Group A Blue Dog Bark
2 A Group A Blue Cat Meow
3 A Group A Blue Mouse Squeak
4 B Group B Red Dog Bark
5 B Group B Red Cat Meow
6 B Group B Red Mouse Squeak
7 A Group A Blue Lion Roar
8 A Group A Blue Bird Chirp
9 B Group B Red Lion Roar
10 B Group B Red Bird Chirp
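However, the columns of the first type (group, name, color) are not fully known in advance. I am dealing with an arbitrary number of grouping variables. You can imagine that there may or may not be a column such as description = "Group A is a good group" or date = 2020.04.13. Only the columns of the second type are known: animal and action. While writing this, I came up with using nesting on both sides of tidyr's complete function and detecting the missing columns manually. Maybe there is a more elegant solution: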
library(magrittr)  # provides %>%
# First find all grouping columns (those missing from dfToAdd)
groupCols = colnames(df)[!(colnames(df) %in% colnames(dfToAdd))]
otherCols = colnames(df)[colnames(df) %in% colnames(dfToAdd)]
# Populate the missing columns with the first grouping that appears in df
dfToAdd[groupCols] = df[1, groupCols]
# rbind to append
dfResult = rbind(df, dfToAdd)
# Some combinations are now missing. tidyr::complete accepts nesting information, so combinations are generated only across the two column sets, not within them.
dfResult %>% tidyr::complete(tidyr::nesting(!!! rlang::syms(otherCols)), tidyr::nesting(!!! rlang::syms(groupCols)))
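Edit: I realized that I was using not-yet-known column names at the end, which does not really work. I need to pass the groupCols character vector to the second nesting call.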
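Edit 2: Thanks to akrun's answer, I was able to fix that as well.

We can do this in a single %>% chain: slice the first row of 'df', select the columns that are not in 'dfToAdd', repeat that row to match the number of rows in 'dfToAdd', bind the columns of 'dfToAdd' to it, then bind the rows with 'df' and use complete.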
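I have updated my answer regarding the problem of using unknown column names. It needs to be dynamic: I do not actually know group, name, color, but I can put them in a character vector.
@Genom. Which nesting did you want to change?
I know animal and action, so nesting animal, action is fine. But the nesting of group, name, color can contain more columns, e.g. nesting group, name, color, description, date, location, ... In effect, every column missing from dfToAdd is mapped from df.
I got an error at bind_cols: Argument 2 must be length 1, not 2.
I accepted this because it is the simplest answer.
I am updating my answer with syms. Thanks!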
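library(dplyr)
library(tidyr)
library(rlang)
library(purrr)
df %>%
  slice(1) %>%                   # take the first row of df
  select(-names(dfToAdd)) %>%    # keep only the columns missing from dfToAdd
  uncount(nrow(dfToAdd)) %>%     # repeat that row once per row of dfToAdd
  bind_cols(dfToAdd) %>%         # attach the new animal/action values
  bind_rows(df, .) %>%           # append the new rows to the original data
  complete(nesting(!!! syms(names(dfToAdd))),
           nesting(!!! syms(setdiff(names(.), names(dfToAdd)))))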
# A tibble: 10 x 5
# animal action group name color
# * <fct> <fct> <fct> <fct> <fct>
# 1 Cat Meow A Group A Blue
# 2 Cat Meow B Group B Red
# 3 Dog Bark A Group A Blue
# 4 Dog Bark B Group B Red
# 5 Mouse Squeak A Group A Blue
# 6 Mouse Squeak B Group B Red
# 7 Bird Chirp A Group A Blue
# 8 Bird Chirp B Group B Red
# 9 Lion Roar A Group A Blue
#10 Lion Roar B Group B Red
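For comparison, the same result can probably be reached in base R without tidyr, by cross joining the distinct grouping combinations with the new rows. This is only a sketch under the assumptions of the question: every column of df that is absent from dfToAdd is treated as a grouping column, and at least one such column exists. The names groups and newRows are introduced here for illustration; stringsAsFactors = FALSE is set so the columns stay character vectors regardless of R version.

```r
df <- data.frame(
  group  = c("A", "A", "A", "B", "B", "B"),
  name   = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B"),
  color  = c("Blue", "Blue", "Blue", "Red", "Red", "Red"),
  animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse"),
  action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak"),
  stringsAsFactors = FALSE
)
dfToAdd <- data.frame(
  animal = c("Lion", "Bird"),
  action = c("Roar", "Chirp"),
  stringsAsFactors = FALSE
)

# Grouping columns are detected dynamically: everything dfToAdd lacks
groupCols <- setdiff(names(df), names(dfToAdd))

# One row per existing grouping combination (A/Group A/Blue, B/Group B/Red)
groups <- unique(df[groupCols])

# merge() with by = NULL returns the Cartesian product of its two inputs
newRows <- merge(groups, dfToAdd, by = NULL)

# Put the columns in df's order, append, and reset the row names
dfResult <- rbind(df, newRows[names(df)])
rownames(dfResult) <- NULL
dfResult
```

Because the grouping combinations are taken from unique(df[groupCols]), extra grouping columns such as description or date would be carried along without naming them. The row order differs from the tidyr output, which the question says does not matter.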