R: Partially summing two data frames

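I have two data frames. For some rows of df1 there is a matching row in df2. Certain columns of df1 should now be updated so that they contain the sum of their own value and the corresponding value in df2.

In the example below, the columns "count1" and "count2" should be summed, but not the column "type".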

df1 <- data.frame(id = c("one_a", "two_a", "three_a", "four_a"), type = c(8,7,6,5), count1 = c(1,2,1,NA), count2 = c(NA,0,1,0), id_df2 = c("one", "two", "three", "four"))
df2 <- data.frame(id = c("one", "two", "four"), type = c(8,7,5), count1 = c(0,1,1), count2 = c(0,0,1))
result <- data.frame(id = c("one_a", "two_a", "three_a", "four_a"), type = c(8,7,6,5), count1 = c(1,3,1,1), count2 = c(0,0,1,1))

> df1
       id type count1 count2 id_df2
1   one_a    8      1     NA     one
2   two_a    7      2      0     two
3 three_a    6      1      1   three
4  four_a    5     NA      0    four

> df2
    id type count1 count2
1  one    8      0      0
2  two    7      1      0
3 four    5      1      1

> result
       id type count1 count2
1   one_a    8      1      0
2   two_a    7      3      0
3 three_a    6      1      1
4  four_a    5      1      1

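There are similar questions, and I tried a solution that splits the data frames apart and then merges them. I am just wondering whether there is a more elegant way to do this. My original dataset has about 300 columns, so I am looking for a scalable solution.

Thanks in advance, chuckmorris

You can do: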

library(dplyr)

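df1 %>% select(-id_df2) %>%
  bind_rows(df2) %>%
  mutate(id = gsub("_.*", "", id)) %>%   # strip the "_a" suffix so ids match df2
  replace(., is.na(.), 0) %>%             # treat missing counts as 0
  group_by(id, type) %>%
  summarise_at(vars(contains("count")), sum)  # plain function instead of the deprecated funs()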

which outputs:

# A tibble: 4 x 4
# Groups:   id [?]
  id     type count1 count2
  <chr> <dbl>  <dbl>  <dbl>
1 four      5      1      1
2 one       8      1      0
3 three     6      1      1
4 two       7      3      0
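
Note that the ids in this output no longer carry their suffix, because the gsub() step removes everything from the first underscore onward. A small base R sketch of what that regular expression does, using the ids from the question plus the "thr_a_ee" style id mentioned in the comments (the variable names ids and stripped are only illustrative):

```r
# ids as they appear in df1
ids <- c("one_a", "two_a", "three_a", "four_a")

# "_.*" matches the first underscore and everything after it
stripped <- gsub("_.*", "", ids)
print(stripped)

# an id containing several underscores is cut at the first one
print(gsub("_.*", "", "thr_a_ee"))
```

So this approach only lines up df1 and df2 when the part before the first underscore is exactly the df2 id.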

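If you are interested in keeping the suffix part of the id, another approach is to use a join, convert to long format, and then spread back to wide: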
library(tidyverse)

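df1 %>%
  left_join(df2, by = c("id_df2" = "id")) %>%
  gather(var, val, -id) %>%                  # long format: one row per id/column
  mutate(var = gsub("\\..*", "", var)) %>%   # drop the .x/.y suffixes added by the join
  distinct(id, var, val) %>%                 # avoids counting "type" twice; note it also collapses equal df1/df2 counts
  filter(var != "id_df2") %>%
  group_by(id, var) %>%
  summarise(val = sum(as.numeric(val), na.rm = TRUE)) %>%
  spread(var, val)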

This gives:
# A tibble: 4 x 4
# Groups:   id [4]
  id      count1 count2  type
  <fct>    <dbl>  <dbl> <dbl>
1 four_a       1      1     5
2 one_a        1      0     8
3 three_a      1      1     6
4 two_a        3      0     7

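If the "_a" ending has a special purpose, for example if there are also groups ending in "_b", "_c", and so on, the approaches above will fail.
Slightly less elegant, but it still works: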
result_2 <- df2 %>% 
  mutate(id = paste0(id, "_a")) %>%
  bind_rows(df1) %>% 
  select(-id_df2) %>% 
  replace(., is.na(.), 0) %>%
  group_by(id) %>% 
  summarise(count1 = sum(count1), count2 = sum(count2), type = max(type)) %>% 
  mutate(id_df2 = as.factor(id)) %>% 
  select(c(id_df2, type, count1, count2), -id)

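
Can I do this using the "id_df2" column?
- Some "type" columns in the original dataset contain different values in df1 and df2.
- The "id" field originally looks like "thr_a_ee".

A possible approach has been added at the end of the post.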
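
On the question of using the "id_df2" column directly: the following is a minimal base R sketch, not taken from the answers above, that looks up the matching df2 row with match() and adds its values column by column. It assumes that the columns to sum can be selected by a name pattern (here "^count") and that NA should count as 0, as in the expected result. The names sum_cols, idx, na_to_zero and result_3 are hypothetical.

```r
df1 <- data.frame(id = c("one_a", "two_a", "three_a", "four_a"),
                  type = c(8, 7, 6, 5),
                  count1 = c(1, 2, 1, NA),
                  count2 = c(NA, 0, 1, 0),
                  id_df2 = c("one", "two", "three", "four"))
df2 <- data.frame(id = c("one", "two", "four"),
                  type = c(8, 7, 5),
                  count1 = c(0, 1, 1),
                  count2 = c(0, 0, 1))

# Columns to sum; with ~300 columns this pattern would replace a hand-written list
sum_cols <- grep("^count", names(df1), value = TRUE)

# Position of the matching df2 row for each df1 row (NA where df2 has no match)
idx <- match(df1$id_df2, df2$id)

# Assumption: a missing value contributes 0 to the sum
na_to_zero <- function(x) {
  x[is.na(x)] <- 0
  x
}

result_3 <- df1
for (col in sum_cols) {
  # df2[[col]][idx] is NA for unmatched rows, which na_to_zero() turns into 0
  result_3[[col]] <- na_to_zero(df1[[col]]) + na_to_zero(df2[[col]][idx])
}
result_3$id_df2 <- NULL  # drop the lookup key to match the layout of "result"

print(result_3)
```

Because the lookup goes through "id_df2", the "id" column is never modified, so ids such as "thr_a_ee" are not affected, and "type" is left as it is in df1 even where df2 holds a different value.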