Merging values across rows based on a matching condition in R

r dataframe

I have a simple question about aggregating values in R.

Suppose I have a data frame:

DF <- data.frame(col1=c("Type 1", "Type 1B", "Type 2"), col2=c(1, 2, 3))

I noticed that the data contains both Type 1 and Type 1B, so I want to merge Type 1B into Type 1.

So I decided to use dplyr:

filter(DF, col1=='Type 1' | col1=='Type 1B') %>%
  summarise(n = sum(col2))

But then I have to keep going:

# Name the columns col1/col2 so that rbind() below can match them to DF
DF2 <- data.frame(col1 = 'Type 1', filter(DF, col1=='Type 1' | col1=='Type 1B') %>%
  summarise(col2 = sum(col2)))

OK, now I can finish up:

rbind(DF2, DF[3,])

The result? It works:

   col1 col2
1 Type 1    3
3 Type 2    3

...but ugh, that's awful! There must be a better way to simply combine values.

You could try:

library(data.table)

setDT(transform(DF, col1=gsub("(.*)[A-Z]+$","\\1",DF$col1)))[,list(col2=sum(col2)),col1]

#      col1 col2
# 1: Type 1    3
# 2: Type 2    3

Or, more directly:

setDT(DF)[, .(col2 = sum(col2)), by = .(col1 = sub("[[:alpha:]]+$", "", col1))]
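
The two data.table calls above differ mainly in the regular expression used to clean the labels. As a small sketch (the vector name labels is only illustrative), the two patterns can be compared directly on the sample values from the question:

```r
# Sample labels from the question.
labels <- c("Type 1", "Type 1B", "Type 2")

# Pattern from the first call: keep everything before a trailing run of capitals.
a <- gsub("(.*)[A-Z]+$", "\\1", labels)

# Pattern from the second call: delete a trailing run of letters.
b <- sub("[[:alpha:]]+$", "", labels)

print(a)
print(b)
```

On this data both patterns produce the same grouping key. They may behave differently on other labels, for example ones with a lower-case suffix, so it is worth checking the cleaned values before aggregating.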

Here is one possible dplyr approach:

library(dplyr)
DF %>%
  group_by(col1 = sub("(.*\\d+).*$", "\\1", col1)) %>%
  summarise(col2 = sum(col2))
#Source: local data frame [2 x 2]
#
#    col1 col2
#1 Type 1    3
#2 Type 2    3

Use sub() together with aggregate(), removing anything that is not a digit from the end of col1:

do.call("data.frame", 
    aggregate(col2 ~ cbind(col1 = sub("\\D+$", "", col1)), DF, sum)
)
#     col1 col2
# 1 Type 1    3
# 2 Type 2    3

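The do.call() wrapper is there so that the first column returned by aggregate() is properly converted from a matrix to a vector. That way there won't be any surprises later on.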
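
If the matrix-to-vector detail is a concern, one alternative sketch is to compute the cleaned key first and hand aggregate() an ordinary column, so that neither cbind() nor do.call() is needed. The names key and res are illustrative and not part of the original answer.

```r
DF <- data.frame(col1 = c("Type 1", "Type 1B", "Type 2"),
                 col2 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Strip any trailing non-digit characters ("Type 1B" becomes "Type 1").
key <- sub("\\D+$", "", DF$col1)

# Aggregate on the cleaned key; transform() works on a copy, so DF is unchanged.
res <- aggregate(col2 ~ col1, data = transform(DF, col1 = key), FUN = sum)
print(res)
```

Because the grouping column is a plain vector going in, it is a plain vector coming out.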

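In my opinion, aggregate() is the perfect function for this, but you don't have to do any text processing (e.g. gsub()). I would do it in two steps:

1. Overwrite col1 with the new desired grouping.
2. Compute the aggregation, using the new col1 to specify the grouping.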

# Step 1: recode the labels; as.character() works for factor and character columns alike
DF$col1 <- ifelse(DF$col1 %in% c('Type 1','Type 1B'), 'Type 1', as.character(DF$col1))
DF
##     col1 col2
## 1 Type 1    1
## 2 Type 1    2
## 3 Type 2    3
# Step 2: aggregate on the recoded column
DF <- aggregate(col2~col1, DF, FUN=sum)
DF
##     col1 col2
## 1 Type 1    3
## 2 Type 2    3

There must be a more general way of doing something this simple, no? Surely such a simple operation shouldn't have to involve regex-style matching!

It is a one-liner now. But you still need gsub or some other pattern to recognise that Type 1 and Type 1B are similar. The aggregation itself can be done with aggregate, dplyr, data.table, etc.

I think this is the best answer because it avoids fiddling with the text. It keeps the complexity low.
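
As a sketch of how the recode-then-aggregate idea might scale to several merges, the recoding can be driven by a small lookup vector instead of a hard-coded ifelse(). The name recode_map and its single entry are assumptions made for illustration. No regular expressions are involved.

```r
DF <- data.frame(col1 = c("Type 1", "Type 1B", "Type 2"),
                 col2 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Hypothetical lookup: names are the labels to replace, values are the targets.
recode_map <- c("Type 1B" = "Type 1")

# Step 1: overwrite col1 wherever a label appears in the lookup.
hit <- DF$col1 %in% names(recode_map)
DF$col1[hit] <- unname(recode_map[DF$col1[hit]])

# Step 2: aggregate on the recoded column.
res <- aggregate(col2 ~ col1, data = DF, FUN = sum)
print(res)
```

Adding another merge then only means adding another entry to recode_map, which keeps the grouping decision explicit rather than implied by a pattern.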