R: How to group by all variables except some, and add a group ID to each observation


I have a dataset like this:

data(CO2, package = 'datasets')

##    Plant        Type  Treatment conc uptake
## 1    Qn1      Quebec nonchilled   95   16.0
## 2    Qn1      Quebec nonchilled  175   30.4
## ... 
## 17   Qn3      Quebec nonchilled  250   40.3
## 18   Qn3      Quebec nonchilled  350   42.1
## ...
## 27   Qc1      Quebec    chilled  675   35.4
## 28   Qc1      Quebec    chilled 1000   38.7
## ...
## 36   Qc3      Quebec    chilled   95   15.1
## 37   Qc3      Quebec    chilled  175   21.0
## ...
## 47   Mn1 Mississippi nonchilled  500   30.9
##  ...
## 53   Mn2 Mississippi nonchilled  350   31.8
## 54   Mn2 Mississippi nonchilled  500   32.4
## ...
## 62   Mn3 Mississippi nonchilled  675   28.1
## 63   Mn3 Mississippi nonchilled 1000   27.8
## ...
## 70   Mc1 Mississippi    chilled 1000   21.9
## 71   Mc2 Mississippi    chilled   95    7.7
## 72   Mc2 Mississippi    chilled  175   11.4
## ...
## 83   Mc3 Mississippi    chilled  675   18.9
## 84   Mc3 Mississippi    chilled 1000   19.9
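  • Observations should be grouped by the combination of all variables except conc and uptake. So I want to specify the variables I do not want to use for grouping.
  • I want to add a new variable GroupID to the dataset, which has the same value for all observations that belong to the same group.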
I found a working solution, but it is a monstrosity:

library(dplyr)
CO2 %>% 
  mutate(GroupID=
         do.call( group_indices
                , c( list(.data=.)
                   , colnames(.) %>% 
                      setdiff(c('conc','uptake')) %>% 
                      as.name()
                   )
                )
         )

##    Plant        Type  Treatment conc uptake GroupID
## 1    Qn1      Quebec nonchilled   95   16.0       1
## 2    Qn1      Quebec nonchilled  175   30.4       1
## ...
## 8    Qn2      Quebec nonchilled   95   13.6       2
## 9    Qn2      Quebec nonchilled  175   27.3       2
## ...
## 15   Qn3      Quebec nonchilled   95   16.2       3
## 16   Qn3      Quebec nonchilled  175   32.4       3
## ...
## 22   Qc1      Quebec    chilled   95   14.2       4
## 23   Qc1      Quebec    chilled  175   24.1       4
## ...
## 29   Qc2      Quebec    chilled   95    9.3       6
## 30   Qc2      Quebec    chilled  175   27.3       6
## ...
## 36   Qc3      Quebec    chilled   95   15.1       5
## 37   Qc3      Quebec    chilled  175   21.0       5
## ...
## 43   Mn1 Mississippi nonchilled   95   10.6       9
## 44   Mn1 Mississippi nonchilled  175   19.2       9
## ...
Is there a simpler solution?



Bonus: a solution that groups by all variables of the same type (e.g. all factor variables) would be a blast.

We can use .GRP from data.table:

library(data.table)
setDT(CO2)[, GroupID := .GRP, setdiff(names(CO2), c('conc','uptake'))]
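For the bonus case (group by every column of a given type), the same .GRP idea can take a computed vector of column names. A minimal sketch, assuming a data.table copy named dt (a name introduced here for illustration) so the built-in CO2 is left untouched. Note that with by = (as opposed to keyby =) data.table numbers groups in order of first appearance, so the IDs may differ from dplyr's group_indices(), which follows the sorted group keys.

```r
library(data.table)

# Work on a data.table copy of the built-in CO2 data
dt <- as.data.table(datasets::CO2)

# Bonus case: use every factor column as a grouping variable
fac_cols <- names(dt)[vapply(dt, is.factor, logical(1))]

# .GRP is the running group counter; with by = it follows first appearance
dt[, GroupID := .GRP, by = fac_cols]

head(dt)
```

To keep the original "all except conc and uptake" behaviour instead, set fac_cols <- setdiff(names(dt), c("conc", "uptake")).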

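If the grouping variables are chosen by a condition, we can use group_by_if. In this case, is.factor is used to test whether a column is a factor. After that, group_indices can generate an ID for each group.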

library(dplyr)

CO2_2 <- CO2 %>%
  mutate(GroupID = CO2 %>%
           group_by_if(is.factor) %>%
           group_indices())
head(CO2_2)
#   Plant   Type  Treatment conc uptake GroupID
# 1   Qn1 Quebec nonchilled   95   16.0       1
# 2   Qn1 Quebec nonchilled  175   30.4       1
# 3   Qn1 Quebec nonchilled  250   34.8       1
# 4   Qn1 Quebec nonchilled  350   37.2       1
# 5   Qn1 Quebec nonchilled  500   35.3       1
# 6   Qn1 Quebec nonchilled  675   39.2       1
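On newer dplyr, the same kind of ID can also be produced with cur_group_id() inside mutate(), using the .by argument for per-operation grouping. This is a sketch assuming dplyr >= 1.1.0 (where .by was added); CO2_3 and CO2_4 are illustrative names. It covers both the bonus case and the original "all variables except conc and uptake" case.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument

# Plain data.frame copy of the built-in data
co2 <- as.data.frame(datasets::CO2)

# Bonus case: every factor column is a grouping variable
CO2_3 <- co2 %>%
  mutate(GroupID = cur_group_id(), .by = where(is.factor))

# Main case: group by all columns except conc and uptake
CO2_4 <- co2 %>%
  mutate(GroupID = cur_group_id(), .by = -c(conc, uptake))

head(CO2_4)
```

Both selections resolve to Plant, Type and Treatment for this data, so the two results should carry the same IDs.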

Using base R, we can do the following:

data(CO2, package = 'datasets')

A <- aggregate(cbind(conc, uptake) ~ ., CO2, length)[, "uptake"]  # you can take either conc or uptake
transform(CO2, ID = rep(seq_along(A), A))
   Plant        Type  Treatment conc uptake ID
1    Qn1      Quebec nonchilled   95   16.0  1
2    Qn1      Quebec nonchilled  175   30.4  1
      :
8    Qn2      Quebec nonchilled   95   13.6  2
9    Qn2      Quebec nonchilled  175   27.3  2
      :
15   Qn3      Quebec nonchilled   95   16.2  3
16   Qn3      Quebec nonchilled  175   32.4  3
As a one-liner:

transform(CO2, fac = with(aggregate(cbind(conc, uptake) ~ ., CO2, length), rep(seq_along(uptake), uptake)))
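The aggregate()/rep() trick assigns IDs by row position, so it only yields a correct grouping if the rows of each group are contiguous and the group sizes in aggregate()'s sorted output line up with the order of the groups in the data. In CO2 every group has exactly 7 consecutive rows, which is why it works here. A sketch of an order-independent base R alternative using interaction(); df and grp_cols are names introduced for illustration.

```r
# Plain data.frame copy of the built-in data
df <- as.data.frame(datasets::CO2)

# Columns to group by: everything except the excluded ones
grp_cols <- setdiff(names(df), c("conc", "uptake"))
# Bonus variant: all factor columns instead
# grp_cols <- names(df)[vapply(df, is.factor, logical(1))]

# interaction() combines the grouping columns into one factor;
# drop = TRUE removes combinations that never occur, so the codes run 1..k,
# and lex.order = TRUE makes the first column vary slowest
df$GroupID <- as.integer(interaction(df[grp_cols], drop = TRUE, lex.order = TRUE))

head(df)
```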

(In my case I don't have a simple name like CO2; my input is the result of some chained operations. In that case you can use the dot:)
df_with_groupID <- … %>% … %>% mutate(groupID = group_by_if(., is.factor) %>% group_indices())
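A runnable version of that pattern, with a hypothetical filter(conc > 100) standing in for the elided upstream steps. Because the dot appears only inside a nested call, magrittr still passes the piped data as the first argument of mutate(), so group_by_if() sees the same filtered rows that mutate() receives.

```r
library(dplyr)

df_with_groupID <- as.data.frame(datasets::CO2) %>%
  # hypothetical upstream step standing in for the "…" in the chain
  filter(conc > 100) %>%
  # "." is the filtered data arriving from the previous step
  mutate(groupID = group_by_if(., is.factor) %>% group_indices())

head(df_with_groupID)
```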
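Good to know that data.table can express this so concisely! For my readers' sanity I'll stick with dplyr for this analysis, but it will certainly come in handy in the future!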