R: How to group by all variables except some, and add a group ID to each observation
I have a dataset like this:
data(CO2, package = 'datasets')
## Plant Type Treatment conc uptake
## 1 Qn1 Quebec nonchilled 95 16.0
## 2 Qn1 Quebec nonchilled 175 30.4
## ...
## 17 Qn3 Quebec nonchilled 250 40.3
## 18 Qn3 Quebec nonchilled 350 42.1
## ...
## 27 Qc1 Quebec chilled 675 35.4
## 28 Qc1 Quebec chilled 1000 38.7
## ...
## 36 Qc3 Quebec chilled 95 15.1
## 37 Qc3 Quebec chilled 175 21.0
## ...
## 47 Mn1 Mississippi nonchilled 500 30.9
## ...
## 53 Mn2 Mississippi nonchilled 350 31.8
## 54 Mn2 Mississippi nonchilled 500 32.4
## ...
## 62 Mn3 Mississippi nonchilled 675 28.1
## 63 Mn3 Mississippi nonchilled 1000 27.8
## ...
## 70 Mc1 Mississippi chilled 1000 21.9
## 71 Mc2 Mississippi chilled 95 7.7
## 72 Mc2 Mississippi chilled 175 11.4
## ...
## 83 Mc3 Mississippi chilled 675 18.9
## 84 Mc3 Mississippi chilled 1000 19.9
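- Observations should be grouped by the combination of all variables except conc and uptake. So I want to specify the variables I do NOT want to use for grouping.
- I want to add a new variable GroupID to the dataset, in which all observations that belong to the same group have the same value.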
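library(dplyr)
CO2 %>%
  mutate(GroupID =
    do.call(group_indices,
            c(list(.data = .),
              colnames(.) %>%
                setdiff(c('conc', 'uptake')) %>%
                # one symbol per grouping column; as.name() on the whole
                # vector would silently use only its first element
                lapply(as.name)
            )
    )
  )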
## Plant Type Treatment conc uptake GroupID
## 1 Qn1 Quebec nonchilled 95 16.0 1
## 2 Qn1 Quebec nonchilled 175 30.4 1
## ...
## 8 Qn2 Quebec nonchilled 95 13.6 2
## 9 Qn2 Quebec nonchilled 175 27.3 2
## ...
## 15 Qn3 Quebec nonchilled 95 16.2 3
## 16 Qn3 Quebec nonchilled 175 32.4 3
## ...
## 22 Qc1 Quebec chilled 95 14.2 4
## 23 Qc1 Quebec chilled 175 24.1 4
## ...
## 29 Qc2 Quebec chilled 95 9.3 6
## 30 Qc2 Quebec chilled 175 27.3 6
## ...
## 36 Qc3 Quebec chilled 95 15.1 5
## 37 Qc3 Quebec chilled 175 21.0 5
## ...
## 43 Mn1 Mississippi nonchilled 95 10.6 9
## 44 Mn1 Mississippi nonchilled 175 19.2 9
## ...
Is there a simpler solution?
Bonus: a solution that can group by all variables of the same type (e.g. all factor variables) would be fantastic.
You can use .GRP from data.table:
library(data.table)
setDT(CO2)[, GroupID := .GRP, setdiff(names(CO2), c('conc','uptake'))]
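For the bonus part of the question (grouping by all columns of one type), the same .GRP idea could be sketched as follows. This is only a sketch: dt and factor_cols are names introduced here for illustration and do not appear in the original answer.

```r
library(data.table)

# Work on a plain data.table copy so the built-in CO2 object stays untouched
dt <- as.data.table(as.data.frame(CO2))

# Names of all factor columns (hypothetical helper variable);
# for CO2 these are Plant, Type and Treatment
factor_cols <- names(dt)[vapply(dt, is.factor, logical(1))]

# .GRP is the running group counter, so all rows of a group share one ID
dt[, GroupID := .GRP, by = factor_cols]

head(dt)
```

Passing the column names as a character vector to by keeps the call independent of how many factor columns the data happens to have.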
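To group by the variables that satisfy a condition, we can use group_by_if. In this case, is.factor tests whether each column is a factor. After that, group_indices generates an ID for each group.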
library(dplyr)
CO2_2 <- CO2 %>%
mutate(GroupID = CO2 %>%
group_by_if(is.factor) %>%
group_indices())
head(CO2_2)
# Plant Type Treatment conc uptake GroupID
# 1 Qn1 Quebec nonchilled 95 16.0 1
# 2 Qn1 Quebec nonchilled 175 30.4 1
# 3 Qn1 Quebec nonchilled 250 34.8 1
# 4 Qn1 Quebec nonchilled 350 37.2 1
# 5 Qn1 Quebec nonchilled 500 35.3 1
# 6 Qn1 Quebec nonchilled 675 39.2 1
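With dplyr 1.0.0 or later, the same result can probably be obtained without the nested pipe, by using across() inside group_by() together with cur_group_id(). The following is a sketch under that version assumption; CO2_3 is a name introduced here for illustration.

```r
library(dplyr)

CO2_3 <- as.data.frame(CO2) %>%
  # group by every column except the two measurement columns
  group_by(across(-c(conc, uptake))) %>%
  # cur_group_id() returns the integer index of the current group
  mutate(GroupID = cur_group_id()) %>%
  ungroup()

head(CO2_3)
```

For the bonus case, replacing the grouping line with group_by(across(where(is.factor))) should select all factor columns instead of excluding columns by name.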
Using base R, we can do the following:
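# Count the rows in each group; either conc or uptake can be used for the count
A <- aggregate(cbind(conc, uptake) ~ ., CO2, length)[, "uptake"]
# Number the groups by row position; this relies on the rows of each group
# being contiguous, which holds for CO2 (7 rows per group)
transform(CO2, ID = rep(seq_along(A), A))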
Plant Type Treatment conc uptake ID
1 Qn1 Quebec nonchilled 95 16.0 1
2 Qn1 Quebec nonchilled 175 30.4 1
:
8 Qn2 Quebec nonchilled 95 13.6 2
9 Qn2 Quebec nonchilled 175 27.3 2
:
15 Qn3 Quebec nonchilled 95 16.2 3
16 Qn3 Quebec nonchilled 175 32.4 3
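The approach above assigns IDs by row position, so it depends on how the data is sorted. A base R variant that does not depend on row order might look like the sketch below. It swaps in interaction() and match() for the aggregate() call, and the names df, group_cols, key and CO2_id are introduced here for illustration only.

```r
data(CO2, package = "datasets")
df <- as.data.frame(CO2)

# Columns to group by: everything except conc and uptake
group_cols <- setdiff(names(df), c("conc", "uptake"))

# One factor level per observed combination of the grouping columns
key <- interaction(df[group_cols], drop = TRUE)

# Number the groups in order of first appearance
CO2_id <- transform(df, ID = match(key, unique(key)))

head(CO2_id)
```

Because the ID is derived from the key values rather than from row counts, shuffling the rows should still give every member of a group the same ID.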
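As a one-liner:
# seq_along(d) supplies the group numbers, d how often each one is repeated
transform(CO2, fac = rep(seq_along(d <- aggregate(cbind(conc, uptake) ~ ., CO2, length)[, "uptake"]), d))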
In my case I did not have a simple name like CO2; my input was produced by a chain of operations. In that case you can use the dot: df_with_groupID <- … %>% … %>% mutate(groupID = group_by_if(., is.factor) %>% group_indices())
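Good to know, data.table really can express this very concisely! For the sake of my readers' sanity I will stick with dplyr for this analysis, but this will certainly be useful in the future!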