R - Group by a variable, then assign a unique ID
I want to de-identify a sensitive dataset that contains both time-fixed and time-varying values. I would like to (a) group all cases by social security number, (b) assign those cases a unique ID, and then (c) remove the social security number. Here is an example dataset:
personal_id gender temperature
111-11-1111 M 99.6
999-999-999 F 98.2
111-11-1111 M 97.8
999-999-999 F 98.3
888-88-8888 F 99.0
111-11-1111 M 98.9
Any solution would be greatly appreciated.
Using the dplyr package:
library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                   gender = c("M", "F", "M", "M"),
                   temperature = c(99.6, 98.2, 97.8, 95.5))
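Then join the cases data frame to your data:
data <- left_join(data, cases, by = c("personal_id" = "levels"))
Then create a more distinctive ID by pasting the simple id and gender together:
mutate(UID = paste(id, gender, sep=""))
Finally, remove the personal id and the simple id:
select(-personal_id, -id)
There you go :)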
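The join above relies on a `cases` lookup table whose definition does not appear in the answer as it survives here. The following is a minimal sketch of the whole pipeline, assuming `cases` holds one row per distinct `personal_id` (in sorted order) in a column named `levels`, plus a running integer `id`. The way `cases` is built and the name `deidentified` are assumptions made for illustration, not the answerer's original code. Note that in R 4.0 and later `data.frame()` no longer converts strings to factors by default, so the sketch uses `unique()` rather than `levels()`.

```r
library(dplyr)

data <- data.frame(
  personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
  gender      = c("M", "F", "M", "M"),
  temperature = c(99.6, 98.2, 97.8, 95.5),
  stringsAsFactors = FALSE
)

# Assumed lookup table: one row per distinct personal_id, sorted,
# with a simple running integer id.
cases <- data.frame(levels = sort(unique(data$personal_id)),
                    stringsAsFactors = FALSE)
cases$id <- seq_len(nrow(cases))

deidentified <- data %>%
  left_join(cases, by = c("personal_id" = "levels")) %>%  # attach the simple id
  mutate(UID = paste(id, gender, sep = "")) %>%           # combine id and gender
  select(-personal_id, -id)                               # drop the sensitive key

deidentified
```

Because `cases` is the only place where the mapping from `personal_id` to `id` is stored, it would need to be kept secure, or discarded, once the sensitive column has been removed.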
dplyr has a group_indices() function for creating unique group IDs:
library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                   gender = c("M", "F", "M", "M"),
                   temperature = c(99.6, 98.2, 97.8, 95.5))
data$group_id <- data %>% group_indices(personal_id)
data <- data %>% select(-personal_id)
data
gender temperature group_id
1 M 99.6 1
2 F 98.2 3
3 M 97.8 2
4 M 95.5 1
dplyr::group_indices() has been deprecated since dplyr 1.0.0; dplyr::cur_group_id() should be used instead:
df %>%
  group_by(personal_id) %>%
  mutate(group_id = cur_group_id())
personal_id gender temperature group_id
<chr> <chr> <dbl> <int>
1 111-11-1111 M 99.6 1
2 999-999-999 F 98.2 3
3 111-11-1111 M 97.8 1
4 999-999-999 F 98.3 3
5 888-88-8888 F 99 2
6 111-11-1111 M 98.9 1
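The snippet above uses a `df` that is not defined in the answer and stops before the sensitive column is dropped. Here is a minimal, self-contained sketch that completes step (c), assuming `df` holds the example data from the question. The name `deidentified` is illustrative.

```r
library(dplyr)

# Example data from the question (values copied as given there).
df <- data.frame(
  personal_id = c("111-11-1111", "999-999-999", "111-11-1111",
                  "999-999-999", "888-88-8888", "111-11-1111"),
  gender      = c("M", "F", "M", "F", "F", "M"),
  temperature = c(99.6, 98.2, 97.8, 98.3, 99.0, 98.9),
  stringsAsFactors = FALSE
)

deidentified <- df %>%
  group_by(personal_id) %>%
  mutate(group_id = cur_group_id()) %>%  # one integer per distinct personal_id
  ungroup() %>%                          # drop the grouping before removing the key
  select(-personal_id)

deidentified
```

The `ungroup()` call matters here: if `select(-personal_id)` is applied to a data frame that is still grouped, dplyr adds the grouping column back, so the social security number would not actually be removed.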
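Maybe a lazy solution, but I think you could hash the social security numbers. One way is set.seed(1234); levels(personal_id)
Unfortunately, group_indices() seems to sort personal_id automatically before creating the group IDs, which is not always what you want.
group_indices() was deprecated in dplyr 1.0.0. Use cur_group_id() now.
This should be the new accepted answer!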
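On the sorting concern raised in the comments: if IDs should follow the order in which each `personal_id` first appears rather than its sorted order, one option is base R's `match()` against `unique()`. This is a sketch of an alternative technique, not something proposed in the thread, and it assumes the example data from the question.

```r
library(dplyr)

df <- data.frame(
  personal_id = c("111-11-1111", "999-999-999", "111-11-1111",
                  "999-999-999", "888-88-8888", "111-11-1111"),
  gender      = c("M", "F", "M", "F", "F", "M"),
  temperature = c(99.6, 98.2, 97.8, 98.3, 99.0, 98.9),
  stringsAsFactors = FALSE
)

deidentified <- df %>%
  # unique() keeps first-appearance order; match() returns each row's
  # position in that vector, which serves as the group id.
  mutate(group_id = match(personal_id, unique(personal_id))) %>%
  select(-personal_id)

deidentified
```

With this approach "999-999-999" receives ID 2 and "888-88-8888" receives ID 3, whereas the sorted grouping used by `cur_group_id()` assigns them 3 and 2.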
Complete pipeline for the join-based answer, with its output:
data <- left_join(data, cases, by = c("personal_id" = "levels")) %>%
  mutate(UID = paste(id, gender, sep="")) %>%
  select(-personal_id, -id)
  gender temperature UID
1      M        99.6  1M
2      F        98.2  3F
3      M        97.8  2M
4      M        95.5  1M
A variant of the group_indices() approach that computes the index inside mutate():
data %>%
  mutate(group_id = group_indices(., personal_id))