Improving a dplyr solution: creating a variable from conditional ordering (position) based on other information

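I'm working on a dataset in which each participant (ID) was assessed 1, 2, or 3 times. It is a longitudinal study. Unfortunately, when the first analyst coded the dataset, she/he did not record which evaluation each row corresponds to.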

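Because every participant has age information (in months), it is easy to tell which evaluation came first, which came second, and so on: participants are younger at the first evaluation than at the second, younger at the second than at the third, and so on.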
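Put in code, the idea is that a participant's evaluation number is simply the rank of months within that participant's id. Below is a minimal sketch on a small hand-made tibble. The data and the column name evaluation are hypothetical and do not come from the question, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: participant 1 seen three times, 2 twice, 3 once,
# with rows deliberately out of chronological order
toy <- tibble(
  id     = c(1, 1, 1, 2, 2, 3),
  months = c(14, 10, 12, 9, 11, 15)
)

toy <- toy %>%
  group_by(id) %>%
  # row_number(months) ranks ages within each id: the youngest age gets 1
  mutate(evaluation = row_number(months)) %>%
  ungroup()

toy
```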

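I handled this with tidyverse tools and everything works. However, I'm sure (I can imagine...) there are many other, more elegant solutions, which is why I came to this forum. Can anyone give me some ideas on how to make this code shorter and clearer?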

Here is fake data to reproduce the problem:

library(dplyr)

ds <- data.frame(id = 1:6,
                 months = round(rnorm(18, mean = 12, sd = 2)),
                 x1 = sample(0:2),
                 x2 = sample(0:2),
                 x3 = sample(0:2),
                 x4 = sample(0:2))

# add how many times each child was assessed
ds <- ds %>% group_by(id) %>% mutate(how_many = n())
# add position
ds %>% group_by(id) %>%
  mutate(first = min(months),
         max = max(months),
         med = median(months)) -> ds

# label the first and third evaluations (the second is left missing)
ds %>%
  mutate(group = case_when(how_many == 3 & months == first ~ "First evaluation",
                           how_many == 3 & months == max ~ "Third evaluation",
                           TRUE ~ NA_character_)) -> ds
# fill in the remaining (middle) evaluation of children assessed three times
ds %>% mutate(group = if_else(is.na(group), "Second evaluation", group)) -> ds
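Labelling by rank has an advantage over the min/max approach above. Ages are rounded to whole months, so two evaluations of the same child can share an age. When that happens, the min/max rule gives both rows the same label and no row gets "Second evaluation". The sketch below compares the two rules on a hypothetical three-row tibble. The min/max rule is condensed into a single case_when(), and the column names by_minmax and by_rank are illustrative only.

```r
library(dplyr)

# Hypothetical child seen three times, twice at the same (rounded) age
tied <- tibble(id = c(1, 1, 1), months = c(10, 10, 12))

tied <- tied %>%
  group_by(id) %>%
  mutate(
    how_many = n(),
    # min/max rule, as in the question (condensed into one case_when)
    by_minmax = case_when(
      how_many == 3 & months == min(months) ~ "First evaluation",
      how_many == 3 & months == max(months) ~ "Third evaluation",
      TRUE ~ "Second evaluation"
    ),
    # rank-based rule: row_number() breaks ties by row order
    by_rank = c("First evaluation", "Second evaluation",
                "Third evaluation")[row_number(months)]
  ) %>%
  ungroup()

tied
```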

Here is my original code:

library(dplyr)

temp <- dataset %>% select(idind, arm, infant_sex, infant_age_months)
# add how many times each child was assessed
temp <- temp %>% group_by(idind) %>% mutate(how_many = n())
# add position
temp %>% group_by(idind) %>%
  mutate(first = min(infant_age_months),
         max = max(infant_age_months),
         med = median(infant_age_months)) -> temp

# label children assessed only once
temp %>%
  mutate(group = case_when(how_many == 1 ~ "First evaluation")) -> temp

# label children assessed twice (keeping all previous results)
temp %>%
  mutate(group = case_when(how_many == 2 & infant_age_months == first ~ "First evaluation",
                           how_many == 2 & infant_age_months == max ~ "Second evaluation",
                           TRUE ~ group)) -> temp

# label the first and third evaluations of children assessed three times (the second stays missing)
temp %>%
  mutate(group = case_when(how_many == 3 & infant_age_months == first ~ "First evaluation",
                           how_many == 3 & infant_age_months == max ~ "Third evaluation",
                           TRUE ~ group)) -> temp
# fill in the remaining (middle) evaluation of children assessed three times
temp %>% mutate(group = if_else(is.na(group), "Second evaluation", group)) -> temp
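For comparison, here is a sketch of the whole labelling pipeline collapsed into a single mutate(), keeping the question's column names. Because dataset is not shown, the sketch first builds a small hypothetical stand-in, and eval_labels is also an illustrative name. The call row_number(infant_age_months) gives 1 to the youngest age within each idind. As a result, children seen once, twice, or three times get the same labels as in the multi-step code, except that tied ages are split by row order instead of receiving the same label.

```r
library(dplyr)

# Hypothetical stand-in for `dataset`: child 101 seen three times,
# child 102 twice, child 103 once (ages in months)
dataset <- tibble(
  idind             = c(101, 101, 101, 102, 102, 103),
  arm               = c("A", "A", "A", "B", "B", "A"),
  infant_sex        = c("F", "F", "F", "M", "M", "F"),
  infant_age_months = c(18, 6, 12, 7, 13, 9)
)

# Hypothetical lookup vector: position 1, 2, 3 -> label
eval_labels <- c("First evaluation", "Second evaluation", "Third evaluation")

temp <- dataset %>%
  select(idind, arm, infant_sex, infant_age_months) %>%
  group_by(idind) %>%
  mutate(how_many = n(),
         # youngest age within each child -> "First evaluation", and so on
         group = eval_labels[row_number(infant_age_months)]) %>%
  ungroup()

temp
```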

Please note that I searched the site before asking; I imagine other people have run into the same problem while programming.

Many thanks.

Here you go. I use rank() to get the order of the treatments (evaluations):

library(dplyr)

ds <- data.frame(id = 1:6,
                 months = round(rnorm(18, mean = 12, sd = 2)),
                 x1 = sample(0:2),
                 x2 = sample(0:2),
                 x3 = sample(0:2),
                 x4 = sample(0:2))

# rank ages within each id; ties.method = "first" gives tied ages distinct ranks
ds2 <- ds %>% group_by(id) %>% mutate(rank = rank(months, ties.method = "first"))
labels <- c("First", "Second", "Third")
# use each row's rank to look up its label
ds2$labels <- labels[ds2$rank]
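The ties.method = "first" argument matters here. With the default ("average"), two evaluations recorded at the same age receive a fractional rank. R silently truncates a fractional index, so both rows end up with the same label. The small base-R check below shows this. The ages are hypothetical.

```r
labels <- c("First", "Second", "Third")
ages   <- c(11, 14, 14)  # hypothetical: two evaluations at the same age

rank(ages)                                  # 1.0 2.5 2.5 (default "average")
labels[rank(ages)]                          # "First" "Second" "Second"
rank(ages, ties.method = "first")           # 1 2 3
labels[rank(ages, ties.method = "first")]   # "First" "Second" "Third"
```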

Or just arrange by age and use 1:n() instead of n(), which creates a sequence:

library(dplyr)
# sort all rows by age, then number each child's rows 1, 2, 3 in that order
ds <- ds %>% group_by(id) %>% arrange(months) %>% mutate(how_many = 1:n())
ds %>% arrange(id, months)

# A tibble: 18 x 7
# Groups:   id [6]
      id months    x1    x2    x3    x4 how_many
   <int>  <dbl> <int> <int> <int> <int>    <int>
 1     1     10     1     2     0     1        1
 2     1     11     1     2     0     1        2
 3     1     12     1     2     0     1        3
 4     2     11     0     1     2     2        1
 5     2     14     0     1     2     2        2
 6     2     14     0     1     2     2        3
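Note that arrange() ignores grouping unless .by_group = TRUE is set. The code above therefore sorts the whole table by months, which still puts each child's rows in age order before 1:n() numbers them. The sketch below is an equivalent that sorts explicitly by participant and then by age, and uses dplyr's row_number() in place of 1:n(). It also makes some assumptions: the seed is arbitrary, the data keeps only id and months, and evaluation is a hypothetical column name chosen because the value is a position rather than a count.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the toy data is reproducible
ds <- data.frame(id = 1:6,
                 months = round(rnorm(18, mean = 12, sd = 2)))

ds <- ds %>%
  arrange(id, months) %>%                # sort by participant, then by age
  group_by(id) %>%
  mutate(evaluation = row_number()) %>%  # 1, 2, 3 within each id
  ungroup()

ds
```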

Thank you so much!