Is it possible to list the unique values of one column when grouping by another column in R?
I have the following columns:
session condition codes
15 anxiety 1
15 depression 1
15 bipolar 1
15 high blood pressure 3
15 panic attacks 1
66 hypertension 5
66 high blood pressure 3
66 anxiety 1
66 panic attacks 1
75 schizophrenia 1
32 muscular dystrophy 4
32 anxiety 1
32 depression 1
32 panic attacks 1
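I would like to create a new column that contains only the unique codes for each session, and leave the remaining rows for that session blank. I know this doesn't make logical sense, because the new column would no longer line up with the first one. It would also be fine if the result needs to live in a new object, a list, or something else suitable.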
session condition codes unique_codes
15 anxiety 1 1
15 depression 1 3
15 bipolar 1
15 high blood pressure 3
15 panic attacks 1
66 hypertension 5 5
66 high blood pressure 3 3
66 anxiety 1 1
66 panic attacks 1
75 schizophrenia 1 1
32 muscular dystrophy 4 4
32 anxiety 1 1
32 depression 1
32 panic attacks 1
I tried:
conditions=conditions %>%
group_by(session)%>%
mutate(unique_codes=unique(conditions$codes))
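However, I get an error saying "must be length 5 (the group size) or 1, not 4", which I think is because I want the remaining rows to be empty. Does anyone know how to solve this? Thanks.

The length is the problem. We can either paste the values together or create a list column: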
library(dplyr)
conditions %>%
group_by(session)%>%
mutate(unique_codes = toString(unique(codes)))
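
The list-column variant mentioned above is not shown in the answer, so here is a minimal sketch of what it might look like. It also illustrates why the attempt in the question fails: `conditions$codes` refers to the whole column rather than the current group, so `unique()` returns 4 values, which fits neither the group size nor length 1. The small data frame and the name `with_list` are illustrative assumptions, not part of the original post.

```r
library(dplyr)

# Small stand-in for the question's data (two sessions only; assumed for illustration)
conditions <- data.frame(
  session = c(15L, 15L, 15L, 66L, 66L),
  codes   = c(1L, 1L, 3L, 5L, 3L)
)

# list() wraps the per-group vector into a single element,
# which mutate() then recycles to every row of the group
with_list <- conditions %>%
  group_by(session) %>%
  mutate(unique_codes = list(unique(codes))) %>%
  ungroup()

# Each cell of unique_codes holds the full vector for that session
with_list$unique_codes[[1]]  # codes for session 15
with_list$unique_codes[[4]]  # codes for session 66
```

Compared with `toString()`, the list column keeps the codes as integers, at the cost of a column that is harder to print or export.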
Another option is to make the lengths match by padding with NA at the end:
conditions %>%
group_by(session) %>%
mutate(unique_codes = `length<-`(unique(codes), n()))
# A tibble: 14 x 4
# Groups: session [4]
# session condition codes unique_codes
# <int> <chr> <int> <int>
# 1 15 anxiety 1 1
# 2 15 depression 1 3
# 3 15 bipolar 1 NA
# 4 15 high blood pressure 3 NA
# 5 15 panic attacks 1 NA
# 6 66 hypertension 5 5
# 7 66 high blood pressure 3 3
# 8 66 anxiety 1 1
# 9 66 panic attacks 1 NA
#10 75 schizophrenia 1 1
#11 32 muscular dystrophy 4 4
#12 32 anxiety 1 1
#13 32 depression 1 NA
#14 32 panic attacks 1 NA
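
For readers unfamiliar with the backtick call, the following base R sketch shows what `length<-` does on its own. The vector is taken from session 15 in the question; the variable names are illustrative.

```r
# Unique codes for session 15 in the question's data
u <- unique(c(1L, 1L, 1L, 3L, 1L))

# Calling the replacement function `length<-` directly returns a copy
# of the vector extended to the requested length, padded with NA
padded <- `length<-`(u, 5)
padded
# [1]  1  3 NA NA NA

# Equivalent two-step form using the usual assignment syntax
v <- u
length(v) <- 5
identical(v, padded)
# [1] TRUE
```

Inside `mutate()`, the functional form is convenient because it yields the padded vector as an expression, without needing a temporary variable.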
Another dplyr option, with the data stored in df, could be:
df %>%
group_by(session) %>%
distinct(codes) %>%
transmute(unique_codes = codes,
rowid = 1:n()) %>%
right_join(df %>%
group_by(session) %>%
mutate(rowid = 1:n())) %>%
ungroup() %>%
select(-rowid)
session unique_codes condition codes
<int> <int> <chr> <int>
1 15 1 anxiety 1
2 15 3 depression 1
3 15 NA bipolar 1
4 15 NA high blood pressure 3
5 15 NA panic attacks 1
6 66 5 hypertension 5
7 66 3 high blood pressure 3
8 66 1 anxiety 1
9 66 NA panic attacks 1
10 75 1 schizophrenia 1
11 32 4 muscular dystrophy 4
12 32 1 anxiety 1
13 32 NA depression 1
14 32 NA panic attacks 1
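Is there any problem with the code? When I try the second option, it says: Error in n(): argument "y" is missing, with no default.

@alex Can you show your packageVersion('dplyr')? If it is a version issue, change it to length(codes) instead of n().

@alex You can use length(codes) to check whether it works, since it is a base R function.

If I have missing values in the "codes" column, is there a way to not count NA as a unique value for that session number? Right now, whenever a session has a condition with no code, it adds an NA to the unique codes column. Thanks for your help.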
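
The last comment has no reply in the thread. One possible approach, sketched here under assumptions, is to drop the missing codes before calling `unique()` and then pad as before. The data frame `conditions_na` is hypothetical, and `length(codes)` is used in place of `n()` as suggested in the comments above.

```r
library(dplyr)

# Hypothetical data: session 15 has one condition with a missing code
conditions_na <- data.frame(
  session = c(15L, 15L, 15L, 66L, 66L),
  codes   = c(1L, NA, 3L, 5L, 5L)
)

res <- conditions_na %>%
  group_by(session) %>%
  # Remove NA before unique(), then pad back to the group size;
  # length(codes) gives the group size here, like n()
  mutate(unique_codes = `length<-`(unique(codes[!is.na(codes)]),
                                   length(codes))) %>%
  ungroup()

res$unique_codes
# [1]  1  3 NA  5 NA
```

The NA values that remain in `unique_codes` are only padding for the blank rows; a missing code no longer takes up a slot among the unique values.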
Data

conditions <- structure(list(
  session = c(15L, 15L, 15L, 15L, 15L, 66L, 66L, 66L, 66L, 75L, 32L, 32L, 32L, 32L),
  condition = c("anxiety", "depression", "bipolar", "high blood pressure",
    "panic attacks", "hypertension", "high blood pressure", "anxiety",
    "panic attacks", "schizophrenia", "muscular dystrophy", "anxiety",
    "depression", "panic attacks"),
  codes = c(1L, 1L, 1L, 3L, 1L, 5L, 3L, 1L, 1L, 1L, 4L, 1L, 1L, 1L)),
  class = "data.frame", row.names = c(NA, -14L))