Create a consensus column using dplyr
I have a data frame:
Groups Name Category value
G1 A cat1 20
G1 A cat2 1
G1 B cat3 21
G1 B cat3 23
G2 A cat4 32
G2 C cat2 23
G2 C cat2 21
I would like to add a new column, consensus_category, such as:
Groups Name Category value consensus_category
G1 A cat1 20 cat2
G1 A cat2 1 cat2
G1 B cat3 21 cat2
G1 B cat3 23 cat2
G2 A cat4 32 cat4
G2 C cat2 23 cat4
G2 C cat2 21 cat4
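The idea is that I have a vector, c("A"), that corresponds to a specific Name in the data frame. Based on this name, I want to write the corresponding Category to all other rows in the same Groups. If that name has more than one Category within a group (an ex aequo), the winner is the Category with the lowest value. For example, in:
G1 A cat1 20 cat2
G1 A cat2 1 cat2
cat2 wins because 1 < 20.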
I tried:
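df %>%
  group_by(Groups) %>%
  # count how often each Category occurs within its group
  add_count(Category) %>%
  # keep the most frequent categories
  top_n(1, n) %>%
  # break ties with the lowest value (the column is `value`, lowercase)
  top_n(-1, value) %>%
  distinct(consensus_category = Category) %>%
  right_join(df)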
But I don't know how to specify the value in the vector (A) that I want to use as the guide for the consensus.
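Using dplyr, we can find the rows in each group whose Name matches vec, take the minimum value among them, and extract the corresponding Category. This assumes that every group contains at least one row whose Name is in vec.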
library(dplyr)
vec <- "A"
df %>%
  group_by(Groups) %>%
  mutate(consensus_category = Category[value == min(value[Name == vec])])
# Groups Name Category value consensus_category
# <fct> <fct> <fct> <int> <fct>
#1 G1 A cat1 20 cat2
#2 G1 A cat2 1 cat2
#3 G1 B cat3 21 cat2
#4 G1 B cat3 23 cat2
#5 G2 A cat4 32 cat4
#6 G2 C cat2 23 cat4
#7 G2 C cat2 21 cat4
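The one-liner above relies on vec holding a single name and on every group containing it. It can also return more than one category if a row with a different Name happens to share the minimum value. A minimal sketch of a more defensive variant follows. The helper name pick_consensus is hypothetical (it is not part of dplyr), and returning NA for groups without a match is an assumption about the desired behaviour.

```r
library(dplyr)

# Same data as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  Groups   = c("G1", "G1", "G1", "G1", "G2", "G2", "G2"),
  Name     = c("A", "A", "B", "B", "A", "C", "C"),
  Category = c("cat1", "cat2", "cat3", "cat3", "cat4", "cat2", "cat2"),
  value    = c(20L, 1L, 21L, 23L, 32L, 23L, 21L)
)
vec <- c("A")

# Hypothetical helper: Category of the lowest-value row whose Name is in `vec`.
# Returns NA when the group has no such row (assumed fallback).
pick_consensus <- function(category, value, name, vec) {
  hit <- name %in% vec
  if (!any(hit)) return(NA_character_)
  # which.min() returns a single index, so exactly one category comes back
  as.character(category[hit][which.min(value[hit])])
}

res <- df %>%
  group_by(Groups) %>%
  mutate(consensus_category = pick_consensus(Category, value, Name, vec)) %>%
  ungroup()
```

Restricting the lookup to the rows selected by `name %in% vec` before calling which.min() means rows with other names cannot influence the result, even when they share the same value.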
An option with data.table:
library(data.table)
setDT(df)[, consensus_category := Category[value ==
min(value[Name == vec])], Groups]
df
# Groups Name Category value consensus_category
#1: G1 A cat1 20 cat2
#2: G1 A cat2 1 cat2
#3: G1 B cat3 21 cat2
#4: G1 B cat3 23 cat2
#5: G2 A cat4 32 cat4
#6: G2 C cat2 23 cat4
#7: G2 C cat2 21 cat4
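As a hedged sketch, the data.table call can also be written with %in% and which.min(), so that vec may hold several names and exactly one category is picked per group. The NA fallback for groups without a matching Name is an assumption, not something the question specifies, and the object name dt is chosen here only for illustration.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  Groups   = c("G1", "G1", "G1", "G1", "G2", "G2", "G2"),
  Name     = c("A", "A", "B", "B", "A", "C", "C"),
  Category = c("cat1", "cat2", "cat3", "cat3", "cat4", "cat2", "cat2"),
  value    = c(20L, 1L, 21L, 23L, 32L, 23L, 21L)
)
vec <- c("A")

# Per group: look only at rows whose Name is in `vec`, then take the
# Category of the row with the smallest value (NA if no row matches).
dt[, consensus_category := {
  hit <- Name %in% vec
  if (any(hit)) Category[hit][which.min(value[hit])] else NA_character_
}, by = Groups]
```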
Data
df <- structure(list(Groups = c("G1", "G1", "G1", "G1", "G2", "G2",
"G2"), Name = c("A", "A", "B", "B", "A", "C", "C"), Category =
c("cat1", "cat2", "cat3", "cat3", "cat4", "cat2", "cat2"), value =
c(20L, 1L, 21L, 23L, 32L, 23L, 21L)), class = "data.frame", row.names =
c(NA, -7L))