Flag unique values by multiple groups in a data frame
I have a large data frame in R in which users were tasked with describing the objects in a scene. I need 3 users per scene, but some scenes were described more than 3 times. I am trying to keep the first 3 unique users of each scene and remove the rest.

An existing answer to a similar question was helpful, but it groups by only one column, so I couldn't apply it here.

Answer: within each scene, you can use match to create a count variable that numbers each user in order of first appearance, and then keep only the rows where count <= 3.

Comment: What would this look like in data.table?

Toy data (the real dataset has many more rows and columns):
user <- c("A", "A", "A", "B", "B", "C", "C", "D", "E", "E", "F", "F", "F")
scene <- c("library", "library", "library", "park", "park", "library", "library", "park", "library", "library", "library", "library", "library")
object <- c("book", "book", "lamp", "dog", "cat", "book", "lamp", "dog", "desk", "desk", "book", "lamp", "lamp")
index <- c(1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2)
dat <- data.frame(user, scene, object, index)
user scene object index
A library book 1
A library book 2
A library lamp 1
B park dog 1
B park cat 1
C library book 1
C library lamp 1
D park dog 1
E library desk 1
E library desk 2
F library book 1
F library lamp 1
F library lamp 2
... ... ... ...
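Desired output (in the library scene, F is the fourth unique user, so all of F's rows are dropped):
user scene object index count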
A library book 1 1
A library book 2 1
A library lamp 1 1
B park dog 1 1
B park cat 1 1
C library book 1 2
C library lamp 1 2
D park dog 1 2
E library desk 1 3
E library desk 2 3
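As a quick check of which users the desired output keeps, the first three distinct users of each scene can be listed directly. This is a base R sketch on the toy vectors; kept and keep_row are illustrative names, not taken from the original post.

```r
user  <- c("A", "A", "A", "B", "B", "C", "C", "D", "E", "E", "F", "F", "F")
scene <- c("library", "library", "library", "park", "park", "library", "library",
           "park", "library", "library", "library", "library", "library")

# First three distinct users in each scene, in order of first appearance
kept <- lapply(split(user, scene), function(u) head(unique(u), 3))
kept$library  # "A" "C" "E"
kept$park     # "B" "D"

# A row survives if its user is among the kept users of its own scene
keep_row <- mapply(function(u, s) u %in% kept[[s]], user, scene, USE.NAMES = FALSE)
sum(keep_row)  # 10
```

The 10 surviving rows match the 10 rows of the desired output.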
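library(dplyr)
dat %>%
  group_by(scene) %>%
  # number each user by order of first appearance within the scene
  mutate(count = match(user, unique(user))) %>%
  # keep only the first 3 unique users per scene
  filter(count <= 3)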
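The key step is match(user, unique(user)): unique() lists the users in order of first appearance, and match() returns each row's position in that list. A minimal illustration using the library scene's users from the toy data, in row order (lib_users is an illustrative name):

```r
# Users of the "library" scene from the toy data, in row order
lib_users <- c("A", "A", "A", "C", "C", "E", "E", "F", "F", "F")

unique(lib_users)                    # "A" "C" "E" "F"
match(lib_users, unique(lib_users))  # 1 1 1 2 2 3 3 4 4 4
```

Every row of user F gets 4, which is why filter(count <= 3) drops all of them.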
# user scene object index count
# <chr> <chr> <chr> <dbl> <int>
# 1 A library book 1 1
# 2 A library book 2 1
# 3 A library lamp 1 1
# 4 B park dog 1 1
# 5 B park cat 1 1
# 6 C library book 1 2
# 7 C library lamp 1 2
# 8 D park dog 1 2
# 9 E library desk 1 3
#10 E library desk 2 3
library(data.table)
# setDT() converts dat to a data.table by reference (dat itself is modified)
setDT(dat)[, count := match(user, unique(user)), by = scene]
dat[count <= 3]
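# ave() returns the same type as its first argument, so convert the character
# result to integer; otherwise count <= 3 compares strings ("10" <= "3" is TRUE)
dat$count <- as.integer(with(dat, ave(as.character(user), scene, FUN = function(x) match(x, unique(x)))))
subset(dat, count <= 3)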
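One caveat for the base R version: ave() returns a vector of the same type as its first argument, so when user is character the count is character too, and count <= 3 becomes a string comparison in which "10" <= "3" is TRUE. The sketch below demonstrates this with made-up user names, then shows a hypothetical helper, keep_first_users() (not from the original answers), that passes row indices to ave() so the count stays an integer.

```r
# ave() returns the same type as its first argument, so a character user
# column produces a character count.
users  <- c(paste0("u", 1:10), "u10")
scenes <- rep("library", length(users))

count_chr <- ave(users, scenes, FUN = function(x) match(x, unique(x)))
class(count_chr)    # "character"
count_chr[10]       # "10"
count_chr[10] <= 3  # TRUE: "10" <= "3" compares strings, not numbers

# Hypothetical helper (not from the original answers): pass row indices to
# ave() so the per-scene count is computed and returned as an integer.
keep_first_users <- function(df, n = 3) {
  count <- ave(seq_len(nrow(df)), df$scene, FUN = function(i) {
    u <- df$user[i]
    match(u, unique(u))
  })
  df[count <= n, , drop = FALSE]
}

# Toy data from the question, rebuilt as a plain data.frame
dat <- data.frame(
  user   = c("A", "A", "A", "B", "B", "C", "C", "D", "E", "E", "F", "F", "F"),
  scene  = c("library", "library", "library", "park", "park", "library", "library",
             "park", "library", "library", "library", "library", "library"),
  object = c("book", "book", "lamp", "dog", "cat", "book", "lamp", "dog",
             "desk", "desk", "book", "lamp", "lamp"),
  index  = c(1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2)
)
kept <- keep_first_users(dat)
nrow(kept)  # 10
```

With n = 3 this returns the same 10 rows as the desired output; changing n keeps a different number of users per scene (n = 2 keeps A and C in library and B and D in park).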