Select only duplicates based on multiple columns in R



I have a data frame in the following format:

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4

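I want to select all rows that are duplicated on mpg and carb together, i.e. every row whose combination of mpg and carb appears more than once (all copies, not only the later repeats).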

This should give the following result:

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
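The sample above appears to be rows 1-8, 15 and 16 of R's built-in mtcars data set (the values match). A minimal base R sketch, assuming that, which rebuilds the sample so the answers can be checked against the expected output, and counts how often each (mpg, carb) pair occurs:

```r
# Assumption: the question's data frame is rows 1-8, 15 and 16 of mtcars,
# with the car names replaced by the original row positions as row names.
df <- mtcars[c(1:8, 15, 16), ]
rownames(df) <- as.character(c(1:8, 15, 16))

# Build one key per row from the two columns (e.g. "21_4" for mpg 21, carb 4)
# and count how often each key occurs; keys with a count above 1 are the targets.
pair_key <- paste(df$mpg, df$carb, sep = "_")
pair_counts <- table(pair_key)
pair_counts[pair_counts > 1]

# Row names of the rows whose key occurs more than once
rownames(df)[pair_key %in% names(pair_counts)[pair_counts > 1]]
# [1] "1"  "2"  "15" "16"
```

Keys built with paste() depend on how the numbers print, which is fine for these values but worth keeping in mind for arbitrary doubles.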

A dplyr solution:

library(dplyr)

mtcars %>% 
  add_count(mpg, carb) %>% # count how many times the combinations of those variables exist and add those counts in a new column
  filter(n > 1) %>%        # keep only rows where the combination appears multiple times
  select(-n)               # remove counts

# # A tibble: 6 x 11
#    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  21       6 160     110  3.9   2.62  16.5     0     1     4     4
# 2  21       6 160     110  3.9   2.88  17.0     0     1     4     4
# 3  10.4     8 472     205  2.93  5.25  18.0     0     0     3     4
# 4  10.4     8 460     215  3     5.42  17.8     0     0     3     4
# 5  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
# 6  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
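The output has six rows rather than the four in the question because it was run on the full mtcars data set, where Honda Civic and Lotus Europa also share mpg = 30.4 and carb = 2. A base R sketch (using aggregate() rather than dplyr) that lists the repeated pairs in the full data set:

```r
# Count rows per (mpg, carb) combination in the full mtcars data set.
# aggregate() with a formula applies FUN to the left-hand column within each group;
# any column without NAs works for counting, cyl is used here.
counts <- aggregate(cyl ~ mpg + carb, data = mtcars, FUN = length)
names(counts)[names(counts) == "cyl"] <- "n"

# Keep only the combinations that occur more than once
repeated <- counts[counts$n > 1, ]
repeated
```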


Here is another dplyr option:

library(dplyr)

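mtcars %>% 
  group_by(mpg, carb) %>%   # one group per (mpg, carb) combination
  filter(n() > 1) %>%       # keep groups that contain more than one row
  ungroup()                 # drop the grouping from the result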

Using data.table, we can do:

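library(data.table)
# For each (mpg, carb) group, return its rows (.SD) only if the group has more than one row (.N > 1);
# mpg and carb become the first columns of the result
as.data.table(mtcars)[, .SD[.N > 1], .(mpg, carb)]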

In base R:

mtcars[duplicated(mtcars[, c("mpg", "carb")]) | duplicated(mtcars[, c("mpg", "carb")], fromLast = TRUE), ]
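Why two duplicated() calls: duplicated() flags only the second and later occurrences of a row, so on its own it would drop the first copy of each repeated pair; a second pass with fromLast = TRUE flags every copy except the last, and OR-ing the two flags all copies. A small illustration on a made-up toy data frame:

```r
# Toy data frame (made-up values): the pairs (1, "x") and (2, "y") repeat.
toy <- data.frame(a = c(1, 1, 2, 3, 2), b = c("x", "x", "y", "z", "y"))

forward  <- duplicated(toy)                   # TRUE for 2nd and later occurrences
backward <- duplicated(toy, fromLast = TRUE)  # TRUE for all but the last occurrence
forward
# [1] FALSE  TRUE FALSE FALSE  TRUE
backward
# [1]  TRUE FALSE  TRUE FALSE FALSE
forward | backward                            # every row whose pair repeats
# [1]  TRUE  TRUE  TRUE FALSE  TRUE
```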

Or, counting the rows in each (mpg, carb) group with ave():

mtcars[ave(seq_len(nrow(mtcars)), mtcars$mpg, mtcars$carb, FUN = length) > 1, ]
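ave(x, g1, g2, FUN = length) applies length() within each combination of the grouping variables and writes the result back to every row of that group, so it returns, row by row, how many rows share the same (mpg, carb) pair. The same idea on a made-up toy data frame:

```r
# Toy data frame (made-up values): the pairs (1, "x") and (2, "y") repeat.
toy <- data.frame(a = c(1, 1, 2, 3, 2), b = c("x", "x", "y", "z", "y"))

# For each row, the number of rows in its (a, b) group
group_size <- ave(seq_len(nrow(toy)), toy$a, toy$b, FUN = length)
group_size
# [1] 2 2 2 1 2

# Keep rows whose group has more than one member
toy[group_size > 1, ]
```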