R: Is there a way to check whether certain repeated rows are the same in two data frames?

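df1 is a subset of df2. I want to check whether each repeated row in df1 is repeated the same number of times in df2. So I want to create two new data frames from the larger data frame, df2: one that keeps the rows repeated the same number of times in both data frames, and another that keeps the rows whose number of repeats differs.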

For example, df1:

              SAMPN    PERNO       loop
                1        1          1
                1        1          1
                1        1          2
                1        2          2
                1        3          2
                2        1          1
                2        1          1
                2        2          2
                2        3          4


df2:

              SAMPN    PERNO       loop
                1        1          1
                1        1          1
                1        1          2
                1        2          2
                1        3          2
                1        3          2
                2        1          1
                2        1          1
                2        2          2
                2        2          2
                2        3          4
                2        3          4
                2        4          1
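
To experiment with the example, the two tables can be rebuilt as data frames. This is a minimal sketch in base R; it assumes the first (9-row) table is df1 and the second (13-row) table is df2, which is inferred from df1 being described as the subset.

```r
# Example data from the question, rebuilt by hand.
# Assumption: df1 is the smaller table, df2 the larger one.
df1 <- data.frame(
  SAMPN = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  PERNO = c(1, 1, 1, 2, 3, 1, 1, 2, 3),
  loop  = c(1, 1, 2, 2, 2, 1, 1, 2, 4)
)
df2 <- data.frame(
  SAMPN = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  PERNO = c(1, 1, 1, 2, 3, 3, 1, 1, 2, 2, 3, 3, 4),
  loop  = c(1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 4, 4, 1)
)

# Sanity check: every distinct row of df1 also occurs somewhere in df2.
all_in_df2 <- all(do.call(paste, df1) %in% do.call(paste, df2))
print(all_in_df2)
```
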

Expected output

Rows from df2 that are repeated the same number of times in both data sets:

              SAMPN    PERNO       loop
                1        1          1
                1        1          1
                1        1          2
                1        2          2
                2        1          1
                2        1          1

Rows from df2 that are not repeated the same number of times in both data sets:

              SAMPN    PERNO       loop
                1        3          2
                1        3          2
                2        2          2
                2        2          2
                2        3          4
                2        3          4
                2        4          1

Data to check:

structure(list(SAMPN = c(50, 50, 50, 50, 50, 50, 51, 53, 53, 
53, 53, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54), PERNO = c(4, 
4, 5, 5, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 5, 5, 5
), PLANO = c(4, 5, 2, 3, 2, 3, 3, 2, 3, 4, 5, 2, 3, 4, 5, 6, 
7, 2, 3, 2, 3, 4), loop = c(3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 3, 3, 3, 2, 2, 2, 2, 3), TPURP = structure(c(16L, 2L, 
5L, 2L, 5L, 2L, 2L, 18L, 18L, 13L, 2L, 8L, 3L, 2L, 20L, 13L, 
2L, 5L, 2L, 5L, 2L, 3L), .Label = c("(1) Working at home (for pay)", 
"(2) All other home activities", "(3) Work/Job", "(4) All other activities at work", 
"(5) Attending class", "(6) All other activities at school", 
"(7) Change type of transportation/transfer", "(8) Dropped off passenger", 
"(9) Picked up passenger", "(10) Other, specify - transportation", 
"(11) Work/Business related", "(12) Service Private Vehicle", 
"(13) Routine Shopping", "(14) Shopping for major purchases", 
"(15) Household errands", "(16) Personal Business", "(17) Eat meal outside of home", 
"(18) Health care", "(19) Civic/Religious activities", "(20) Recreation/Entertainment", 
"(21) Visit friends/relative", "(24) Loop trip", "(97) Other, specify"
), class = "factor")), row.names = 431:452, class = "data.frame")


structure(list(SAMPN = c(48, 50, 50, 50, 50, 50, 56, 56, 58, 
58, 58, 58, 58, 58, 58, 58), PERNO = c(7, 1, 1, 2, 3, 6, 1, 3, 
1, 1, 1, 1, 2, 2, 2, 2), PLANO = c(3, 2, 4, 2, 4, 2, 6, 3, 2, 
3, 4, 5, 2, 3, 4, 5), loop = c(2, 2, 3, 2, 3, 2, 3, 2, 2, 2, 
2, 2, 2, 2, 2, 2), TPURP = structure(c(2L, 8L, 22L, 8L, 22L, 
5L, 2L, 2L, 18L, 17L, 13L, 2L, 16L, 17L, 13L, 2L), .Label = c("(1) Working at home (for pay)", 
"(2) All other home activities", "(3) Work/Job", "(4) All other activities at work", 
"(5) Attending class", "(6) All other activities at school", 
"(7) Change type of transportation/transfer", "(8) Dropped off passenger", 
"(9) Picked up passenger", "(10) Other, specify - transportation", 
"(11) Work/Business related", "(12) Service Private Vehicle", 
"(13) Routine Shopping", "(14) Shopping for major purchases", 
"(15) Household errands", "(16) Personal Business", "(17) Eat meal outside of home", 
"(18) Health care", "(19) Civic/Religious activities", "(20) Recreation/Entertainment", 
"(21) Visit friends/relative", "(24) Loop trip", "(97) Other, specify"
), class = "factor")), row.names = c(412L, 420L, 422L, 423L, 
428L, 435L, 467L, 474L, 480L, 481L, 482L, 483L, 484L, 485L, 486L, 
487L), class = "data.frame")

There may be a simpler way, but here is one approach using dplyr. We first count the number of rows for each group in both data frames and then do a left_join:

library(dplyr)

df3 <- left_join(df2 %>% count(SAMPN, PERNO, loop), 
                 df1 %>% count(SAMPN, PERNO, loop), by = c("SAMPN", "PERNO","loop"))
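
The groups whose counts agree can be pulled out the same way by flipping the filter condition. The following is a minimal sketch, assuming the example tables are df1 (smaller) and df2 (larger); the names `keys` and `same` are introduced here for illustration only.

```r
library(dplyr)

# Example data (assumption: df1 is the smaller table, df2 the larger one).
df1 <- data.frame(
  SAMPN = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  PERNO = c(1, 1, 1, 2, 3, 1, 1, 2, 3),
  loop  = c(1, 1, 2, 2, 2, 1, 1, 2, 4)
)
df2 <- data.frame(
  SAMPN = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  PERNO = c(1, 1, 1, 2, 3, 3, 1, 1, 2, 2, 3, 3, 4),
  loop  = c(1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 4, 4, 1)
)

keys <- c("SAMPN", "PERNO", "loop")  # illustrative name, not from the answer

# Per-group row counts: n.x comes from df2, n.y from df1 (NA if absent in df1).
df3 <- left_join(count(df2, SAMPN, PERNO, loop),
                 count(df1, SAMPN, PERNO, loop),
                 by = keys)

# Keep groups with equal counts, then bring back the full rows from df2.
same <- df3 %>%
  filter(!is.na(n.y), n.x == n.y) %>%
  select(all_of(keys)) %>%
  inner_join(df2, by = keys)

print(same)
```
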

The rows whose counts do not match can then be extracted with:

df3 %>%
  filter(n.x != n.y | is.na(n.y)) %>%
  select(names(df2)) %>%
  inner_join(df2)

#  SAMPN PERNO  loop
#  <int> <int> <int>
#1     1     3     2
#2     1     3     2
#3     2     2     2
#4     2     2     2
#5     2     3     4
#6     2     3     4
#7     2     4     1
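
When the data frames carry extra columns beyond the grouping keys, as the dput data in the question does with PLANO and TPURP, a filtering-join variant may be easier to apply because it keeps every column of the larger data frame. This is only a sketch under assumptions: the toy data below is invented for illustration, and the names `big`, `small`, `counts`, `same_keys`, `same` and `different` are hypothetical.

```r
library(dplyr)

# Invented toy data with one extra, non-key column (PLANO).
big <- data.frame(
  SAMPN = c(1, 1, 1, 2, 2),
  PERNO = c(1, 1, 2, 1, 1),
  loop  = c(1, 1, 2, 1, 1),
  PLANO = c(2, 3, 2, 2, 3)
)
small <- data.frame(
  SAMPN = c(1, 1, 2),
  PERNO = c(1, 1, 1),
  loop  = c(1, 1, 1),
  PLANO = c(2, 3, 2)
)

keys <- c("SAMPN", "PERNO", "loop")

# Count rows per key group in each data frame and line the counts up.
counts <- left_join(count(big,   SAMPN, PERNO, loop, name = "n_big"),
                    count(small, SAMPN, PERNO, loop, name = "n_small"),
                    by = keys)

# Key groups that occur equally often in both data frames.
same_keys <- filter(counts, !is.na(n_small), n_big == n_small)

# semi_join keeps rows of `big` with a matching key; anti_join keeps the rest.
same      <- semi_join(big, same_keys, by = keys)
different <- anti_join(big, same_keys, by = keys)

print(same)
print(different)
```

Because semi_join and anti_join only filter `big`, the two results partition its rows and retain all of its columns.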