R: compare multiple columns and create a count of matches

I have a dataset that contains the ID numbers of each respondent's friends and bullies.

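For each row, I want to look across all the friend nominations and all the bully nominations and count how many people were nominated as both. Any help would be great.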

Data I have:

ID  friend_1  friend_2  friend_3  bully_1  bully_2
1          4        12         7       12       15
2          8         6         7       18       20
3          9        18         1        2        1
4         15         7         2        7       13 
5          1        17         9       17        1
6          9        19        20       14       12
7         19        12        20        9       12
8          7         1        16        2       15 
9          1        10        12        1        7
10         7        11         9       11        7

Data I need:

ID  friend_1  friend_2  friend_3  bully_1  bully_2  num_both
1          4        12         7       12       15         1
2          8         6         7       18       20         0
3          9        18         1        2        1         1
4         15         7         2        7       13         1
5          1        17         9       17        1         2
6          9        19        20       14       12         0
7         19        12        20        9       12         1
8          7         1        16        2       15         0
9          1        10        12        1        7         1
10         7        11         9       11        7         2
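The count for one row is the size of the set intersection between that row's friend IDs and bully IDs. A minimal base R check, typing respondent 5's nominations in by hand; note that intersect() also drops duplicates, so a person listed twice among the friends would still be counted once:

```r
# Respondent 5 from the question's data, typed in by hand
friends <- c(1, 17, 9)   # friend_1, friend_2, friend_3
bullies <- c(17, 1)      # bully_1, bully_2

both <- intersect(friends, bullies)
both          # 1 17
length(both)  # 2, the num_both value for ID 5
```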

We can use apply row-wise to count the IDs that appear in both the friend and the bully columns:

df$num_both <- apply(df, 1, function(x) 
      length(intersect(x[grep("friend", names(df))], x[grep("bully", names(df))])))


#   ID friend_1 friend_2 friend_3 bully_1 bully_2 num_both
#1   1        4       12        7      12      15        1
#2   2        8        6        7      18      20        0
#3   3        9       18        1       2       1        1
#4   4       15        7        2       7      13        1
#5   5        1       17        9      17       1        2
#6   6        9       19       20      14      12        0
#7   7       19       12       20       9      12        1
#8   8        7        1       16       2      15        0
#9   9        1       10       12       1       7        1
#10 10        7       11        9      11       7        2
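apply() first converts the data frame to a matrix. That is harmless here because every column is an integer, but if the data also had a character column, every value would be turned into text (numbers may be padded with spaces), which can stop equal IDs from matching. A sketch that loops over row indices with vapply() instead, assuming a hypothetical name column added to the first three rows of the question's data:

```r
# First three rows of the question's data, plus a hypothetical character column
df <- data.frame(ID = 1:3, name = c("a", "b", "c"),
                 friend_1 = c(4L, 8L, 9L), friend_2 = c(12L, 6L, 18L), friend_3 = c(7L, 7L, 1L),
                 bully_1 = c(12L, 18L, 2L), bully_2 = c(15L, 20L, 1L))
friend_cols <- grep("friend", names(df))
bully_cols <- grep("bully", names(df))

# Loop over row indices instead of apply(), so each row keeps its original column types
df$num_both <- vapply(seq_len(nrow(df)), function(i) {
  length(intersect(unlist(df[i, friend_cols]), unlist(df[i, bully_cols])))
}, integer(1))

df$num_both  # 1 0 1
```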

Edit

If some values are NA and we want to exclude them, we can use is.na together with sum:

apply(df, 1, function(x) sum(!is.na(intersect(x[grep("friend", names(df))],
                                              x[grep("bully", names(df))]))))
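To see what the NA fix changes, here is a small made-up data frame (df_na is hypothetical, not from the question) in which respondent 1 has missing values in both the friend and the bully columns. intersect() treats the two NAs as equal, so the plain count reports 2 for that row; dropping NA from the intersection gives 1:

```r
# Hypothetical data with missing nominations (not from the question)
df_na <- data.frame(ID = 1:3,
                    friend_1 = c(4, 8, 9), friend_2 = c(12, NA, 18), friend_3 = c(NA, NA, 1),
                    bully_1 = c(12, 18, 2), bully_2 = c(NA, 20, 1))
friend_cols <- grep("friend", names(df_na))
bully_cols <- grep("bully", names(df_na))

# Plain intersect length: a shared NA is counted as a "match"
naive <- apply(df_na, 1, function(x) length(intersect(x[friend_cols], x[bully_cols])))
# Drop NA from the intersection before counting
na_safe <- apply(df_na, 1, function(x) sum(!is.na(intersect(x[friend_cols], x[bully_cols]))))

naive    # 2 0 1
na_safe  # 1 0 1
```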

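You could try comparing each bully column with the friend columns, which gives one logical match matrix per bully column, and then combine those matrices with | (element-wise OR). To get your num_both, just take rowSums of the combined match matrix: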

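bully_cols <- grep("bully", names(df))
friend_cols <- grep("friend", names(df))
# one logical matrix per bully column: TRUE where friend_k in that row equals the bully ID
per_bully <- lapply(df[, bully_cols], function(x, compare) compare == x, compare = df[, friend_cols])
# OR the matrices together, then count the matching friend columns in each row
df$num_both <- rowSums(Reduce("|", per_bully))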
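A minimal sketch of the intermediate result for the question's first two rows, typed in by hand. Comparing a data frame with a vector recycles the vector down each column, so compare == x checks every friend column against one bully column, row by row:

```r
# First two rows of the question's data, typed in by hand
df <- data.frame(ID = 1:2,
                 friend_1 = c(4L, 8L), friend_2 = c(12L, 6L), friend_3 = c(7L, 7L),
                 bully_1 = c(12L, 18L), bully_2 = c(15L, 20L))
bully_cols <- grep("bully", names(df))
friend_cols <- grep("friend", names(df))

# One logical matrix per bully column: TRUE where that row's friend_k equals the bully ID
per_bully <- lapply(df[, bully_cols], function(x, compare) compare == x,
                    compare = df[, friend_cols])
per_bully$bully_1   # only row 1, friend_2 (12 == 12) is TRUE

# Element-wise OR across the bully columns, then count matching friend columns per row
match_mat <- Reduce("|", per_bully)
rowSums(match_mat)  # 1 for row 1, 0 for row 2
```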

Assuming the values are unique within the friend group and within the bully group, a simple approach is:

apply(df[,-1], 1, function (x) sum(table(x) > 1)) 
[1] 1 0 1 1 2 0 1 0 1 2
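The uniqueness assumption matters: table() counts any ID that appears more than once in the row, including a friend who is listed twice. A made-up row (not from the question) where this approach and the intersect() approach disagree:

```r
# Hypothetical row: friend 3 is listed twice, and nobody is both friend and bully
x <- c(friend_1 = 3, friend_2 = 3, friend_3 = 5, bully_1 = 8, bully_2 = 9)

sum(table(x) > 1)                   # 1: the duplicated friend looks like a match
length(intersect(x[1:3], x[4:5]))   # 0: no ID appears on both sides
```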

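Here is a melt-based approach with data.table. We melt the data into 'long' format using the patterns in the column names (columns starting with friend and columns starting with bully), group by 'ID', take the length of the intersect of the elements in the long dataset's columns 'value1' and 'value2', and join back on 'ID':
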
library(data.table)
setDT(df1)[melt(df1, measure = patterns('^friend', '^bully'))[,
   .(num_both = length(intersect(value1, value2))), ID], on = .(ID)]
#    ID friend_1 friend_2 friend_3 bully_1 bully_2 num_both
# 1:  1        4       12        7      12      15        1
# 2:  2        8        6        7      18      20        0
# 3:  3        9       18        1       2       1        1
# 4:  4       15        7        2       7      13        1
# 5:  5        1       17        9      17       1        2
# 6:  6        9       19       20      14      12        0
# 7:  7       19       12       20       9      12        1
# 8:  8        7        1       16       2      15        0
# 9:  9        1       10       12       1       7        1
#10: 10        7       11        9      11       7        2


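Or, using the tidyverse: gather into 'long' format, group by 'ID', summarise with the length of the intersect of the 'value' elements, split according to whether 'friend' or 'bully' appears in the 'key' column, and right_join the result with the original dataset: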

library(tidyverse)
df1 %>% 
   gather(key, value, -ID) %>% 
   group_by(ID) %>% 
   summarise(num_both = length(intersect(value[str_detect(key, 'friend')], 
                         value[str_detect(key, 'bully')]))) %>% 
   right_join(df1)
# A tibble: 10 x 7
#      ID num_both friend_1 friend_2 friend_3 bully_1 bully_2
#   <int>    <int>    <int>    <int>    <int>   <int>   <int>
# 1     1        1        4       12        7      12      15
# 2     2        0        8        6        7      18      20
# 3     3        1        9       18        1       2       1
# 4     4        1       15        7        2       7      13
# 5     5        2        1       17        9      17       1
# 6     6        0        9       19       20      14      12
# 7     7        1       19       12       20       9      12
# 8     8        0        7        1       16       2      15
# 9     9        1        1       10       12       1       7
#10    10        2        7       11        9      11       7
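gather() still works but is superseded in current tidyr. A sketch of the same pipeline with pivot_longer(), assuming tidyr 1.0.0 or later; values_drop_na = TRUE drops missing nominations before counting, and base R's startsWith() replaces str_detect() for the prefix test:

```r
library(dplyr)
library(tidyr)

# The question's data (same structure() call as in the data section)
df1 <- structure(list(ID = 1:10, friend_1 = c(4L, 8L, 9L, 15L, 1L, 9L, 
19L, 7L, 1L, 7L), friend_2 = c(12L, 6L, 18L, 7L, 17L, 19L, 12L, 
1L, 10L, 11L), friend_3 = c(7L, 7L, 1L, 2L, 9L, 20L, 20L, 16L, 
12L, 9L), bully_1 = c(12L, 18L, 2L, 7L, 17L, 14L, 9L, 2L, 1L, 
11L), bully_2 = c(15L, 20L, 1L, 13L, 1L, 12L, 12L, 15L, 7L, 7L
)), class = "data.frame", row.names = c(NA, -10L))

result <- df1 %>%
  # one row per (ID, original column name, nominated ID); NA nominations are dropped
  pivot_longer(-ID, names_to = "key", values_to = "value", values_drop_na = TRUE) %>%
  group_by(ID) %>%
  summarise(num_both = length(intersect(value[startsWith(key, "friend")],
                                        value[startsWith(key, "bully")]))) %>%
  right_join(df1, by = "ID")
```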

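Comment from the asker: Hey, thanks! I think the apply version is almost there, but it counts NAs row-wise, because some observations have fewer nominations than others. Any idea how to make sure NAs are ignored when counting matches?

Reply: We can use sum and is.na to ignore NA matches. I've updated the answer.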

# or row-wise with purrr; pmap_int returns an integer column (pmap would return a list)
df1 %>% 
     mutate(num_both = pmap_int(.[-1], ~ c(...) %>%
                                 {length(intersect(.[1:3], .[4:5]))}))

Data:
df1 <- structure(list(ID = 1:10, friend_1 = c(4L, 8L, 9L, 15L, 1L, 9L, 
19L, 7L, 1L, 7L), friend_2 = c(12L, 6L, 18L, 7L, 17L, 19L, 12L, 
1L, 10L, 11L), friend_3 = c(7L, 7L, 1L, 2L, 9L, 20L, 20L, 16L, 
12L, 9L), bully_1 = c(12L, 18L, 2L, 7L, 17L, 14L, 9L, 2L, 1L, 
11L), bully_2 = c(15L, 20L, 1L, 13L, 1L, 12L, 12L, 15L, 7L, 7L
)), class = "data.frame", row.names = c(NA, -10L))
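With df1 defined, a quick base R sanity check that three of the approaches above reproduce the num_both column from the desired output in the question:

```r
# df1 exactly as in the structure() call above
df1 <- structure(list(ID = 1:10, friend_1 = c(4L, 8L, 9L, 15L, 1L, 9L, 
19L, 7L, 1L, 7L), friend_2 = c(12L, 6L, 18L, 7L, 17L, 19L, 12L, 
1L, 10L, 11L), friend_3 = c(7L, 7L, 1L, 2L, 9L, 20L, 20L, 16L, 
12L, 9L), bully_1 = c(12L, 18L, 2L, 7L, 17L, 14L, 9L, 2L, 1L, 
11L), bully_2 = c(15L, 20L, 1L, 13L, 1L, 12L, 12L, 15L, 7L, 7L
)), class = "data.frame", row.names = c(NA, -10L))

expected <- c(1, 0, 1, 1, 2, 0, 1, 0, 1, 2)  # num_both from the desired output
friend_cols <- grep("friend", names(df1))
bully_cols <- grep("bully", names(df1))

# intersect() per row, OR-ed match matrices, and the table() shortcut
via_apply <- apply(df1, 1, function(x) length(intersect(x[friend_cols], x[bully_cols])))
via_reduce <- rowSums(Reduce("|", lapply(df1[, bully_cols], function(x, compare) compare == x,
                                         compare = df1[, friend_cols])))
via_table <- apply(df1[, -1], 1, function(x) sum(table(x) > 1))

# All three should agree with the expected column on this data
stopifnot(all(via_apply == expected), all(via_reduce == expected), all(via_table == expected))
```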