R: using 'ddply' to find the most common value of a column for each matching pair from two columns

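I am trying to use the plyr function ddply to sort and identify the most frequent interaction type between any unique pair of users in social media data of the following form: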

from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment', 'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')

dat <- data.frame(from, to, interaction_type)

Counting the interactions per pair is straightforward with

library(plyr)
count <- ddply(dat, .(from, to), nrow)

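but I find it hard to adapt this aggregation to find the most common interaction type between any given pair. What is the most efficient way to get the expected output? Also, how should ties be handled? I could simply use "tied" as the cell value for every tied case.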
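Since the question asks specifically about plyr, here is a minimal sketch with ddply that groups on an order-independent pair and writes "tied" when several interaction types share the top count, as proposed above. The helper `mode_or_tied()` and the columns `user1`/`user2` are hypothetical names introduced for illustration; `pmin()`/`pmax()` need character (not factor) columns.

```r
library(plyr)

# Data from the question; character columns so pmin()/pmax() compare strings
from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment',
                      'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)

# Hypothetical helper: the most frequent value, or "tied" when
# more than one value shares the highest count
mode_or_tied <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) "tied" else top
}

# Order-independent pair: A->B and B->A end up in the same group
dat$user1 <- pmin(dat$from, dat$to)
dat$user2 <- pmax(dat$from, dat$to)

# One row per unordered pair: number of interactions and the top type
result <- ddply(dat, .(user1, user2), summarise,
                n = length(interaction_type),
                top_type = mode_or_tied(interaction_type))
result
```

With the sample data no pair is tied, so every row gets a single type; on a tied group such as `c("like", "share")` the helper returns "tied".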

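We need to find the mode (most frequent value) of interaction_type for each group, regardless of the order of the from and to columns.

Take the Mode function from this answer.

Keep the columns as character by adding stringsAsFactors = FALSE when creating the data frame.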
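The linked answer is not reproduced on this page. A commonly used definition of `Mode()` is sketched below; it is an assumption that it matches the linked one. On ties it silently returns the value that appears first in the group rather than flagging a tie.

```r
# Mode(): most frequent value of x; on ties, the value seen first in x wins.
# Assumed definition; the function from the linked answer may differ.
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # count each value, keep the first maximum
}

Mode(c("like", "comment", "like"))  # "like"
Mode(c("share", "like"))            # "share" (tie: first value seen wins)
```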


Similar to Ronak's approach:

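library(dplyr)
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)
dat %>% 
  mutate(
    # Order-independent pair key built from the sorted user names
    pair = purrr::pmap_chr(
      .l = list(from = from, to = to),
      .f = function(from, to) paste(sort(c(from, to)), collapse = "_")
    )
  ) %>%
  # Count how often each interaction type occurs within each pair
  group_by(pair, interaction_type) %>%
  mutate(freq = n()) %>%
  group_by(pair) %>%
  # Keep the most frequent type(s), then the first such row as tie-breaker
  filter(freq == max(freq)) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  select(-pair, -freq)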
# A tibble: 6 x 3
  from  to    interaction_type
  <chr> <chr> <chr>           
1 A     B     like            
2 A     D     share           
3 B     C     like            
4 B     D     comment         
5 C     A     like            
6 C     D     like


Thank you very much, this is a very efficient way to get the desired maximum in the third column. Thanks for the stringsAsFactors tip!

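library(dplyr)

# Assumes `dat` has character columns (stringsAsFactors = FALSE) and that
# Mode() from the linked answer has already been defined
dat %>%
  # Order-independent key: A->B and B->A both become "A_B"
  mutate(key = paste(pmin(from, to), pmax(from, to), sep = "_")) %>%
  group_by(key) %>%
  # Overwrite every type in the group with the group's most frequent type
  mutate(interaction_type = Mode(interaction_type)) %>%
  slice(1) %>%
  ungroup() %>%
  select(-key)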

#  from  to    interaction_type
#  <chr> <chr> <chr>           
#1 A     B     like            
#2 C     A     like            
#3 A     D     share           
#4 B     C     like            
#5 B     D     comment         
#6 C     D     like
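The same unordered-pair mode can also be sketched in base R with `aggregate()`, for readers who would rather not depend on plyr or dplyr. `Mode()` here is the assumed definition (the first most frequent value wins on ties), and the `key` column name is illustrative.

```r
# Data from the question, kept as character columns
from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment',
                      'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like')
dat <- data.frame(from, to, interaction_type, stringsAsFactors = FALSE)

# Assumed Mode(): first most frequent value wins on ties
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Order-independent pair key, e.g. "A_B" for both A->B and B->A
dat$key <- paste(pmin(dat$from, dat$to), pmax(dat$from, dat$to), sep = "_")

# One row per key with the most frequent interaction type
res <- aggregate(dat["interaction_type"], by = dat["key"], FUN = Mode)
res
```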
