Correspondence between character columns in R


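I have a data frame with five character columns. Each column takes a limited number of values (categorical data). Across the data set, each value in one column appears a varying number of times together with each value in the other columns.

Here is an example data set: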

d <- structure(list(ID = c(17, 12, 12, 17, 17, 12, 12, 17, 31, 13), 
    card = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3), curf = c("c11", "c11", 
    "c11", "c11", "c12", "c12", "c12", "c12", "c08", "c08"), 
    mas = c("m2_indo", "m2_indo", "m2_indo", "m2_indo", "m2_indo", 
    "m2_indo", "m2_indo", "m2_indo", "m3_every", "m3_every"), 
    vac = c("v_100", "v_100", "v_100", "v_100", "v_200", "v_200", 
    "v_200", "v_200", "v_100", "v_100"), scho = c("s_nope", "s_nope", 
    "s_nope", "s_nope", "s_50", "s_50", "s_50", "s_50", "s_nope", 
    "s_nope"), alco = c("a3_nsol", "a3_nsol", "a3_nsol", "a3_nsol", 
    "a2_thu", "a2_thu", "a2_thu", "a2_thu", "a1_sat", "a1_sat"
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

      ID  card curf  mas      vac   scho   alco   
   <dbl> <dbl> <chr> <chr>    <chr> <chr>  <chr>  
 1    17     1 c11   m2_indo  v_100 s_nope a3_nsol
 2    12     1 c11   m2_indo  v_100 s_nope a3_nsol
 3    12     1 c11   m2_indo  v_100 s_nope a3_nsol
 4    17     1 c11   m2_indo  v_100 s_nope a3_nsol
 5    17     2 c12   m2_indo  v_200 s_50   a2_thu 
 6    12     2 c12   m2_indo  v_200 s_50   a2_thu 
 7    12     2 c12   m2_indo  v_200 s_50   a2_thu 
 8    17     2 c12   m2_indo  v_200 s_50   a2_thu 
 9    31     3 c08   m3_every v_100 s_nope a1_sat 
10    13     3 c08   m3_every v_100 s_nope a1_sat

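I can't see any sensible strategy for counting how often each pair of values occurs together.

Here is a way to do the calculation for all the character columns at once, without needing to know the column names in advance: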

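library(dplyr)
library(tidyr)

# Stack every character column into (column name, value) pairs,
# tagging each pair with the row it came from
long1 <- d %>% 
  mutate(Row=row_number()) %>% 
  pivot_longer(cols=where(is.character), names_to="Col1", values_to="Value1")
long2 <- d %>% 
  mutate(Row=row_number()) %>% 
  pivot_longer(cols=where(is.character), names_to="Col2", values_to="Value2")

# Pair up values from the same row, drop a column paired with itself,
# then count each pair of values.
# (dplyr >= 1.1.0 warns about this many-to-many join;
# relationship="many-to-many" in left_join() silences it.)
long1 %>% 
  left_join(long2, by="Row") %>% 
  filter(Col1 != Col2) %>% group_by(Value1, Value2) %>% 
  summarise(N=n(), .groups="drop")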
# A tibble: 58 x 3
   Value1  Value2       N
 * <chr>   <chr>    <int>
 1 a1_sat  c08          2
 2 a1_sat  m3_every     2
 3 a1_sat  s_nope       2
 4 a1_sat  v_100        2
 5 a2_thu  c12          4
 6 a2_thu  m2_indo      4
 7 a2_thu  s_50         4
 8 a2_thu  v_200        4
 9 a3_nsol c11          4
10 a3_nsol m2_indo      4
# … with 48 more rows
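
One way to sanity-check a few rows of that result is to count a single pair of columns directly. The sketch below rebuilds three of the example columns with rep() (same values as d, just written more compactly) and uses count(); the name pair_counts is only illustrative.

```r
library(dplyr)

# Three of the character columns from the example data, rebuilt compactly:
# 4 rows of the first combination, 4 of the second, 2 of the third
d <- tibble(
  curf = rep(c("c11", "c12", "c08"), times = c(4, 4, 2)),
  mas  = rep(c("m2_indo", "m2_indo", "m3_every"), times = c(4, 4, 2)),
  alco = rep(c("a3_nsol", "a2_thu", "a1_sat"), times = c(4, 4, 2))
)

# Count how often each (alco, curf) combination appears in the same row
pair_counts <- d %>% count(alco, curf, name = "N")
print(pair_counts)
```

The three counts (a1_sat with c08: 2, a2_thu with c12: 4, a3_nsol with c11: 4) agree with rows 1, 5 and 9 of the output above. This only covers one pair of columns, which is why the long-format join is used for the general case.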

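Exactly what I wanted. I'm not sure I understand the effect of the .groups="drop" argument to summarise, though?

It's just there to avoid the message from summarise. Without it, summarise says something along the lines of "you didn't tell me what to do, so I'm making an assumption". That's because summarise changes how a grouped tibble is grouped: by default it drops only the last grouping variable, whereas .groups="drop" returns an ungrouped tibble.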
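
To see the difference in grouping, here is a minimal sketch on a made-up tibble (toy and its values are hypothetical, not taken from the question) that inspects the result with group_vars():

```r
library(dplyr)

# Hypothetical toy data, not the question's data set
toy <- tibble(
  Value1 = c("a", "a", "b", "b"),
  Value2 = c("x", "x", "x", "y")
)

grouped <- toy %>% group_by(Value1, Value2)

# Default behaviour: only the last grouping variable (Value2) is dropped,
# and dplyr prints a message saying which grouping it kept
by_default <- grouped %>% summarise(N = n())
group_vars(by_default)

# .groups = "drop": the result carries no grouping at all
ungrouped <- grouped %>% summarise(N = n(), .groups = "drop")
group_vars(ungrouped)
```

The first result is still grouped by Value1, so later verbs such as mutate() or filter() would operate within each group; the second is a plain tibble.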
