How can I summarize this data in R? (R, Crosstab)

I'm analysing customer flow between different shopping locations. I have data like this:

df <- data.frame(customer.id=letters[seq(1,7)], 
                 shop.1=c(1,1,1,1,1,0,0),
                 shop.2=c(0,0,1,1,1,1,0),
                 shop.3=c(1,0,0,0,0,0,1))
df

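For example:

Customer a shopped only at shops 1 and 3

Customer b shopped only at shop 1

Customer c shopped only at shops 1 and 2

etc. I'd like to summarize the data like this: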

#>           shop.1 shop.2 shop.3 
#> shop.1         5      3      1
#> shop.2         3      4      0       
#> shop.3         1      0      2

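For example, row 1 reads as follows:

5 people shopped at both shop 1 and shop 1 (obviously a redundant observation)
3 people shopped at both shop 1 and shop 2
1 person shopped at both shop 1 and shop 3

How can I do this? Note: my dataset has many shops, so a scalable approach is preferred.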
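A minimal base R sketch of what a single cell of that summary means, rebuilding the df from the question: the shop.1 x shop.2 entry is just the number of customers with a 1 in both columns, and a diagonal entry is the number of customers who visited that shop at all.

```r
df <- data.frame(customer.id = letters[seq(1, 7)],
                 shop.1 = c(1, 1, 1, 1, 1, 0, 0),
                 shop.2 = c(0, 0, 1, 1, 1, 1, 0),
                 shop.3 = c(1, 0, 0, 0, 0, 0, 1))

# Customers who shopped at both shop 1 and shop 2 (c, d and e)
both_1_2 <- sum(df$shop.1 == 1 & df$shop.2 == 1)
both_1_2  # 3

# Diagonal cell for shop 1: customers who visited shop 1 at all
sum(df$shop.1 == 1)  # 5
```

Computing every pair this way would need a loop over all shop pairs, which is why the answers below reach for a single matrix product instead.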

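You want to tabulate the co-occurrence of the shop.* variables. 1) If the shop columns are strings, first convert them to 1s and 0s (see footnote 1):

# Only for string columns where "" means "did not shop". On the numeric df above,
# skip this: "0" == "" is FALSE, so every 0 would become 1
df[,2:4] <- sapply(df[,2:4], function(x) { ifelse(x=="", 0, 1) } )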
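A sketch of that conversion on string data, assuming blanks mean "did not shop". The df_chr data frame and its "x" marks are hypothetical, made up only to show the idea. Indexing with df_chr[-1] (list-style) keeps every non-id column as a data frame, so this works for any number of shops.

```r
# Hypothetical character data: "x" = shopped, "" = did not shop
df_chr <- data.frame(customer.id = letters[1:3],
                     shop.1 = c("x", "x", ""),
                     shop.2 = c("", "x", "x"),
                     stringsAsFactors = FALSE)

# Convert every shop column to 0/1, leaving customer.id untouched
df_chr[-1] <- lapply(df_chr[-1], function(x) ifelse(x == "", 0, 1))
df_chr
```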

2) As @thelatemail shows, you can also:

# Transform your df from wide-form to long-form...
library(dplyr)
library(reshape2)
occurrence_df <- reshape2::melt(df, id.vars='customer.id') %>%
                 dplyr::filter(value==1)

   customer.id variable value
1            a   shop.1     1
2            b   shop.1     1
3            c   shop.1     1
4            d   shop.1     1
5            e   shop.1     1
6            c   shop.2     1
7            d   shop.2     1
8            e   shop.2     1
9            f   shop.2     1
10           a   shop.3     1
11           g   shop.3     1

Then the same crossprod step as in @thelatemail's answer:

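# Tabulate customer x shop only; table() on all three columns would give a 3-D table
crossprod(table(occurrence_df[c("customer.id", "variable")]))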

        variable
variable shop.1 shop.2 shop.3
  shop.1      5      3      1
  shop.2      3      4      0
  shop.3      1      0      2
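Why crossprod gives this: crossprod(X) computes t(X) %*% X, so with X the customer x shop 0/1 table, cell (i, j) is the sum over customers of X[c, i] * X[c, j], which is the number of customers who visited both shop i and shop j. A base R sketch checking that against an explicit double loop, using the question's 0/1 columns:

```r
X <- as.matrix(data.frame(shop.1 = c(1, 1, 1, 1, 1, 0, 0),
                          shop.2 = c(0, 0, 1, 1, 1, 1, 0),
                          shop.3 = c(1, 0, 0, 0, 0, 0, 1)))

# Explicit version: count customers with a 1 in both shop i and shop j
n_shops <- ncol(X)
manual <- matrix(0, n_shops, n_shops,
                 dimnames = list(colnames(X), colnames(X)))
for (i in seq_len(n_shops)) {
  for (j in seq_len(n_shops)) {
    manual[i, j] <- sum(X[, i] * X[, j])
  }
}

# crossprod() does the same thing in one matrix product
all.equal(manual, crossprod(X))
```

The loop is only there for illustration; with many shops the single crossprod() call is both shorter and much faster.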

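Footnotes:

1) First, the data should be numeric or factors, not strings. You want to convert x to 1s and 0s. If they are strings because they came from read.csv, either use the read.csv argument stringsAsFactors=TRUE to make them factors, or use colClasses to make them numeric, and see all the duplicate questions on this.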
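A sketch of the colClasses route from footnote 1, reading a small CSV from a string with read.csv(text = ...). The CSV content is made up to mirror the first rows of the question's data.

```r
csv_text <- "customer.id,shop.1,shop.2,shop.3
a,1,0,1
b,1,0,0
c,1,1,0"

# Force the shop columns to numeric and keep customer.id as character
shops <- read.csv(text = csv_text,
                  colClasses = c("character", "numeric", "numeric", "numeric"))
str(shops)
```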

crossprod can handle what you want to do, with just some basic manipulation to split the data into two columns representing customer and shop:

tmp <- cbind(df[1],stack(df[-1]))
tmp <- tmp[tmp$values==1,]

crossprod(table(tmp[c(1,3)]))

#        ind
#ind      shop.1 shop.2 shop.3
#  shop.1      5      3      1
#  shop.2      3      4      0
#  shop.3      1      0      2
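On the scalability question: stack(df[-1]) picks up every column after the first, so the same lines work unchanged for any number of shops. A sketch on randomly generated data (the big data frame, its 1000 customers and 20 shops are hypothetical), checking that the table-based result matches a plain matrix product. The comparison indexes by shop name so it does not depend on the level order stack() produces.

```r
set.seed(1)
n_customers <- 1000
n_shops <- 20

# Hypothetical 0/1 visit data: one row per customer, one column per shop
big <- data.frame(customer.id = sprintf("c%04d", seq_len(n_customers)),
                  matrix(rbinom(n_customers * n_shops, 1, 0.2),
                         ncol = n_shops,
                         dimnames = list(NULL, paste0("shop.", seq_len(n_shops)))))

# Same steps as above: long form, keep visits, cross-tabulate, crossprod
tmp <- cbind(big[1], stack(big[-1]))
tmp <- tmp[tmp$values == 1, ]
via_table <- crossprod(table(tmp[c(1, 3)]))

# Direct matrix product on the wide 0/1 columns
via_matrix <- crossprod(as.matrix(big[-1]))

shop_names <- colnames(via_matrix)
same <- all(via_table[shop_names, shop_names] == via_matrix)
same
```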

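In fact, plain matrix operations seem to be enough, since the data frame contains only 0s and 1s.

First, drop the customer.id column and convert the data.frame to a matrix, which is easy. mydf is the name of your data frame (df in the question).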

# base R way
as.matrix(mydf[,-1])
#>      shop.1 shop.2 shop.3
#> [1,]      1      0      1
#> [2,]      1      0      0
#> [3,]      1      1      0
#> [4,]      1      1      0
#> [5,]      1      1      0
#> [6,]      0      1      0
#> [7,]      0      0      1

library(dplyr) #dplyr way
(mymat <-
  mydf %>% 
  select(-customer.id) %>% 
  as.matrix())
#>      shop.1 shop.2 shop.3
#> [1,]      1      0      1
#> [2,]      1      0      0
#> [3,]      1      1      0
#> [4,]      1      1      0
#> [5,]      1      1      0
#> [6,]      0      1      0
#> [7,]      0      0      1

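Then the product of the transposed matrix with the matrix itself, t(mymat) %*% mymat (equivalently crossprod(mymat)), gives your answer.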
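A self-contained base R sketch of that last step, rebuilding mydf from the question's data. crossprod(mymat) computes the same product as t(mymat) %*% mymat without forming the transpose explicitly.

```r
mydf <- data.frame(customer.id = letters[seq(1, 7)],
                   shop.1 = c(1, 1, 1, 1, 1, 0, 0),
                   shop.2 = c(0, 0, 1, 1, 1, 1, 0),
                   shop.3 = c(1, 0, 0, 0, 0, 0, 1))

# Drop the id column; what remains is the customer x shop 0/1 matrix
mymat <- as.matrix(mydf[, -1])

# Shop x shop co-occurrence counts
co <- crossprod(mymat)
co
#>        shop.1 shop.2 shop.3
#> shop.1      5      3      1
#> shop.2      3      4      0
#> shop.3      1      0      2
```

If the redundant diagonal is not wanted, diag(co) <- 0 blanks it, or diag(co) on its own gives the per-shop customer totals.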

Thanks, I've changed the example to show numeric data. But how can I get the summary I want? dplyr would be great! Base R is fine too - thanks for asking!

Once the starting data frame is numeric, it's conceptually similar to an edgelist/adjacency matrix, and you want to cross-tabulate customer.id by {the combination of shops they shopped at}. But I can't remember how to find that duplicate question... it's resistant to keyword search, etc. Related:

This can be done with ftable/xtabs... can you figure out how?

@smci - Can't say I've used that method, but I'd be interested to see how.

Your first line is just a manual reshape2::melt(df, id.vars='customer.id'). In fact, your first two lines are equivalent to reshape2::melt(df, id.vars='customer.id') %>% dplyr::filter(value==1)

@smci - Absolutely, or gather from tidyr, or other similar alternatives, e.g. df %>% gather(shop, value, -customer.id) %>% filter(value==1).
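On the xtabs question in the comments: a base R sketch (not code from the thread) that stacks the shop columns and passes the 0/1 values as weights to xtabs(), so no filtering step is needed because each customer/shop cell simply sums to 0 or 1. The crossprod() step is the same as in the answers.

```r
df <- data.frame(customer.id = letters[seq(1, 7)],
                 shop.1 = c(1, 1, 1, 1, 1, 0, 0),
                 shop.2 = c(0, 0, 1, 1, 1, 1, 0),
                 shop.3 = c(1, 0, 0, 0, 0, 0, 1))

# Long form: one row per (customer, shop) pair, values = 0/1 visit flag
long <- cbind(df[1], stack(df[-1]))

# Customer x shop incidence table, summing the 0/1 flags as weights
visits <- xtabs(values ~ customer.id + ind, data = long)

# Shop x shop co-occurrence counts
co <- crossprod(visits)
co
```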