R: How to transform data to find indices that share the same values
r, data-manipulation, data-cleaning, data-processing
I want to find customers who bought exactly the same products.
The data I have records customer behavior and what each customer bought.
The example I provide is a simplified version of my data. A customer typically buys 10 to 20 products, chosen from about 50 available products.
I am really confused about what a simple way to transform the data into my desired output would be.
Could you give me some advice? Thanks.
Input:
Output:
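I suspect you would want to structure your data in a more useful way. In any case, the tidyverse is a helpful way to think about the task. As mentioned before, posting code for others saves them time and gets you an answer faster.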
library(dplyr)
library(stringr)
library(tidyr)

## tibble() replaces the deprecated data_frame(); dplyr re-exports it
d <- tibble(id = c(1, 2, 3, 4, 5, 6),
            bought = c('Apple, Beer, Diaper', 'Apple, Beer', 'Apple, Beer, Diaper, Diaper',
                       'Apple, Diaper', 'Diaper, Apple', 'Apple, Diaper, Beer, Beer'))

d %>%
  ## Unnest the values & take care of white space
  ## - This is the better data structure to have, anyways
  mutate(buy = str_split(bought, ',')) %>%
  unnest(buy) %>% mutate(buy = str_trim(buy)) %>% select(-bought) %>%
  ## Drop repeated purchases and sort, so item order and duplicates do not matter
  distinct(id, buy) %>% arrange(id, buy) %>%
  ## Aggregate by id: one canonical product string per customer
  group_by(id) %>% summarize(bought = paste(buy, collapse = ', ')) %>% ungroup() %>%
  ## Collect the ids that share each product combination
  group_by(bought) %>% summarize(ids = paste(id, collapse = ',')) %>% ungroup()
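For readers without the tidyverse installed, the same idea (reduce each basket to a sorted, de-duplicated key, then group the ids by that key) can be sketched in base R. This is only an illustration under assumptions: the helper name basket_key and the column name key are made up here, and the toy data simply mirrors the example above.

```r
# Toy data mirroring the example above (ids stored as integers here)
d <- data.frame(
  id = 1:6,
  bought = c("Apple, Beer, Diaper", "Apple, Beer", "Apple, Beer, Diaper, Diaper",
             "Apple, Diaper", "Diaper, Apple", "Apple, Diaper, Beer, Beer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split, trim, de-duplicate, sort, and re-join one basket
basket_key <- function(x) {
  items <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  paste(sort(unique(items)), collapse = ", ")
}

# One canonical key per customer
d$key <- vapply(d$bought, basket_key, character(1), USE.NAMES = FALSE)

# Customers that share a key bought exactly the same set of products
groups <- split(d$id, d$key)
groups
```

The result is a named list with one element per distinct product set, which may be easier to post-process than a pasted string of ids.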
Edit: faster/cleaner in dplyr to get the distinct combinations.
With the given input data and data.table, this can be written as a "one-liner":
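library(data.table)

# split products per customer, drop duplicates, sort, then paste back together
dcast(unique(setDT(input)[, strsplit(Products, ", "), Customer_ID])[
    order(Customer_ID, V1)],
  Customer_ID ~ ., paste, collapse = ", ")[
    # group customers by their canonical product string
    , .(Customers = paste(Customer_ID, collapse = ", ")), .(Products = .)]
#               Products Customers
#1: Apple, Beer, Diaper   1, 3, 6
#2:         Apple, Beer         2
#3:       Apple, Diaper      4, 5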
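Note that the OP has removed the second row, which has only one customer, from the expected output, but the question does not mention any criterion for filtering the output.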
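If the intent really is to drop combinations bought by a single customer, one possible filter is sketched below. The threshold of "at least two customers" is an assumption, since the question states no criterion, and the result table is re-typed by hand (under the made-up name result) so that the snippet runs on its own.

```r
# Result table as printed above, re-typed so the snippet is self-contained
result <- data.frame(
  Products  = c("Apple, Beer, Diaper", "Apple, Beer", "Apple, Diaper"),
  Customers = c("1, 3, 6", "2", "4, 5"),
  stringsAsFactors = FALSE
)

# Count the customers in each row by splitting the comma-separated id string
n_customers <- lengths(strsplit(result$Customers, ", ", fixed = TRUE))

# Assumed criterion: keep only combinations shared by at least two customers
shared <- result[n_customers >= 2, ]
shared
```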
Input data
As given by the OP:
input <- structure(list(Customer_ID = 1:6, Products = c("Apple, Beer, Diaper",
"Beer, Apple", "Beer, Apple, Diaper, Diaper", "Apple, Diaper",
"Diaper, Apple", "Apple, Diaper, Beer, Beer")), .Names = c("Customer_ID",
"Products"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L), spec = structure(list(cols = structure(list(Customer_ID = structure(list(), class = c("collector_integer",
"collector")), Products = structure(list(), class = c("collector_character",
"collector"))), .Names = c("Customer_ID", "Products")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Please use dput to show a small reproducible example, along with the expected output based on that example rather than an image.