R: How to combine data tables with different column counts and different names


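I looked through similar posts, but none of them answers my question exactly.

My question is this: say User1 has 6 purchases and User2 has 2. The purchase and user data look like this: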

set.seed(1234)
purchase <- data.frame(id = c(rep("User1", 6), rep("User2", 2)),
                       purchaseid = sample(seq(1, 100, 1), 8),
                       purchaseDate = seq(Sys.Date(), Sys.Date() + 7, 1),
                       price = sample(seq(30, 200, 10), 8))
#
users <- data.frame(id = c("User1","User2"),
                    uname = c("name1", "name2"),
                    uaddress = c("add1", "add2"))
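The desired final data has one row per user, holding the user name, address, and so on, followed by columns for up to 20 purchases. The purchase data must be laid out one purchase after another in that same row. The rule is: exactly one row per user. If a user has fewer than 20 purchases, the remaining fields should be empty.

So the final data should look like this: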

  id uname uaddr p1id     p1date p1price p2id     p2date p2price p3id     p3date p3price p4id
1 User1 name1  add1   12 2019-09-27     140   62 2019-09-28     110   60 2019-09-29     200   61
2 User2 name2  add2    1 2019-10-03     160   22 2019-10-04     120   NA       <NA>      NA   NA
      p4date p4price
1 2019-09-30     190
2       <NA>      NA

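There is no need to process each id separately. Instead, we can operate by id within a single data frame. Here is a tidyverse approach. You can stop the chain at any point to inspect the intermediate output. I've added comments explaining what the code does, but let me know if anything is unclear.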

library(tidyverse)

dat = users %>% 
  # Join purchase data to user data
  left_join(purchase, by = "id") %>% 
  arrange(purchaseDate) %>% 
  # Create a count column to assign a sequence number to each purchase within each id.
  # We'll use this later to create columns for each purchase event with a unique 
  # sequence number for each purchase.
  group_by(id) %>% 
  mutate(seq=1:n()) %>% 
  ungroup %>% 
  # Reshape data frame from "wide" to "long" format
  gather(key, value, purchaseid:price) %>% 
  arrange(seq) %>% 
  # Paste together the "key" and "seq" columns (the resulting column will still be 
  # called "key"). This will allow us to spread the data frame to one row per id 
  # with each purchase event properly numbered.
  unite(key, key, seq, sep="_") %>% 
  mutate(key = factor(key, levels=unique(key))) %>% 
  spread(key, value) %>% 
  # Convert date columns back to Date class
  mutate_at(vars(matches("Date")), as.Date, origin="1970-01-01")

dat
     id uname uaddress purchaseid_1 purchaseDate_1 price_1 purchaseid_2 purchaseDate_2 price_2
1 User1 name1     add1           12     2019-09-27     140           62     2019-09-28     110
2 User2 name2     add2            1     2019-10-03     160           22     2019-10-04     120
  purchaseid_3 purchaseDate_3 price_3 purchaseid_4 purchaseDate_4 price_4 purchaseid_5 purchaseDate_5
1           60     2019-09-29     200           61     2019-09-30     190           83     2019-10-01
2           NA           <NA>      NA           NA           <NA>      NA           NA           <NA>
  price_5 purchaseid_6 purchaseDate_6 price_6
1      60           97     2019-10-02     150
2      NA           NA           <NA>      NA
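
For readers on a recent tidyr, the same reshape can probably be written with `pivot_wider()` instead of `gather()`/`unite()`/`spread()`. This is only a sketch, not part of the original answer: it assumes tidyr 1.2.0 or later (for the `names_vary` argument), and the name `dat_wide` is made up for illustration. Because the columns never pass through a single `value` column, the dates keep their Date class and need no conversion afterwards.

```r
library(dplyr)
library(tidyr)

# Example data from the question
set.seed(1234)
purchase <- data.frame(id = c(rep("User1", 6), rep("User2", 2)),
                       purchaseid = sample(seq(1, 100, 1), 8),
                       purchaseDate = seq(Sys.Date(), Sys.Date() + 7, 1),
                       price = sample(seq(30, 200, 10), 8))
users <- data.frame(id = c("User1", "User2"),
                    uname = c("name1", "name2"),
                    uaddress = c("add1", "add2"))

dat_wide <- users %>%
  left_join(purchase, by = "id") %>%
  arrange(id, purchaseDate) %>%
  # Number the purchases within each user: 1, 2, 3, ...
  group_by(id) %>%
  mutate(seq = row_number()) %>%
  ungroup() %>%
  # One row per user; one set of columns per purchase number.
  # names_vary = "slowest" keeps each purchase's three columns together:
  # purchaseid_1, purchaseDate_1, price_1, purchaseid_2, ...
  pivot_wider(names_from = seq,
              values_from = c(purchaseid, purchaseDate, price),
              names_vary = "slowest")

dat_wide
```

Like the code above, this creates only as many purchase column sets as the busiest user needs (6 here), not a fixed 20.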

Another option, using data.table:

library(data.table)

# pivot to wide format
setDT(users)
setDT(purchase)[, pno := rowid(id)]
ans <- dcast(purchase[users, on=.(id)], id + uname + uaddress ~ pno, 
    value.var=c("purchaseid","purchaseDate", "price"))

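# reorder columns so each purchase's fields sit together;
# sort on the numeric suffix so that _10 comes after _9, not after _1
nm <- grep("_[0-9]+$", names(ans), value=TRUE)
setcolorder(ans, c(setdiff(names(ans), nm), nm[order(as.integer(sub(".*_", "", nm)))]))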
ans
      id uname uaddress purchaseid_1 purchaseDate_1 price_1 purchaseid_2 purchaseDate_2 price_2 purchaseid_3 purchaseDate_3 price_3 purchaseid_4 purchaseDate_4 price_4 purchaseid_5 purchaseDate_5 price_5 purchaseid_6 purchaseDate_6 price_6
1: User1 name1     add1           12     2019-09-30     140           62     2019-10-01     110           60     2019-10-02     200           61     2019-10-03     190           83     2019-10-04      60           97     2019-10-05     150
2: User2 name2     add2            1     2019-10-06     160           22     2019-10-07     120           NA           <NA>      NA           NA           <NA>      NA           NA           <NA>      NA           NA           <NA>      NA
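
The question asks for a fixed 20 purchase slots per user, while the output above has only as many slots as the busiest user (6). One possible way to pad to a fixed width is to cross join every user with every slot number before casting. This is only a sketch under that assumption, not part of the original answer; `max_slots`, `grid`, `padded`, `wide`, and `slot_cols` are illustrative names.

```r
library(data.table)

max_slots <- 20L  # fixed number of purchase slots, taken from the question

# Example data from the question
set.seed(1234)
purchase <- data.table(id = c(rep("User1", 6), rep("User2", 2)),
                       purchaseid = sample(seq(1, 100, 1), 8),
                       purchaseDate = seq(Sys.Date(), Sys.Date() + 7, 1),
                       price = sample(seq(30, 200, 10), 8))
users <- data.table(id = c("User1", "User2"),
                    uname = c("name1", "name2"),
                    uaddress = c("add1", "add2"))

# Number each purchase within its user
purchase[, pno := rowid(id)]

# One row per (user, slot) pair; slots without a purchase are filled with NA
grid <- CJ(id = users$id, pno = seq_len(max_slots))
padded <- purchase[grid, on = .(id, pno)]

# Attach the user details, then cast to one row per user
padded <- users[padded, on = .(id)]
wide <- dcast(padded, id + uname + uaddress ~ pno,
              value.var = c("purchaseid", "purchaseDate", "price"))

# Order columns slot by slot: purchaseid_1, purchaseDate_1, price_1, purchaseid_2, ...
slot_cols <- as.vector(outer(c("purchaseid", "purchaseDate", "price"),
                             seq_len(max_slots), paste, sep = "_"))
setcolorder(wide, c("id", "uname", "uaddress", slot_cols))

wide
```

Building the column order from the known slot numbers avoids parsing the suffixes back out of the column names. A user with more than `max_slots` purchases would lose the extra ones in the join, so check the maximum purchase count first if that can happen.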