Creating a pivot table with grouped fields in R


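I am trying to build a pivot table in R for my Excel dataset. I need to group the numbers in a column named "Weight", whose values range from 70 to 100. Each weight has a price. For each weight category I need the mean, max, and min, along with the number of products. There are about 3000 observations of 25 variables; Weight and Price are two of them. A snippet of the data: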

Weight   Price   Order No.   Date_Ordered   Invoiced_Date   Region  
85       $2300 
78       $5600
100      $3490
95       $2450
90       $5890

I am looking for something like:
    Weight                       Count    Mean(Price)   Min(Price)   Max(Price)
70-75(including 75)     
75-80
80-85
85-90
90-95
95-100
I was able to get the counts, but I cannot get the mean, min, and max for each weight category:

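# Import the dataset
library(openxlsx)  # assumption: read.xlsx() here is the openxlsx version
library(dplyr)     # provides group_by()
dataset <- read.xlsx('Product_Data.xlsx')
gdataset <- group_by(dataset, Weight)
attach(gdataset)
# Bin edges every 5 units across the 70-100 weight range
periods <- seq(from = 70, to = 100, by = 5)
snip <- cut(Weight, breaks = periods, right = TRUE, include.lowest = TRUE)
# Count of products per weight bin
report <- cbind(table(snip))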

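Your data is a bit sparse, so I will create my own for this answer. I will ignore the other columns, though their presence in the data will not affect anything.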

set.seed(2)
n <- 100
dat <- data.frame(
  Weight = sample(100, size=n, replace=TRUE),
  Price = sample(9999, size=n, replace=TRUE)
)
head(dat)
#   Weight Price
# 1     19  2010
# 2     71  4276
# 3     58  9806
# 4     17  8289
# 5     95  2870
# 6     95  5959
Now we just split it into bins and run a simple summary function on each bin, wrapping the results back into a data.frame:

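# Bin Weight into 5-unit intervals; these breaks match the row labels in the output
dat$WeightBin <- cut(dat$Weight, seq(0, 100, by=5))
do.call(rbind, by(dat$Price, dat$WeightBin, function(x) {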
  setNames(
    sapply(c(length, mean, min, max), function(f) f(x)),
    c("Count", "Mean(Price)", "Min(Price)", "Max(Price)")
  )
}))
#          Count Mean(Price) Min(Price) Max(Price)
# (0,5]        5    3919.000       1822       9536
# (5,10]       3    4287.000       1782       5690
# (10,15]      5    5402.200       2739       8989
# (15,20]     11    5192.545       1183       9192
# (20,25]      3    2868.667        137       7363
# (25,30]      6    6594.500       2855       9657
# (30,35]      5    2960.200        777       7486
# (35,40]      6    4937.000        850       9749
# (40,45]      7    5986.000       1307       9527
# (45,50]      4    5957.750       1475       9754
# (50,55]      3    3077.333       1287       4786
# (55,60]      4    4285.500        247       9806
# (60,65]      3    2633.000        450       6656
# (65,70]      4    4244.250        369       9038
# (70,75]      3    2616.333        652       4276
# (75,80]      5    7183.800       3734       8537
# (80,85]      6    4273.667        229       9788
# (85,90]      6    6659.000       1388       9637
# (90,95]      4    4301.750       2870       5959
# (95,100]     7    3967.857        872       8727
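
The bins above run from 0 to 100 because my made-up weights do. For the 70-100 range in the question, the same idea might look like the sketch below. It uses synthetic data (not your file), and it passes include.lowest = TRUE so that a weight of exactly 70 falls into the first bin. split() and lapply() are swapped in for by() purely as a variation, and the names dat70 and report are placeholders of mine.

```r
# Synthetic data limited to the 70-100 weight range from the question.
set.seed(2)
dat70 <- data.frame(
  Weight = sample(70:100, size = 100, replace = TRUE),
  Price  = sample(9999, size = 100, replace = TRUE)
)

# Right-closed bins; include.lowest = TRUE makes the first bin [70,75].
dat70$WeightBin <- cut(dat70$Weight, breaks = seq(70, 100, by = 5),
                       right = TRUE, include.lowest = TRUE)

# One row per bin with the count, mean, min and max of Price.
report <- do.call(rbind, lapply(split(dat70$Price, dat70$WeightBin), function(x) {
  c(Count = length(x), Mean = mean(x), Min = min(x), Max = max(x))
}))
report
```

The result is a numeric matrix with one row per bin. Wrap it in as.data.frame() if you want to write it back out to Excel.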
dplyr

From the presence of group_by, I infer that you intend to use dplyr. Here is an alternative way to get similar results (starting from my original data):


library(dplyr)
dat %>%
  group_by(Bin = cut(Weight, seq(0, 100, by=5))) %>%
  summarize(
    Count = n(),
    Mean = mean(Price),
    Min = min(Price),
    Max = max(Price)
  )
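
One caveat: your snippet shows prices like $2300. If read.xlsx imports that column as text rather than numbers (an assumption on my part, since it depends on how the cells are formatted in Excel), mean() and friends will not work until it is converted. A minimal sketch, using the five rows from your snippet and a placeholder data frame named raw:

```r
# The five rows from the question's snippet, with Price stored as text.
raw <- data.frame(
  Weight = c(85, 78, 100, 95, 90),
  Price  = c("$2300", "$5600", "$3490", "$2450", "$5890"),
  stringsAsFactors = FALSE
)

# Strip the dollar sign (and any thousands separators), then convert to numeric.
raw$Price <- as.numeric(gsub("[$,]", "", raw$Price))

str(raw)
mean(raw$Price)
# [1] 3946
```

If str(dataset) already reports Price as numeric, this step can be skipped.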