Aggregate function in R does not work on my dataset
Sample dataset:

Date        Playerid  Revenue  Promo  DayofWeek
01/01/2017  146123    0        B      Sunday
01/01/2017  219378    0        B      Sunday
01/01/2017  198614    0        B      Sunday
02/01/2017  292640    30       A      Monday
02/01/2017  139562    10       A      Monday
02/01/2017  124967    20       A      Monday
02/01/2017  107954    20       A      Monday
03/01/2017  28391     10       B      Tuesday
03/01/2017  184388    21       B      Tuesday
03/01/2017  264222    20       B      Tuesday
03/01/2017  184857    0        B      Tuesday
04/01/2017  79788     40       A      Wednesday
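This is because you are aggregating by everything except Date, so the sum function is trying to add up the date strings. Try the formula form of aggregate() instead (the aggregate(. ~ ...) call listed further down).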
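To see the failure in isolation, here is a minimal sketch. The two-row data frame `toy` is a made-up stand-in, not the asker's data. It only illustrates that `sum()` rejects character input and that the aggregation goes through once the string column is left out of what gets summed.

```r
# Hypothetical two-row stand-in for the real data.
toy <- data.frame(
  Date = c("01/01/2017", "02/01/2017"),
  Revenue = c(0L, 30L),
  Promo = c("B", "A"),
  stringsAsFactors = FALSE
)

# sum() on character strings raises an error; capture the message instead of stopping.
msg <- tryCatch(sum(toy$Date), error = function(e) conditionMessage(e))
print(msg)

# Leaving Date out means only the numeric Revenue column is summed.
ok <- aggregate(Revenue ~ Promo, data = toy, FUN = sum)
print(ok)
```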
Desired output:

Players  Revenue  Promo  DayofWeek
3        0        B      Sunday
4        80       A      Monday
4        51       B      Tuesday
1        40       A      Wednesday
aggdata <-aggregate(MyData, by=list(DayofWeek, Date, Promo, Playerid),
FUN=sum, na.rm=TRUE)
aggdata

dplyr approach
library(dplyr)
ans <- df %>%
group_by(DayofWeek) %>%
summarise(Promo=unique(Promo), Revenue=sum(Revenue), Playerid=n())
Output
DayofWeek Promo Revenue Playerid
<chr> <chr> <int> <int>
1 Monday A 80 4
2 Sunday B 0 3
3 Tuesday B 51 4
4 Wednesday A 40 1
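Data
(df is defined by the structure() call at the end.)

Comments:

After trying your suggestion, I got the following error: > aggdata ... I want to aggregate by all columns except Date.

Without sample data it is hard to guess the exact syntax. Please edit your question to include the output of dput(MyData). Also, I modified the second formula slightly after first posting it: are you using the edited code?

Oops: I realize now that you want to use MyData[,-2:5], not MyData[,-2:4]. Fixed.

The sample data is attached as a link to an image file named "Customer Dataset" at the very top of my question page. Thanks for your help.

I tried your solution and it works on the small sample data provided. But when I load my full dataset, whose columns are like the ones I provided earlier but which this time contains 30000 rows, the solution does not work. I tried this: MyData ... %>% summarise(Promo=unique(Promo), Revenue=sum(Revenue), Playerid=n()) View(ans). I assume my problem lies in the way I load the data. The error I get is: Error in summarise_impl(.data, dots): Column Promo must be length 1 (a summary value), not 3. For details, MyData has the following structure: str(MyData) 'data.frame': 303345 obs. of 5 variables: $ Date: chr "01/01/2017" "01/01/2017" "01/01/2017" "01/01/2017" ... $ Playerid: int 146123 219378 28391 184388 264222 ... $ Revenue: int 0 0 30 0 0 20 0 21 20 ... $ Promo: chr "B" "B" "B" "B" ... $ DayofWeek: chr "Sunday" "Sunday" ... I also tried this ...

The error occurs because there are multiple Promo values for each DayofWeek. Try grouping with group_by(DayofWeek, Promo) and then summarising with summarise(Revenue=sum(Revenue), Playerid=n()).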
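The grouping advice from the comments can be sketched as follows. The four-row `MyData` here is hypothetical, made up so that Monday carries two different Promo values. It is not the asker's 303345-row dataset, and the column name `Players` is my own label for the row count.

```r
library(dplyr)

# Hypothetical data in which Monday has two different Promo values.
MyData <- data.frame(
  Playerid  = c(1L, 2L, 3L, 4L),
  Revenue   = c(10L, 20L, 5L, 0L),
  Promo     = c("A", "B", "A", "B"),
  DayofWeek = c("Monday", "Monday", "Monday", "Sunday"),
  stringsAsFactors = FALSE
)

# Grouping by both columns leaves exactly one Promo per group,
# so unique(Promo) is no longer needed inside summarise().
ans <- MyData %>%
  group_by(DayofWeek, Promo) %>%
  summarise(Revenue = sum(Revenue), Players = n()) %>%
  ungroup()
print(ans)
```

With this grouping each DayofWeek/Promo pair gets its own row, which is probably what the full dataset needs if promotions vary within a weekday.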
# Formula interface: drop Date (column 1) so that `.` expands to Revenue only.
aggdata <- aggregate(. ~ DayofWeek + Promo + Playerid, data = MyData[, -1], FUN = sum)
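For comparison, here is a base R sketch that reproduces the desired table from the question (player count and total revenue per DayofWeek and Promo). It rebuilds the twelve sample rows with `rep()`. Counting rows as players is an assumption that each row is one player, and the names `revenue`, `players` and `result` are only illustrative.

```r
# Sample data from the question, written compactly with rep().
counts <- c(3, 4, 4, 1)
MyData <- data.frame(
  Date      = rep(c("01/01/2017", "02/01/2017", "03/01/2017", "04/01/2017"), times = counts),
  Playerid  = c(146123L, 219378L, 198614L, 292640L, 139562L, 124967L,
                107954L, 28391L, 184388L, 264222L, 184857L, 79788L),
  Revenue   = c(0L, 0L, 0L, 30L, 10L, 20L, 20L, 10L, 21L, 20L, 0L, 40L),
  Promo     = rep(c("B", "A", "B", "A"), times = counts),
  DayofWeek = rep(c("Sunday", "Monday", "Tuesday", "Wednesday"), times = counts),
  stringsAsFactors = FALSE
)

# Total revenue per DayofWeek/Promo pair.
revenue <- aggregate(Revenue ~ DayofWeek + Promo, data = MyData, FUN = sum)

# Number of rows per DayofWeek/Promo pair (assumed to be one row per player).
players <- aggregate(Playerid ~ DayofWeek + Promo, data = MyData, FUN = length)
names(players)[names(players) == "Playerid"] <- "Players"

# Join the two summaries on their shared grouping columns.
result <- merge(players, revenue, by = c("DayofWeek", "Promo"))
print(result)
```

Running two small aggregations and merging them keeps each call to a single summary function, which avoids applying `sum` to a column that should be counted.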
df <- structure(list(Date = c("01/01/2017", "01/01/2017", "01/01/2017",
"02/01/2017", "02/01/2017", "02/01/2017", "02/01/2017", "03/01/2017",
"03/01/2017", "03/01/2017", "03/01/2017", "04/01/2017"), Playerid = c(146123L,
219378L, 198614L, 292640L, 139562L, 124967L, 107954L, 28391L,
184388L, 264222L, 184857L, 79788L), Revenue = c(0L, 0L, 0L, 30L,
10L, 20L, 20L, 10L, 21L, 20L, 0L, 40L), Promo = c("B", "B", "B",
"A", "A", "A", "A", "B", "B", "B", "B", "A"), DayofWeek = c("Sunday",
"Sunday", "Sunday", "Monday", "Monday", "Monday", "Monday", "Tuesday",
"Tuesday", "Tuesday", "Tuesday", "Wednesday")), .Names = c("Date",
"Playerid", "Revenue", "Promo", "DayofWeek"), row.names = c(NA,
-12L), class = c("data.table", "data.frame"))