R ggplot: group by fill and show the mean
I'm making a heatmap and following this tutorial. To save a click, here is the code block to reproduce:
library(ggplot2)
library(dplyr) # easier data wrangling
library(viridis) # colour blind friendly palette, works in B&W also
library(Interpol.T) # will generate a large dataset on initial load
library(lubridate) # for easy date manipulation
library(ggExtra) # because remembering ggplot theme options is beyond me
library(tidyr)
data(Trentino_hourly_T, package = "Interpol.T") # loads h_d_t and related objects
names(h_d_t)[1:5] <- c("stationid", "date", "hour", "temp", "flag")
# as_tibble() replaces the deprecated tbl_df()
df <- as_tibble(h_d_t) %>%
  filter(stationid == "T0001")
df <- df %>% mutate(year = year(date),
                    month = month(date, label = TRUE),
                    day = day(date))
df$date <- ymd(df$date) # not necessary for plot but
# useful if you want to do further work with the data
# cleanup
rm(list = c("h_d_t", "mo_bias", "Tn", "Tx",
            "Th_int_list", "calibration_l",
            "calibration_shape", "Tm_list"))
# create plotting df
df <- df %>% select(stationid, day, hour, month, year, temp)
My data (`sam`) looks like this:
> head(sam)
Day Hour sessionID uniquePageviews timeOnPage
20219 Wed 18 1508980591045.l027p6mt 1 359
42612 Wed 7 1510155616668.57i2wj1 1 149
42149 Wed 3 1510140439620.qu19kyo 1 69
46707 Thurs 22 1510296404412.xasqfwqd10v1qdtl6jemi 1 146
40122 Tues 11 1510082622485.szj2ja1e 1 147
57449 Mon 11 1511204933263.mq9bvi0d 1 119
> glimpse(sam)
Observations: 100
Variables: 5
$ Day <ord> Wed, Wed, Wed, Thurs, Tues, Mon, Tues, Fri, Mon, Mon, Wed, Mon, Tues, Tues, Fri, Sun, Wed, M...
$ Hour <int> 18, 7, 3, 22, 11, 11, 9, 16, 16, 13, 18, 18, 10, 19, 7, 13, 18, 14, 10, 20, 17, 6, 21, 15, 1...
$ sessionID <chr> "1508980591045.l027p6mt", "1510155616668.57i2wj1", "1510140439620.qu19kyo", "1510296404412.x...
$ uniquePageviews <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ timeOnPage <dbl> 359, 149, 69, 146, 147, 119, 168, 69, 29, 0, 1542, 148, 242, 49, 457, 175, 175, 97, 79, 12, ...
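The metric uniquePageviews will always be 1 or 0, and it doesn't look great in a heatmap. Since this is session-level data, there are multiple entries for each day/hour combination. For timeOnPage, I'd like the heatmap to show the mean time on page for a given combination of hour and day of week.
So, as far as I can tell, ggplot is summing everything up, whereas what I want is mean().
My initial code block: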
# creates the initial heatmap
p <- ggplot(sam, aes(x = Day, y = Hour, fill = uniquePageviews)) +
  geom_tile(color = "white", size = 0.1) +
  scale_fill_viridis(name = "TimeOnPage", option = "C")
# order by hour of day going top to bottom asc
p <- p + scale_y_continuous(trans = "reverse", breaks = unique(df$hour))
I'm not sure I understand your question, but you could try the following:
library(tidyverse)
library(viridis)
d %>%
group_by(Day, Hour) %>%
summarise(Mean=mean(timeOnPage)) %>%
ggplot(aes(x = Day, y = Hour, fill = Mean)) +
geom_tile(color = "white", size = 0.1) +
scale_fill_viridis(name = "TimeOnPage", option ="C")
This calculates the mean timeOnPage for each day and hour and plots it as a heatmap.
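To make the aggregation step concrete, here is a minimal sketch on a small made-up data frame. The `toy` data and its values are hypothetical stand-ins for `sam`/`d`, not the asker's real data. It shows that `group_by()` plus `summarise()` collapses several sessions per Day/Hour cell into one row holding the mean, which is the shape `geom_tile()` needs.

```r
library(dplyr)

# Hypothetical toy data: several sessions fall into the same Day/Hour cell
toy <- tibble(
  Day        = factor(c("Mon", "Mon", "Mon", "Tues", "Tues"),
                      levels = c("Mon", "Tues"), ordered = TRUE),
  Hour       = c(11L, 11L, 16L, 9L, 9L),
  timeOnPage = c(100, 200, 30, 50, 150)
)

# One row per Day/Hour combination, holding the mean time on page
cell_means <- toy %>%
  group_by(Day, Hour) %>%
  summarise(Mean = mean(timeOnPage, na.rm = TRUE), .groups = "drop")

print(cell_means)
```

The `na.rm = TRUE` and `.groups = "drop"` arguments are additions for this sketch. The first guards against missing values. The second returns an ungrouped result and assumes dplyr 1.0 or later.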
fill = timeOnPage?
It can be either uniquePageviews or timeOnPage; both are potential metrics. In hindsight, I should have included only one metric to keep the question minimal. Either way, it's the same problem: I'd like to know whether there is a way to do the grouping without manipulating the data frame before passing it to ggplot.
Possible duplicate of
Yes, this works. I was interested in whether ggplot can handle the grouping by itself, rather than group + summarise. I think it can, but I guess this way is simpler.
# gets the initial heatmap
# Note: `stat` and `fun.y` are passed to ggplot() here, which does not appear
# to use them, so geom_tile() still draws one tile per row, not a mean per cell.
p <- ggplot(sam, aes(x = Day, y = Hour, fill = uniquePageviews),
            stat = "summary", fun.y = "mean") +
  geom_tile(color = "white", size = 0.1) +
  scale_fill_viridis(name = "Mean TimeOnPage", option = "C")
# order by hour of day going top to bottom asc
p <- p + scale_y_continuous(trans = "reverse", breaks = unique(df$hour))
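On the follow-up question of whether ggplot2 can do the grouping itself: one option that may fit is `stat_summary_2d()`, which summarises a `z` aesthetic within each x/y cell. The sketch below uses a hypothetical `toy` data frame, not the asker's data. It treats `Hour` as a factor so that each hour gets its own cell, which is an assumption made for this example. It also uses ggplot2's built-in `scale_fill_viridis_c()` in place of the viridis package's `scale_fill_viridis()`.

```r
library(ggplot2)

# Hypothetical toy data: several sessions per Day/Hour cell
toy <- data.frame(
  Day        = factor(c("Mon", "Mon", "Mon", "Tues", "Tues"),
                      levels = c("Mon", "Tues")),
  Hour       = factor(c(11, 11, 16, 9, 9)),  # assumption: Hour as a discrete axis
  timeOnPage = c(100, 200, 30, 50, 150)
)

# Let ggplot2 compute the mean of z within each Day/Hour cell
p <- ggplot(toy, aes(x = Day, y = Hour, z = timeOnPage)) +
  stat_summary_2d(fun = mean, geom = "tile", color = "white") +
  scale_fill_viridis_c(name = "Mean timeOnPage", option = "C")

# Inspect the per-cell values ggplot2 computed for the layer
computed <- layer_data(p)
print(computed$value)
```

Compared with group_by + summarise, this keeps the data frame untouched. The aggregation is then hidden inside the plot layer, so pre-summarising is probably easier to check and reuse.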