R ggplot: group by fill and display the mean

I am making a heatmap and following this tutorial:

To save you a click, here is the code block to reproduce it:

library(ggplot2)
library(dplyr) # easier data wrangling 
library(viridis) # colour blind friendly palette, works in B&W also
library(Interpol.T) #  will generate a large dataset on initial load
library(lubridate) # for easy date manipulation
library(ggExtra) # because remembering ggplot theme options is beyond me
library(tidyr) 


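# loads h_d_t (plus the helper objects removed in the cleanup step below)
data(Trentino_hourly_T, package = "Interpol.T")

names(h_d_t)[1:5] <- c("stationid", "date", "hour", "temp", "flag")
df <- as_tibble(h_d_t) %>%  # as_tibble() replaces the deprecated tbl_df()
  filter(stationid == "T0001")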

df<- df %>% mutate(year = year(date),
                  month = month(date, label=TRUE),
                  day = day(date))

df$date<-ymd(df$date) # not necessary for plot but 
#useful if you want to do further work with the data

#cleanup
rm(list=c("h_d_t","mo_bias","Tn","Tx",
          "Th_int_list","calibration_l",
          "calibration_shape","Tm_list"))


#create plotting df
df <-df %>% select(stationid,day,hour,month,year,temp)

My own data, sam, looks like this:

> head(sam)
        Day Hour                           sessionID uniquePageviews timeOnPage
20219   Wed   18              1508980591045.l027p6mt               1        359
42612   Wed    7               1510155616668.57i2wj1               1        149
42149   Wed    3               1510140439620.qu19kyo               1         69
46707 Thurs   22 1510296404412.xasqfwqd10v1qdtl6jemi               1        146
40122  Tues   11              1510082622485.szj2ja1e               1        147
57449   Mon   11              1511204933263.mq9bvi0d               1        119
> glimpse(sam)
Observations: 100
Variables: 5
$ Day             <ord> Wed, Wed, Wed, Thurs, Tues, Mon, Tues, Fri, Mon, Mon, Wed, Mon, Tues, Tues, Fri, Sun, Wed, M...
$ Hour            <int> 18, 7, 3, 22, 11, 11, 9, 16, 16, 13, 18, 18, 10, 19, 7, 13, 18, 14, 10, 20, 17, 6, 21, 15, 1...
$ sessionID       <chr> "1508980591045.l027p6mt", "1510155616668.57i2wj1", "1510140439620.qu19kyo", "1510296404412.x...
$ uniquePageviews <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ timeOnPage      <dbl> 359, 149, 69, 146, 147, 119, 168, 69, 29, 0, 1542, 148, 242, 49, 457, 175, 175, 97, 79, 12, ...
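The metric uniquePageviews will always be 1 or 0, and it does not look good in the heatmap. Since this is session-level data, there are multiple entries for each day/hour combination. For timeOnPage, I would like the heatmap to show the mean time on page for a given combination of hour and day of week.

So, as far as I can tell, ggplot is summing everything, whereas what I want is mean().

My initial code block: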

# creates the initial heatmap
p <- ggplot(sam, aes(x = Day, y = Hour, fill = uniquePageviews)) +
  geom_tile(color = "white", size = 0.1) + 
  scale_fill_viridis(name = "TimeOnPage", option ="C")

# order by hour of day going top to bottom asc
p <-p + scale_y_continuous(trans = "reverse", breaks = unique(df$hour))
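
A quick way to check the summing assumption: as far as I can tell, geom_tile() with its default identity stat does not aggregate at all. It draws one tile per row, so sessions that share a Day/Hour cell are painted on top of each other and only the last one drawn is visible. The sketch below uses a made-up three-row data frame (toy is hypothetical, not the sam data) and inspects what ggplot2 actually draws with layer_data().

```r
library(ggplot2)

# Hypothetical toy data: two sessions share the Mon / hour-1 cell
toy <- data.frame(
  Day        = factor(c("Mon", "Mon", "Tue")),
  Hour       = c(1, 1, 2),
  timeOnPage = c(10, 30, 50)
)

p_toy <- ggplot(toy, aes(x = Day, y = Hour, fill = timeOnPage)) +
  geom_tile()

# layer_data() returns the data ggplot2 uses to draw the layer
built <- layer_data(p_toy)

# One drawn tile per input row: nothing has been summed or averaged
n_tiles <- nrow(built)
```

If n_tiles equals the number of input rows, the plot is overplotting rather than summing, which is why the data needs to be summarised before (or while) plotting.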

I'm not sure I understand your question, but you could try the following:

library(tidyverse)
library(viridis)
d %>% 
  group_by(Day, Hour) %>% 
  summarise(Mean=mean(timeOnPage)) %>% 
  ggplot(aes(x = Day, y = Hour, fill = Mean)) +
  geom_tile(color = "white", size = 0.1) + 
  scale_fill_viridis(name = "TimeOnPage", option ="C")

This calculates the mean timeOnPage for each day and hour and plots it as a heatmap.
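
To see what the group_by() + summarise() step hands to ggplot, here is a minimal sketch on a made-up data frame (toy and toy_means are hypothetical names, not from the question). The .groups = "drop" argument is my addition; it assumes dplyr 1.0 or later and only silences the regrouping message.

```r
library(dplyr)

# Hypothetical toy data: two sessions fall into the Mon / hour-1 cell
toy <- data.frame(
  Day        = c("Mon", "Mon", "Tue"),
  Hour       = c(1, 1, 2),
  timeOnPage = c(10, 30, 50)
)

# One row per Day/Hour combination, holding the mean time on page
toy_means <- toy %>%
  group_by(Day, Hour) %>%
  summarise(Mean = mean(timeOnPage), .groups = "drop")
```

The result has one row per cell of the heatmap, so geom_tile() then draws exactly one tile per Day/Hour combination.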

fill=timeOnPage?

It can be either uniquePageviews or timeOnPage. They are both potential metrics. In hindsight, I should have included only one metric to keep the question minimal. Either way, it is the same question: I would like to know whether there is a way to do the grouping without manipulating the data frame before passing it to ggplot.

Possible duplicate of

Yes, this works. I was interested in whether ggplot can handle the grouping itself, rather than group_by + summarise. I think it can, but I guess doing it this way is simpler.

Variant of the question's code that tries to compute the mean inside ggplot:

# gets the initial heatmap
p <- ggplot(sam, aes(x = Day, y = Hour, fill = uniquePageviews),
            stat = "summary", fun.y = "mean") +
  geom_tile(color = "white", size = 0.1) + 
  scale_fill_viridis(name = "Mean TimeOnPage", option ="C")

# order by hour of day going top to bottom asc
p <-p + scale_y_continuous(trans = "reverse", breaks = unique(df$hour))
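
On the follow-up question of whether ggplot2 can do the grouping itself: as far as I know, ggplot() ignores stat and fun.y when they are passed to it directly, so they have no effect there. One option that does summarise inside the plot is stat_summary_2d(), which bins x and y and applies a function to a z aesthetic. The sketch below is an assumption-laden illustration rather than a drop-in fix: toy is made-up data, and converting Hour to a factor is my choice so that each hour gets its own bin instead of depending on numeric bin edges.

```r
library(ggplot2)

# Hypothetical toy data: two sessions share the Mon / hour-1 cell
toy <- data.frame(
  Day        = factor(c("Mon", "Mon", "Tue")),
  Hour       = c(1, 1, 2),
  timeOnPage = c(10, 30, 50)
)

# z is the value to summarise; fill defaults to the computed summary
p_mean <- ggplot(toy, aes(x = Day, y = factor(Hour), z = timeOnPage)) +
  stat_summary_2d(fun = "mean", geom = "tile", color = "white") +
  labs(y = "Hour", fill = "Mean timeOnPage")

# The computed per-tile means are available in the built layer
tile_values <- sort(layer_data(p_mean)$value)
```

With a factor on the y axis, scale_y_continuous() no longer applies; scale_y_discrete(limits = rev) is one way to reverse the order. Summarising with dplyr first, as in the answer, remains the more transparent route.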