R 在时间序列的上下文中分解
我有一个数据集,我想通过几个不同的变量对其进行整体可视化和分类。我创建了一个flexdashboard,其中有一个toy Shining应用程序来选择分解类型,并使用工作代码来绘制正确的子集 我的方法是重复的,这对我来说是一个暗示,我错过了一个更好的方法来做到这一点。让我大吃一惊的是需要按日期计数并扩展矩阵。我不知道如何在一个管道中按周计算组数。我分几个步骤来做,然后结合起来 想法 顺便说一句,我问了这个问题,但我想这可能更像是一个问题。我没有权限将其从RSC中删除,因此对交叉发布表示歉意R 在时间序列的上下文中分解,r,shiny,dplyr,r-markdown,R,Shiny,Dplyr,R Markdown,我有一个数据集,我想通过几个不同的变量对其进行整体可视化和分类。我创建了一个flexdashboard,其中有一个toy Shining应用程序来选择分解类型,并使用工作代码来绘制正确的子集 我的方法是重复的,这对我来说是一个暗示,我错过了一个更好的方法来做到这一点。让我大吃一惊的是需要按日期计数并扩展矩阵。我不知道如何在一个管道中按周计算组数。我分几个步骤来做,然后结合起来 想法 顺便说一句,我问了这个问题,但我想这可能更像是一个问题。我没有权限将其从RSC中删除,因此对交叉发布表示歉意 --
---
title: "test"
output:
flexdashboard::flex_dashboard:
theme: bootstrap
runtime: shiny
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```
```{r global, include=FALSE}
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"),
as.Date("2018-06-30"),
"days"),
sex = sample(c("male", "female"), 181, replace=TRUE),
lang = sample(c("english", "spanish"), 181, replace=TRUE),
age = sample(20:35, 181, replace=TRUE))
dat <- sample_n(dat, 80)
```
Sidebar {.sidebar}
=====================================
```{r}
radioButtons("diss", label = "Disaggregation",
choices = list("All" = 1, "By Sex" = 2, "By Language" = 3),
selected = 1)
```
Page 1
=====================================
```{r}
# all
all <- reactive(
dat %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>% # convert to tibble time object
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total = 0))
)
# males only
males <- reactive(
dat %>%
filter(sex=="male") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_m = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_m = 0))
)
# females only
females <- reactive(
dat %>%
filter(sex=="female") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_f = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_f = 0))
)
# english only
english <- reactive(
dat %>%
filter(lang=="english") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_e = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_e = 0))
)
# spanish only
spanish <- reactive(
dat %>%
filter(lang=="spanish") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_s = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_s = 0))
)
# combine
totals <- reactive({
all <- all()
females <- females()
males <- males()
english <- english()
spanish <- spanish()
all %>%
select(date, total) %>%
full_join(select(females, date, total_f), by = "date") %>%
full_join(select(males, date, total_m), by = "date") %>%
full_join(select(english, date, total_e), by = "date") %>%
full_join(select(spanish, date, total_s), by = "date")
})
# convert to xts
totals_ <- reactive({
totals <- totals()
xts(totals, order.by = totals$date)
})
# plot
renderDygraph({
totals_ <- totals_()
if (input$diss == 1) {
dygraph(totals_[, "total"],
main= "All") %>%
dySeries("total", label = "All") %>%
dyRangeSelector() %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else if (input$diss == 2) {
dygraph(totals_[, c("total_f", "total_m")],
main = "By sex") %>%
dyRangeSelector() %>%
dySeries("total_f", label = "Female") %>%
dySeries("total_m", label = "Male") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else {
dygraph(totals_[, c("total_e", "total_s")],
main = "By language") %>%
dyRangeSelector() %>%
dySeries("total_e", label = "English") %>%
dySeries("total_s", label = "Spanish") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
}
})
```
更新:
@Jon Spring建议编写一个函数来减少下面应用的一些重复,这是一个很好的改进。然而,基本方法是相同的。分段、计算、合并、绘图。有没有一种方法可以做到这一点而不必拆开并重新组装
---
title: "test"
output:
flexdashboard::flex_dashboard:
theme: bootstrap
runtime: shiny
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```
```{r global, include=FALSE}
# generate data
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"),
as.Date("2018-06-30"),
"days"),
sex = sample(c("male", "female"), 181, replace=TRUE),
lang = sample(c("english", "spanish"), 181, replace=TRUE),
age = sample(20:35, 181, replace=TRUE))
dat <- sample_n(dat, 80)
# Jon Spring's function
prep_dat <- function(filtered_dat, col_name = "total") {
filtered_dat %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
tibbletime::as_tbl_time(index = date) %>% # convert to tibble time object
select(date, new) %>%
tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
group_by(date) %>%
mutate(total = sum(new, na.rm = TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(
date = seq(date[1], date[length(date)], by = "1 week"),
fill = list(total = 0)
)
}
```
Sidebar {.sidebar}
=====================================
```{r}
radioButtons("diss", label = "Disaggregation",
choices = list("All" = 1, "By Sex" = 2, "By Language" = 3),
selected = 1)
```
Page 1
=====================================
```{r}
# all
all <- reactive(
prep_dat(dat)
)
# males only
males <- reactive(
prep_dat(
dat %>%
filter(sex == "male")
) %>%
rename("total_m" = "total")
)
# females only
females <- reactive(
prep_dat(
dat %>%
filter(sex == "female")
) %>%
rename("total_f" = "total")
)
# english only
english <- reactive(
prep_dat(
dat %>%
filter(lang == "english")
) %>%
rename("total_e" = "total")
)
# spanish only
spanish <- reactive(
prep_dat(
dat %>%
filter(lang == "spanish")
) %>%
rename("total_s" = "total")
)
# combine
totals <- reactive({
all <- all()
females <- females()
males <- males()
english <- english()
spanish <- spanish()
all %>%
select(date, total) %>%
full_join(select(females, date, total_f), by = "date") %>%
full_join(select(males, date, total_m), by = "date") %>%
full_join(select(english, date, total_e), by = "date") %>%
full_join(select(spanish, date, total_s), by = "date")
})
# convert to xts
totals_ <- reactive({
totals <- totals()
xts(totals, order.by = totals$date)
})
# plot
renderDygraph({
totals_ <- totals_()
if (input$diss == 1) {
dygraph(totals_[, "total"],
main= "All") %>%
dySeries("total", label = "All") %>%
dyRangeSelector() %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else if (input$diss == 2) {
dygraph(totals_[, c("total_f", "total_m")],
main = "By sex") %>%
dyRangeSelector() %>%
dySeries("total_f", label = "Female") %>%
dySeries("total_m", label = "Male") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else {
dygraph(totals_[, c("total_e", "total_s")],
main = "By language") %>%
dyRangeSelector() %>%
dySeries("total_e", label = "English") %>%
dySeries("total_s", label = "Spanish") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
}
})
```
这是创建函数的好地方,可以缩短代码并减少出错的可能性 更复杂的一点是,使用dplyr编程通常需要涉入一个名为tidyeval的框架,该框架非常强大,但可能很吓人。 这里有一种避开tidyeval的替代方法: 在您的场景中,完全可以通过在函数前后进行一些操作来避免这些挑战。它不那么优雅,但很管用 顺便说一句,我不能保证它会起作用,因为你没有共享一个可验证的reprex,例如,包括一个与你的表单相同的数据样本,但它使用的是我编造的假数据。见下图。抱歉,我错过了提供示例数据的区块
prep_dat <- function(filtered_dat, col_name = "total") {
filtered_dat %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
tibbletime::as_tbl_time(index = date) %>% # convert to tibble time object
select(date, new) %>%
tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
group_by(date) %>%
mutate(total = sum(new, na.rm = TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(
date = seq(date[1], date[length(date)], by = "1 week"),
fill = list(total = 0)
)
}
然后,您可以使用过滤后的数据和总计列的名称来调用它。此片段应该能够替换您当前使用的~20行:
males <- prep_dat(dat_fake %>%
filter(sex == "male")) %>%
rename("total_m" = "total")
我测试的假数据:
dat_fake <- tibble(
date = as.Date("2018-01-01") + runif(500, 0, 100),
new = runif(500, 0, 100),
sex = sample(c("male", "female"),
500, replace = TRUE),
lang = sample(c("english", "french", "spanish", "portuguese", "tagalog"),
500, replace = TRUE)
)
我想你可以通过改变准备的顺序来获得一些收获。目前,您的应用程序的流量大约为: 数据=>准备所有组合=>选择所需的可视化=>绘制绘图 相反,请考虑: 数据=>选择所需的可视化=>准备所需的组合=>绘制绘图 这将利用Shiny的反应性来重新准备请求绘图所需的数据,以响应用户选择的变化 通过代码片段抱歉,我对flexdashboard和tibbletime不太熟悉,无法确保这些代码能够运行,但我希望这足以强调以下方法: 您的控件选择要关注的列注意:我们使用All='1',因此这将计算为group by中的一个常量,否则必须单独处理:
radioButtons("diss", label = "Disaggregation",
choices = list("All" = "'1'",
"By Sex" = "sex",
"By Language" = "lang",
"By other" = "column_name_of_'other'"),
selected = 1)
然后在您的小组中使用此选项,以便仅准备当前可视化所需的数据。您需要通过以下方式调整@Jon_Spring建议的功能,以响应前面的小组:
preped_dat = reactive({
dat %>%
group_by_(input$diss) %>%
# etc
})
打印前,您需要调整打印功能以响应数据格式中可能的更改:
renderDygraph({
totals = preped_data()
dygraph(totals) %>%
dySeries("total", label = ) %>%
dyRangeSelector()
})
关于group_by,如果所有参数都是文本字符串,您可以使用group_by,或者group_by!!SYMCINPUT$diss,other_column_name如果要将控件输入的文本字符串与其他列名混合使用
这种方法更改的一个可能缺点是,如果数据集很大,则在交互过程中响应能力会降低。目前的方法先进行所有计算,然后对每个选择进行最小计算——如果您有大量的处理,这可能更可取。我建议的方法将对每次选择进行最少的前期处理和适度的计算。感谢您对您的目标进行更多的解释。我认为@simon-s-a建议的方法将简化事情。如果我们可以动态地运行分组,并将其结构化,这样我们就不需要事先知道这些组中可能的组件,那么维护起来就会容易得多 下面是一个最小可行的产品,它重建了绘图功能,将分组逻辑包含在其中 一旦按日期分组,不管分组变量是什么,它都会计算每个组有多少行,然后将这些行分散开来,这样每个组都会得到一列 然后我使用padr::pad来填充中间任何丢失的时间行,并用零替换所有NA 最后,该数据帧被转换为一个xts对象,并输入到动态图中,动态图似乎可以自动处理多个列 在这里:
谢谢,琼斯普林。通读。不过还是有个简单的问题。全局块中提供的数据帧有什么问题?抱歉,我错过了!没问题。该功能有助于减少重复,这总是好的。感谢您花时间研究该方法。我将把它放在一边,看看大家是否有关于如何避免所有拆分和合并的想法。谢谢
花时间提出不同的策略。这很有道理。这是@simon-s-a建议的一个很好的实现。从这个答案中可以学到很多东西。
---
title: "test"
output:
flexdashboard::flex_dashboard:
theme: bootstrap
runtime: shiny
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```
```{r global, include=FALSE}
# generate data
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"),
as.Date("2018-06-30"),
"days"),
sex = sample(c("male", "female"), 181, replace=TRUE),
lang = sample(c("english", "spanish"), 181, replace=TRUE),
age = sample(20:35, 181, replace=TRUE))
dat <- dplyr::sample_n(dat, 80)
```
Sidebar {.sidebar}
=====================================
```{r}
radioButtons("diss", label = "Disaggregation",
choices = list("All" = "Total",
"By Sex" = "sex",
"By Language" = "lang"),
selected = "Total")
```
Page 1
=====================================
```{r plot}
renderDygraph({
grp_col <- rlang::sym(input$diss) # This converts the input selection to a symbol
dat %>%
mutate(Total = 1) %>% # This is a hack to let us "group" by Total -- all one group
# Here's where we unquote the symbol so that dplyr can use it
# to refer to a column. In this case I make a dummy column
# that's a copy of whatever column we want to group
mutate(my_group = !!grp_col) %>%
# Now we make a group for every existing combination of week
# (using lubridate::floor_date) and level of our grouping column,
# count how many rows in each group, and spread that to wide format.
group_by(date = lubridate::floor_date(date, "1 week"), my_group) %>%
count() %>% spread(my_group, n) %>% ungroup() %>%
# padr:pad() fills in any missing weeks in the sequence with new rows
# Then we replace all the NA's with zeroes.
padr::pad() %>% replace(is.na(.), 0) %>%
# Finally we can convert to xts and feed the wide table into digraph.
xts::xts(order.by = .$date) %>%
dygraph() %>%
dyRangeSelector() %>%
dyOptions(
useDataTimezone = FALSE, stepPlot = TRUE,
drawGrid = FALSE, fillGraph = TRUE
)
})
```