`purrr` alternative to a rowwise function for determining event dates from a complex rule set

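I'm working with a client who wants to provide an input spreadsheet containing text descriptions of when particular events occur in a given year. Each event (there are at least 200) is a separate row with a complex set of rules about when it takes place, for example "the first Saturday before October 1" or "the Friday closest to December 1". A few events simply fall on a specific date, but that is rare. The real spreadsheet has about 15 columns that control each event's start date, so the logic needed to compute the start date is quite involved.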

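I've come up with a way to compute the start dates that uses a function and a loop to iterate over each row of my `data.frame`, but I'd like to know whether there is a more efficient `tidyverse` or `purrr` way to solve this. Is it possible (or advisable) to vectorize the solution to this problem?

Here is my current (working) solution, the smallest and most compact example I could come up with. Can I make it more efficient, more readable, and better able to handle the more complex real-world input?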

library(tidyverse)
library(lubridate)

# Bring in demo data that describes 3 events, and when they should each start.

demo <- structure(list(Event = c("Gala", "Celebration", "Wrap-up"), date_start
= structure(c(18871, NA, NA), class = "Date"), weekday_near = c(NA,
"Saturday", "Friday" ), near_description = c(NA, "before", "closest to"),
near_date = structure(c(NA, 18901, 18962), class = "Date")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))

Now, determine the start date for each event: Gala, Celebration, and Wrap-up.

# Create a tibble that contains all possible dates for these events this year.

datedb <- tibble(date = seq(make_date(2021, 9, 1), make_date(2021, 12, 31), by = 1),
                 wday = wday(date, label = TRUE, abbr = FALSE))


# Write function meant to determine event date for each row of the dataframe.

determine_date <- function(df){
  
  # define variables that are easier to read
  # this part makes me squeamish - 
  # there's gotta be a better way to do this with the tidyverse
  event_date_exact <- df[["date_start"]]
  event_near_wday <- df[["weekday_near"]]
  event_near_desc <- df[["near_description"]]
  event_near_date <- df[["near_date"]]
  
  # Event date - if there is an exact date for the event, choose it as the event date.
  if (!is.na(event_date_exact)) {
    event_date <- event_date_exact
  
  # Otherwise, if the date is dependent on another date, figure out when it should be:
  } else {
    event_date_vec <- datedb %>% filter(wday == event_near_wday) %>% pull(date)
    event_date <- 
      case_when(
        # If you're looking for the closest weekday to a particular date:
        event_near_desc == "closest to" ~ event_date_vec[which(abs(event_date_vec - event_near_date) == 
                         min(abs(event_date_vec - event_near_date), na.rm = TRUE))],
        # If you're looking for the last matching weekday strictly before that date:
        event_near_desc == "before" ~ rev(event_date_vec[which(event_date_vec - event_near_date < 0)])[1],
        # If neither of these worked, output NA to check why 
        TRUE ~ NA_Date_
      )
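  }

  # Return the result explicitly instead of relying on the invisible value of the if/else.
  event_date
}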

# create empty vector to store results
start_dates <- lubridate::ymd()

for (i in seq_len(nrow(demo))) {
  start_dates[i] <- determine_date(demo[i,])
}

# add start dates back to original demo dataframe
demo$start_date <- start_dates

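Vectorizing a function with base R's `Vectorize()` really just calls `mapply` under the hood. So if you want `purrr`-style code, you probably just need to change the function's arguments, like this: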

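Setup:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
# Bring in demo data that describes 3 events, and when they should each start.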
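
A minimal sketch of what the `purrr`-style approach could look like, assuming the function takes one argument per column and `pmap()` feeds it one row at a time. This is an illustration, not the answerer's original code: the argument names simply mirror the columns of `demo`, the helpers `all_dates` and `all_weekdays` are hypothetical names, and weekday labels are assumed to be in English.

```r
library(purrr)
library(lubridate)

# Same three events as in the question, rebuilt so the sketch is self-contained.
demo <- data.frame(
  Event            = c("Gala", "Celebration", "Wrap-up"),
  date_start       = as.Date(c("2021-09-01", NA, NA)),
  weekday_near     = c(NA, "Saturday", "Friday"),
  near_description = c(NA, "before", "closest to"),
  near_date        = as.Date(c(NA, "2021-10-01", "2021-12-01"))
)

# Candidate dates and their weekday names (assumes an English locale).
all_dates    <- seq(as.Date("2021-09-01"), as.Date("2021-12-31"), by = "day")
all_weekdays <- as.character(wday(all_dates, label = TRUE, abbr = FALSE))

# One argument per column; `...` absorbs columns the rules do not use (e.g. Event).
determine_date <- function(date_start, weekday_near, near_description, near_date, ...) {
  # Coerce defensively in case the Date class was dropped on the way in.
  date_start <- as_date(date_start)
  near_date  <- as_date(near_date)

  # An exact date wins; incomplete rules yield NA so they can be inspected.
  if (!is.na(date_start)) return(date_start)
  if (is.na(weekday_near) || is.na(near_description) || is.na(near_date)) return(NA_Date_)

  candidates <- all_dates[all_weekdays == weekday_near]
  if (length(candidates) == 0) return(NA_Date_)
  gap <- as.numeric(candidates - near_date)  # signed distance in days

  switch(
    near_description,
    "closest to" = candidates[which.min(abs(gap))],
    "before"     = if (any(gap < 0)) max(candidates[gap < 0]) else NA_Date_,
    NA_Date_  # unknown rule
  )
}

# pmap() returns a list of length-1 Dates; c() combines them and keeps the class.
demo$start_date <- do.call(c, pmap(demo, determine_date))
demo$start_date
# Expected for the demo data: 2021-09-01, 2021-09-25, 2021-12-03
```

Adding another rule then means adding one more named branch to `switch()`, and the loop over row indices disappears.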
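Hi, the following will get rid of the loop:

demo %>% rowwise() %>% mutate(start_date = determine_date(.data))

I personally don't know whether `rowwise` counts as vectorized, but I find it easier than trying to do row operations with `purrr`. The only way I have used `purrr` row by row is to first `split` the data by row.

Thanks Justin. I keep reading that `rowwise` might be deprecated, so I try to be careful not to use it, because I want this code to keep working for many years to come!

Thank you so much, Justin! I have a few questions/comments. Does this still work with `case_when`? In my real example I have more than two conditions where you have `switch`. Also, in my real example the spreadsheet has both start dates and end dates. They have different column names but sit in the same row. I solved this by giving the function an argument, "start date" or "end date", and then defining the columns inside the function (as in my example). How would you handle this? One idea, I suppose, is to reshape the table (`pivot_longer`) so that one column indicates whether a row is a "start date" or an "end date", which makes the column names the same in both cases.

Again, it should work fine with `case_when`; I wrote this with the explicit goal of vectorization. As for start date/end date, I'm not sure without an example. If you edit your post I may be able to help more. I'll go over a few ideas first.

Thanks again! Justin, your solution should work well for me, and I learned a lot from it. If you're interested in the issue of `pmap` stripping the Date class, you can take a look. Justin: this question hasn't had much attention, so if you think it deserves an upvote, I'd really appreciate it!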
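
The reshaping idea from the comments can be sketched with `tidyr::pivot_longer()`. Since the real spreadsheet is not shown, every column name here (`start_weekday_near`, `end_near_date`, and so on) and the `boundary` column are hypothetical; the only assumption that matters is that start and end columns share a suffix after a `start_`/`end_` prefix.

```r
library(tidyr)

# Hypothetical wide layout: one row per event, parallel start_* and end_* rule columns.
events <- data.frame(
  Event              = c("Gala", "Wrap-up"),
  start_weekday_near = c("Saturday", "Friday"),
  start_near_date    = as.Date(c("2021-10-01", "2021-12-01")),
  end_weekday_near   = c("Sunday", "Saturday"),
  end_near_date      = as.Date(c("2021-10-03", "2021-12-04"))
)

# Split each column name into a prefix (start/end) and the shared rule name.
# ".value" tells pivot_longer() to keep the rule name as a column of its own.
events_long <- pivot_longer(
  events,
  cols          = -Event,
  names_to      = c("boundary", ".value"),
  names_pattern = "^(start|end)_(.*)$"
)

events_long
```

After reshaping, each event has one row per boundary with identical rule columns, so a single date function can handle both without being told which set of column names to read.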

For reference, printing `demo` after running the loop in the question gives:

demo

Event       date_start weekday_near near_description near_date    start_date
Gala        2021-09-01 NA           NA               NA           2021-09-01
Celebration NA         Saturday     before           2021-10-01   2021-09-25
Wrap-up     NA         Friday       closest to       2021-12-01   2021-12-03
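
On the vectorization question: for these two rule types the date can be computed with modular arithmetic on weekday numbers, with no per-row function call at all. The following is a sketch under stated assumptions rather than a drop-in replacement: the helper columns (`target`, `near_wday`, `days_back`, `days_fwd`, `offset_closest`) are hypothetical names, weekdays are numbered 1 = Sunday through 7 = Saturday, and unlike the lookup-table version in the question it does not restrict results to September through December 2021.

```r
library(dplyr)
library(lubridate)

demo <- data.frame(
  Event            = c("Gala", "Celebration", "Wrap-up"),
  date_start       = as.Date(c("2021-09-01", NA, NA)),
  weekday_near     = c(NA, "Saturday", "Friday"),
  near_description = c(NA, "before", "closest to"),
  near_date        = as.Date(c(NA, "2021-10-01", "2021-12-01"))
)

# Position in this vector matches wday(..., week_start = 7): 1 = Sunday, 7 = Saturday.
weekday_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                   "Thursday", "Friday", "Saturday")

demo_vec <- demo %>%
  mutate(
    target    = match(weekday_near, weekday_names),
    near_wday = wday(near_date, week_start = 7),
    # Days to step back to reach the target weekday strictly before near_date (1 to 7).
    days_back = (near_wday - target - 1) %% 7 + 1,
    # Days forward to the next target weekday (0 to 6), folded into the range -3 to 3.
    days_fwd       = (target - near_wday) %% 7,
    offset_closest = if_else(days_fwd <= 3, days_fwd, days_fwd - 7),
    start_date = case_when(
      !is.na(date_start)               ~ date_start,
      near_description == "before"     ~ near_date - days_back,
      near_description == "closest to" ~ near_date + offset_closest,
      TRUE                             ~ NA_Date_
    )
  )

demo_vec$start_date
# Expected for the demo data: 2021-09-01, 2021-09-25, 2021-12-03
```

Every step operates on whole columns, so 200 rows cost the same handful of vector operations as 3. Rules that cannot be expressed arithmetically would still need a per-row function.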