R: Find the minimum under a condition on the index (possible NA output)

Question:

I am using dplyr for data analysis in R and have run into the following problem.

My data frame looks like this:

   item  day  val 
1     A    1   90 
2     A    2  100 
3     A    3  110 
4     A    5   80 
5     A    8   70
6     B    1   75
7     B    3   65

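The data frame is already sorted by item and then by day. I now want to use mutate to add a new column: for each row, its value is the minimum val among the rows of the same item whose day falls within the next 2 days (later than the current day, and at most 2 days later).

For the example above, I expect the resulting data frame to be: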

   item  day  val  output
1     A    1   90     100  # the smaller of 100 and 110
2     A    2  100     110  # the only value within 2 days
3     A    3  110      80  # the only value within 2 days
4     A    5   80      NA  # there is no data within 2 days
5     A    8   70      NA  # there is no data within 2 days
6     B    1   75      65  # the only value within 2 days
7     B    3   65      NA  # there is no data within 2 days

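I know I will probably use group_by and mutate, but how do I write the inner function to get the result I want?

Any help is much appreciated. Let me know if anything needs clarifying. Thanks, everyone!

Try this: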

df %>%

  # arrange(item, day) %>% # if not already arranged

  # take note of the next two values & corresponding difference in days
  group_by(item) %>%
  mutate(val.1 = lead(val),
         day.1 = lead(day) - day,
         val.2 = lead(val, 2),
         day.2 = lead(day, 2) - day) %>%
  ungroup() %>%

  # if the value is associated with a day more than 2 days away, change it to NA
  mutate(val.1 = ifelse(day.1 %in% c(1, 2), val.1, NA),
         val.2 = ifelse(day.2 %in% c(1, 2), val.2, NA)) %>%

  # calculate output normally
  group_by(item, day) %>%
  mutate(output = min(val.1, val.2, na.rm = TRUE)) %>%
  ungroup() %>%

  # arrange results
  select(item, day, val, output) %>%
  mutate(output = ifelse(output == Inf, NA, output)) %>%
  arrange(item, day)

# A tibble: 7 x 4
  item     day   val output
  <fctr> <int> <int>  <dbl>
1 A          1    90  100  
2 A          2   100  110  
3 A          3   110   80.0
4 A          5    80   NA  
5 A          8    70   NA  
6 B          1    75   65.0
7 B          3    65   NA
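
A possible variation on the code above, offered as a sketch rather than a drop-in replacement: `pmin(..., na.rm = TRUE)` returns NA for a row when all of its inputs are NA, so the `Inf` clean-up step is not needed. Like the code above, it assumes integer days and at most one row per item and day. The name `res` and the inline copy of the data are only there to make the example self-contained.

```r
library(dplyr)

# Inline copy of the question's data, so the sketch runs on its own
df <- read.table(text = "item day val
A 1 90
A 2 100
A 3 110
A 5 80
A 8 70
B 1 75
B 3 65", header = TRUE)

res <- df %>%
  group_by(item) %>%
  mutate(
    # next value, blanked out unless it lies 1 or 2 days ahead
    val.1 = replace(lead(val), !((lead(day) - day) %in% 1:2), NA),
    # value after next, same rule
    val.2 = replace(lead(val, 2), !((lead(day, 2) - day) %in% 1:2), NA),
    # row-wise minimum; NA when both candidates are NA
    output = pmin(val.1, val.2, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  select(item, day, val, output)

res
```

Because `pmin` works row by row, the second `group_by(item, day)` is not needed in this version.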

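Data: `df` is defined with read.table at the end of this page.

We can use complete from the tidyr package to fill in the missing days of the dataset, then use lead from dplyr and rollapply from zoo to find the minimum over the next two days.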

library(dplyr)
library(tidyr)
library(zoo)

DF2 <- DF %>%
  group_by(item) %>%
  complete(day = full_seq(day, period = 1)) %>%
  mutate(output = rollapply(lead(val), width = 2, FUN = min, na.rm = TRUE, 
                            fill = NA, align = "left")) %>%
  drop_na(val) %>%
  ungroup() %>%
  mutate(output = ifelse(output == Inf, NA, output))
DF2
# # A tibble: 7 x 4
#   item    day   val output
#   <chr> <dbl> <int>  <dbl>
# 1 A      1.00    90  100  
# 2 A      2.00   100  110  
# 3 A      3.00   110   80.0
# 4 A      5.00    80   NA  
# 5 A      8.00    70   NA  
# 6 B      1.00    75   65.0
# 7 B      3.00    65   NA
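
To see why `lead` plus a fixed-width window is enough here, it can help to look at the intermediate result of `complete`. The sketch below only reproduces that one step; `completed` is a name made up for this illustration, and the data is repeated inline so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Inline copy of the question's data
df <- read.table(text = "item day val
A 1 90
A 2 100
A 3 110
A 5 80
A 8 70
B 1 75
B 3 65", header = TRUE)

# Fill in every missing day between the first and last day of each item.
# The added rows get val = NA.
completed <- df %>%
  group_by(item) %>%
  complete(day = full_seq(day, period = 1)) %>%
  ungroup()

completed
```

After this step, consecutive rows within an item are exactly one day apart, so "the next 2 days" is the same as "the next 2 rows". The rows added here are the ones that `drop_na(val)` removes again at the end.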

Data: `DF` is defined further down, after the note on rollapply.

We will create a copy of the dataset with a modified day, so that we can join it onto the original dataset and keep only the minimum values.

df %>%
  left_join(
    bind_rows(mutate(.,day=day-1),mutate(.,day=day-2)) %>% rename(output=val)) %>%  
  group_by(item,day,val) %>%
  summarize_at("output",min) %>%
  ungroup

# # A tibble: 7 x 4
#     item   day   val output
#   <fctr> <dbl> <int>  <dbl>
# 1      A     1    90    100
# 2      A     2   100    110
# 3      A     3   110     80
# 4      A     5    80     NA
# 5      A     8    70     NA
# 6      B     1    75     65
# 7      B     3    65     NA
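
The join step can be inspected on its own. Below is a minimal base R sketch of it, with `merge` swapped in for `left_join`; the names `shifted` and `joined` are made up for this illustration and the data is repeated inline.

```r
# Inline copy of the question's data
df <- read.table(text = "item day val
A 1 90
A 2 100
A 3 110
A 5 80
A 8 70
B 1 75
B 3 65", header = TRUE)

# Two copies of the data, moved 1 and 2 days back. A row that originally sat
# on day d now sits on days d - 1 and d - 2, i.e. on the days that should
# "see" it as a value in their next-2-days window.
shifted <- rbind(transform(df, day = day - 1),
                 transform(df, day = day - 2))
names(shifted)[names(shifted) == "val"] <- "output"

# Left join on item and day: each original row picks up its candidates,
# or NA when no later row lies within 2 days.
joined <- merge(df, shifted, by = c("item", "day"), all.x = TRUE)
joined
```

Item A on day 1 appears twice in `joined` (candidates 100 and 110), which is why the answer then groups by item, day and val and takes the minimum.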

Data: `df` is defined with read.table at the end of this page.

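Regarding the rollapply answer: note that the rollapply call could also be written as rollapply(val, width = list(1:2), FUN = min, na.rm = TRUE, fill = NA), where width = list(1:2) means using the elements 1 and 2 positions ahead of the current one. Thanks to @G.Grothendieck for sharing this great trick.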

DF <- read.table(text = "item  day  val
1     A    1   90
2     A    2  100
3     A    3  110
4     A    5   80
5     A    8   70
6     B    1   75
7     B    3   65",
                 header = TRUE, stringsAsFactors = FALSE)

df <- read.table(text = "   item  day  val 
1     A    1   90 
2     A    2  100 
3     A    3  110 
4     A    5   80 
5     A    8   70
6     B    1   75
7     B    3   65", header = TRUE)
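
As a cross-check for any of the answers, the rule from the question can also be written out directly in base R. This is only a sketch: `min_next_days` and its `window` argument are hypothetical names introduced here, and the loop compares every row with every other row of the same item, so it is meant for small data.

```r
# Inline copy of the question's data
df <- read.table(text = "item day val
A 1 90
A 2 100
A 3 110
A 5 80
A 8 70
B 1 75
B 3 65", header = TRUE)

# Hypothetical helper: for each row, the minimum val among rows whose day is
# later than the current day by at most `window` days, or NA if there is none.
min_next_days <- function(day, val, window = 2) {
  vapply(seq_along(day), function(i) {
    in_window <- day > day[i] & day <= day[i] + window
    if (any(in_window)) as.numeric(min(val[in_window])) else NA_real_
  }, numeric(1))
}

# Apply the helper separately within each item
df$output <- NA_real_
for (g in unique(df$item)) {
  rows <- df$item == g
  df$output[rows] <- min_next_days(df$day[rows], df$val[rows])
}

df
```

Unlike the `lead`-based approach, this version does not rely on integer days or on the rows being sorted, because it tests the day difference explicitly.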