R: aggregating by condition and date

My daily dataset looks like this:

date       CMA0013 CMA0047 CMA0052 CMA0067
1975-10-01       0   0.012   0.078       0
1975-10-02       0   0.012   0.078       0
1975-10-03       0   0.012   0.078       0
1975-10-04       0   0.012   0.078       0
1975-10-05       0   0.012   0.078       0
1975-10-06       0   0.012   0.078       0
...

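In R, I want to count, by month and by year, the number of records in each column that satisfy the condition < 0.001. Suppose the result looks like this: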

month   year    CMA0013   CMA0047   CMA0052   CMA0067
   10   1975          6         0         0         6
   11   1975        ...

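I have tried different options with the aggregate and ddply functions, but since I do not know them very well yet, I could not arrive at a satisfactory solution. Thank you all for your help.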

One attempt with ddply did not work: it does not sum correctly and only sums a single column, CMA0010.

Try using the lubridate and dplyr packages:

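   library(dplyr)

   # assumes `date` is a Date column (or something lubridate can parse)
   sum_df <- daily %>%
      mutate(month = lubridate::month(date),
             year  = lubridate::year(date)) %>%
      group_by(year, month) %>%
      # count the days below the threshold, one line per column
      summarise(CMA0013 = sum(CMA0013 < 0.001),
                CMA0047 = sum(CMA0047 < 0.001),
                CMA0052 = sum(CMA0052 < 0.001),
                CMA0067 = sum(CMA0067 < 0.001))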
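Writing one `sum()` per column gets tedious with many stations. A minimal sketch of the same count applied to every CMA column at once, assuming dplyr 1.0.0 or later (for `across()`) and a hypothetical `daily` data frame rebuilt from the six rows shown in the question:

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data mirroring the rows shown in the question
daily <- data.frame(
  date    = as.Date("1975-10-01") + 0:5,
  CMA0013 = 0,
  CMA0047 = 0.012,
  CMA0052 = 0.078,
  CMA0067 = 0
)

sum_df <- daily %>%
  mutate(year = year(date), month = month(date)) %>%
  group_by(year, month) %>%
  # apply the same count to every column whose name starts with "CMA"
  summarise(across(starts_with("CMA"), ~ sum(.x < 0.001)),
            .groups = "drop")

sum_df
```

If the real data contains missing values, `sum(.x < 0.001, na.rm = TRUE)` would count only the non-missing days; whether that is the desired behaviour depends on the analysis.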

Here is one approach:

library(lubridate) # to extract the year and month
df$year <- year(df$date)
df$month <- month(df$date)
df2 <- aggregate(df[, grep("^CMA", names(df))], # summarise only columns starting with "CMA"
                 by = list(year = df$year, month = df$month),
                 FUN = function(x) sum(x < 0.001)) # count values below the threshold

df2
  year month CMA0013 CMA0047 CMA0052 CMA0067
1 1975    10       6       0       0       6
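For comparison, the same aggregation can probably be done without any extra package. This is only a sketch under assumptions: the data frame `df` is rebuilt by hand from the rows in the question, `date` is of class Date, and the four column names are typed out explicitly in the formula.

```r
# Hypothetical sample data mirroring the rows shown in the question
df <- data.frame(
  date    = as.Date("1975-10-01") + 0:5,
  CMA0013 = 0,
  CMA0047 = 0.012,
  CMA0052 = 0.078,
  CMA0067 = 0
)

# Base R replacements for lubridate::year() and lubridate::month()
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))

# Formula interface: columns on the left are summarised, those on the right define groups
df2 <- aggregate(cbind(CMA0013, CMA0047, CMA0052, CMA0067) ~ year + month,
                 data = df,
                 FUN  = function(x) sum(x < 0.001))

df2
```

Note that the formula interface of `aggregate` drops rows containing `NA` by default (`na.action = na.omit`), which may or may not be what is wanted for gappy daily records.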

A dplyr and lubridate solution that computes the counts for all CMA columns automatically:

library(dplyr)
library(lubridate)
library(tidyr)
d %>%
    gather(key, value, -date) %>%
    mutate(year = year(date), month = month(date)) %>%
    select(-date) %>%
    group_by(year, month, key) %>%
    summarize(N = sum(value < 0.001)) %>%
    spread(key, N)

# A tibble: 1 x 6
# Groups:   year, month [1]
   year month CMA0013 CMA0047 CMA0052 CMA0067
* <dbl> <dbl>   <int>   <int>   <int>   <int>
1  1975    10       6       0       0       6
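In current tidyr releases `gather()` and `spread()` are marked as superseded in favour of `pivot_longer()` and `pivot_wider()`. A sketch of the same reshape-count-reshape idea with the newer functions, assuming tidyr 1.0.0 or later and a hypothetical `d` rebuilt from the rows in the question:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical sample data mirroring the rows shown in the question
d <- data.frame(
  date    = as.Date("1975-10-01") + 0:5,
  CMA0013 = 0,
  CMA0047 = 0.012,
  CMA0052 = 0.078,
  CMA0067 = 0
)

res <- d %>%
  # long format: one row per date and station
  pivot_longer(-date, names_to = "key", values_to = "value") %>%
  mutate(year = year(date), month = month(date)) %>%
  group_by(year, month, key) %>%
  summarise(N = sum(value < 0.001), .groups = "drop") %>%
  # back to wide format: one column per station
  pivot_wider(names_from = key, values_from = N)

res
```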

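Welcome to SO. Please show what you have already tried and what did not work for you, so that users can see some research effort on your part.

Thanks for your comment and suggestion. I did a fair amount of research (is a whole day enough?) and tried to adapt solutions from other related posts to my problem, but did not find a satisfactory one. I accept your advice and will show my attempts in future posts.

Cheeset, it is really hard to say what is enough; this has been the subject of extensive meta discussion. The aim, however, is to prevent people from coming here and asking for a complete solution without making any effort. If you feel you have exhausted your research with a web search engine, then this is the place to ask. That is why some concrete attempt is expected in the question.

I completely understand. In fact, posting is my last resort, after a long struggle. I have edited my post to show one of my unsuccessful attempts. In any case, I appreciate your advice.

Yes, thanks, well done!

Thanks. I am not familiar with most of the syntax, especially %>%, but I will try to learn from it.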
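Since `%>%` came up as unfamiliar: it passes the value on its left as the first argument of the function call on its right, so a chain reads top to bottom instead of inside out. A small sketch with a hypothetical data frame `toy` (the names `toy`, `g` and `v` are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data: two groups, a few values each
toy <- data.frame(g = c("a", "a", "b", "b"),
                  v = c(0, 0.012, 0, 0))

# Nested form: read from the innermost call outwards
nested <- summarise(group_by(toy, g), n = sum(v < 0.001))

# Piped form: the same calls, written in the order they are applied
piped <- toy %>%
  group_by(g) %>%
  summarise(n = sum(v < 0.001))

# Both forms should give the same counts per group
all.equal(as.data.frame(nested), as.data.frame(piped))
```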
