How to summarise unique values in a data frame column across multiple nested blocks in R

r datetime

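I have a dataframe with datetime, id, time and depth columns. I use ddply to get the mean time and depth for each unique datetime, because there are duplicated datetime rows. However, within each datetime block there may be several unique 'id' values, and each id has duplicated rows. So for each datetime block I actually need the mean of all the unique times taken from the id blocks, i.e. I first need to get the unique time values from each id block, and then I want to compute the mean of all the unique time values returned this way for each datetime block. I am trying to use %>% to do this, but it is new syntax to me and I am struggling. Any help, or other suggestions involving datetime within the ddply wrapper, would be much appreciated. I provide an example below.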

> dput(df3)
structure(list(datetime = c("23/03/2017 14:13:45", "23/03/2017 14:13:45", 
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:13:45", 
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:13:45", 
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:15:15", 
"23/03/2017 14:15:15", "23/03/2017 14:15:15", "23/03/2017 14:15:15", 
"23/03/2017 14:15:45", "23/03/2017 14:15:45", "23/03/2017 14:16:15", 
"23/03/2017 14:16:15", "23/03/2017 14:16:15", "23/03/2017 14:16:15", 
"23/03/2017 14:16:15", "23/03/2017 14:16:15", "23/03/2017 14:16:15"
), id = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 
12L, 12L, 13L, 14L, 14L, 15L, 16L, 16L, 16L, 17L, 18L, 18L), 
    time = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
    3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 2L), dep = c(0.448675132, 
    0.448675132, 0.448675132, 0.448675132, 0.448675132, 0.448675132, 
    0.448675132, 0.448675132, 0.448675132, 0.448675132, 0.285520539, 
    0.285520539, 0.285520539, 0.285520539, 0.316112025, 0.316112025, 
    0.326309187, 0.356900674, 0.356900674, 0.356900674, 0.38749216, 
    0.326309187, 0.326309187)), class = "data.frame", row.names = c(NA, 
-23L))

My attempt, which does not work:

#convert datetime to POSIXct
df3$datetime = as.POSIXct(strptime(df3$datetime, format="%d/%m/%Y %H:%M:%S"), tz="UTC")

#Now condense the dataframe by unique datetime, summarising the time and dep cols
  dfCondensed = ddply(df3, .(datetime), summarise,
                      #get the mean time for each unique datetime, but calculate this using 
                      #all the unique time values found within each unique id 
                      meantime = group_by(id) %>% unique(time) %>% mean(),
                      #do the same as above but for dep
                      meandep = group_by(id) %>% unique(dep) %>% mean())

Desired output

Here is a data.table approach
library(data.table)
setDT(df3)
unique(df3, by = c("datetime", "id"))[, .(mean.time = mean(time),
                                          mean.dep = mean(dep)), 
                                      by = .(datetime)][]

              datetime mean.time  mean.dep
1: 23/03/2017 14:13:45     10.00 0.4486751
2: 23/03/2017 14:15:15      2.00 0.2855205
3: 23/03/2017 14:15:45      2.00 0.3161120
4: 23/03/2017 14:16:15      1.75 0.3492528
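
If more than two columns need the same treatment, the per-column mean() calls could be replaced by lapply() over .SD. This is only a sketch: the vector cols is an illustrative name, and it is assumed you would extend it to whichever columns you want averaged. df3 is rebuilt compactly from the dput above so the snippet runs on its own.

```r
library(data.table)

# Compact rebuild of df3 from the question (same 23 rows)
n <- c(10, 3, 1, 2, 1, 3, 1, 2)  # number of rows per id (11..18)
df3 <- data.table(
  datetime = rep(c("23/03/2017 14:13:45", "23/03/2017 14:15:15",
                   "23/03/2017 14:15:45", "23/03/2017 14:16:15"),
                 times = c(10, 4, 2, 7)),
  id   = rep(11:18, times = n),
  time = rep(c(10L, 3L, 1L, 2L, 1L, 3L, 1L, 2L), times = n),
  dep  = rep(c(0.448675132, 0.285520539, 0.285520539, 0.316112025,
               0.326309187, 0.356900674, 0.38749216, 0.326309187), times = n)
)

# Columns to average (illustrative; extend as needed)
cols <- c("time", "dep")

# Keep one row per datetime/id pair, then average every column in cols
res <- unique(df3, by = c("datetime", "id"))[
  , lapply(.SD, mean), by = datetime, .SDcols = cols]
res
```

Note that unique(..., by = ) keeps the first row of each datetime/id pair, so this assumes the values are constant within an id, as they are in the example data.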

I think you are looking for:
library(dplyr)

df3 %>%
   distinct() %>%
   group_by(datetime) %>%
   summarise(dep = mean(dep), mean = mean(time))

#  datetime              dep  mean
#  <chr>               <dbl> <dbl>
#1 23/03/2017 14:13:45 0.449 10   
#2 23/03/2017 14:15:15 0.286  2   
#3 23/03/2017 14:15:45 0.316  2   
#4 23/03/2017 14:16:15 0.349  1.75
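
One caveat: distinct() with no arguments compares whole rows, so any further column that varies within an id stops the duplicates from collapsing. Naming the columns, as in distinct(datetime, id), fixes that but drops every other column unless .keep_all = TRUE is set. A minimal sketch of that variant follows; the extra column other is hypothetical and only there to mimic a wider data frame, and df3 is rebuilt compactly from the dput in the question.

```r
library(dplyr)

# Compact rebuild of df3 from the question (same 23 rows)
n <- c(10, 3, 1, 2, 1, 3, 1, 2)  # number of rows per id (11..18)
df3 <- data.frame(
  datetime = rep(c("23/03/2017 14:13:45", "23/03/2017 14:15:15",
                   "23/03/2017 14:15:45", "23/03/2017 14:16:15"),
                 times = c(10, 4, 2, 7)),
  id   = rep(11:18, times = n),
  time = rep(c(10L, 3L, 1L, 2L, 1L, 3L, 1L, 2L), times = n),
  dep  = rep(c(0.448675132, 0.285520539, 0.285520539, 0.316112025,
               0.326309187, 0.356900674, 0.38749216, 0.326309187), times = n),
  stringsAsFactors = FALSE
)

# Hypothetical extra column that differs on every row
df3$other <- seq_len(nrow(df3))

res <- df3 %>%
  # one row per datetime/id pair, keeping all other columns
  distinct(datetime, id, .keep_all = TRUE) %>%
  group_by(datetime) %>%
  summarise(dep = mean(dep), mean = mean(time))
res
```

As with any first-row deduplication, this assumes time and dep do not change within a datetime/id pair.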

We can use base R

df4 <- unique(df3)
by(df4[c('time', 'dep')], df4[c('datetime')], FUN = colMeans)

aggregate(cbind(time, dep) ~ datetime, df4, mean)
#     datetime  time       dep
#1 23/03/2017 14:13:45 10.00 0.4486751
#2 23/03/2017 14:15:15  2.00 0.2855205
#3 23/03/2017 14:15:45  2.00 0.3161120
#4 23/03/2017 14:16:15  1.75 0.3492528

Comments:

Thanks, this does give me an output, but what I need is code that does this inside the ddply wrapper, as described above, because I am already summarising a number of other columns in different ways within that function. I just need one line of code (or two: one for meantime and one for meandep) that works inside the ddply call already applied to the datetime blocks. So I can't group by datetime a second time.

@jjulip plyr is [...]. Would you mind switching to dplyr? You can do all the summarising and other tasks with it.

Yes, I've updated. Thanks. But I still don't understand how to make it work the way I need. The problem (I can't give an example here) is that, besides time and depth, I have a large dataframe with many other columns, so I think that when I type distinct() your code is doing something other than making it specific to my 'id' grouping, which is what I need. Can you tell me how to solve this?

@jjulip I see. In that case, can you try
df3 %>% distinct(datetime, id) %>% group_by(datetime) %>% summarise(dep = mean(dep), mean = mean(time))
which gives unique rows for datetime and id only.

When I try that, it can't find my object (i.e. dep) from within df3. I just get an error message telling me "object 'dep' not found".

Unfortunately, this doesn't help me, because I have a large number of columns in my dataframe and I want to summarise data from several columns at once rather than simply compute the mean of two columns, which is why I use the ddply function on the datetime blocks. I also need to get the unique values from within each id block.
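
For a version that stays inside the existing ddply call, one option is to subset each column with duplicated() before taking the mean, so that no second grouping step is needed. This is a sketch rather than a tested fit for the full data: it assumes "unique time values from each id block" means one value per distinct id/time pair, and df3 is rebuilt compactly from the dput in the question. The names meantime and meandep are taken from the attempt above.

```r
library(plyr)

# Compact rebuild of df3 from the question (same 23 rows)
n <- c(10, 3, 1, 2, 1, 3, 1, 2)  # number of rows per id (11..18)
df3 <- data.frame(
  datetime = rep(c("23/03/2017 14:13:45", "23/03/2017 14:15:15",
                   "23/03/2017 14:15:45", "23/03/2017 14:16:15"),
                 times = c(10, 4, 2, 7)),
  id   = rep(11:18, times = n),
  time = rep(c(10L, 3L, 1L, 2L, 1L, 3L, 1L, 2L), times = n),
  dep  = rep(c(0.448675132, 0.285520539, 0.285520539, 0.316112025,
               0.326309187, 0.356900674, 0.38749216, 0.326309187), times = n),
  stringsAsFactors = FALSE
)

dfCondensed <- ddply(df3, .(datetime), summarise,
  # within this datetime block, keep one value per distinct id/time pair
  meantime = mean(time[!duplicated(data.frame(id, time))]),
  # same idea for dep
  meandep  = mean(dep[!duplicated(data.frame(id, dep))]))
dfCondensed
```

Other summary columns can sit alongside these two lines in the same summarise call. If each id is known to carry a single time and dep value, time[!duplicated(id)] would be a shorter equivalent.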