Calculating the average time difference by group with dplyr

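Suppose I have the following data frame, which records the dates on which users at different companies registered for an application: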

df <- data.frame(user = c("Tia", "Sam", "Matt", "Brandy", "Joe", "Nariko"),
                 company = c("Intel", "Intel", "Nvidia", "Nvidia", "Nvidia", "Google"),
                 registrationDate = as.Date(c("2015-01-04", "2015-01-04", "2015-01-19", 
                                              "2015-01-20", "2015-01-20", "2015-01-25")),
                 stringsAsFactors = FALSE)
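What I get is the maximum date of the entire registrationDate vector, repeated on every row of the data frame. It is as if the max function ignores the dplyr pipeline.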

Here is one way with mutate; another, using summarize instead of mutate, follows further down:

df %>% group_by(company) %>% 
  mutate(AvgTime = (max(registrationDate)-min(registrationDate))/length(company))

    user company registrationDate        AvgTime
1    Tia   Intel       2015-01-04 0.0000000 days
2    Sam   Intel       2015-01-04 0.0000000 days
3   Matt  Nvidia       2015-01-19 0.3333333 days
4 Brandy  Nvidia       2015-01-20 0.3333333 days
5    Joe  Nvidia       2015-01-20 0.3333333 days
6 Nariko  Google       2015-01-25 0.0000000 days
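
As a hedged variation on the mutate code above, the same per-company value can be written with `n()` for the group size and an explicit `difftime()` call so the unit is pinned to days. The column name `avg_days` and the object name `res` are illustrative only and do not come from the thread; the snippet rebuilds `df` so it runs on its own.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(user = c("Tia", "Sam", "Matt", "Brandy", "Joe", "Nariko"),
                 company = c("Intel", "Intel", "Nvidia", "Nvidia", "Nvidia", "Google"),
                 registrationDate = as.Date(c("2015-01-04", "2015-01-04", "2015-01-19",
                                              "2015-01-20", "2015-01-20", "2015-01-25")),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(company) %>%
  # span of registration dates within the company, in days, divided by group size
  mutate(avg_days = as.numeric(difftime(max(registrationDate),
                                        min(registrationDate),
                                        units = "days")) / n()) %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident

print(res)
```

Converting to a plain number with `as.numeric()` is a design choice: it loses the "days" label but makes the column easier to plot or compare.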

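Comments:

Could you show the expected output? Your description and code are unclear. Is it df %>% group_by(company) %>% mutate(AvgTime = mean(diff(registrationDate)))?

Sorry, that was unclear. I want the maximum time difference divided by the number of users at each company. Something like difftime(max(df$registrationDate), min(df$registrationDate)) / num_users

@akrun For some reason, the result of the mutate function works out to an avgTime of 4.2 for every company.

However, suppose we first filter df using df2

Perhaps df %>% group_by(company) %>% mutate(new = max(diff(registrationDate)) / length(unique(user)))

That is what I was looking for! However, when I run your line on my machine, I get an average time of 3.5 on every row?

Solved it. I had to remove and reinstall dplyr on my Mac. Not sure what happened there.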
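
The value 3.5 mentioned in the comments is what the formula yields when the grouping is not applied at all: the whole data set spans 21 days and has 6 rows. The sketch below reproduces that number and shows the grouped call written with explicit `dplyr::` prefixes. That another attached package (for example plyr) was masking `mutate` is only an assumption; the thread confirms nothing beyond the fact that reinstalling dplyr fixed it. The names `ungrouped` and `grouped` are illustrative.

```r
library(dplyr)

df <- data.frame(user = c("Tia", "Sam", "Matt", "Brandy", "Joe", "Nariko"),
                 company = c("Intel", "Intel", "Nvidia", "Nvidia", "Nvidia", "Google"),
                 registrationDate = as.Date(c("2015-01-04", "2015-01-04", "2015-01-19",
                                              "2015-01-20", "2015-01-20", "2015-01-25")),
                 stringsAsFactors = FALSE)

# No grouping: (2015-01-25 - 2015-01-04) / 6 rows = 21 / 6 = 3.5 on every row
ungrouped <- df %>%
  dplyr::mutate(AvgTime = as.numeric(max(registrationDate) - min(registrationDate)) / dplyr::n())

# Grouped, with explicit namespaces so a masked mutate() cannot be picked up
grouped <- df %>%
  dplyr::group_by(company) %>%
  dplyr::mutate(AvgTime = as.numeric(max(registrationDate) - min(registrationDate)) / dplyr::n())

print(unique(ungrouped$AvgTime))
print(grouped)
```

If the grouped result on a given machine matches `ungrouped`, checking which package `mutate` resolves to (for example with `environmentName(environment(mutate))`) is one way to investigate.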

Using summarize instead of mutate:

df2 = df %>% 
  group_by(company) %>%
  summarize(minDate = min(registrationDate), maxDate = max(registrationDate), num_users = n())

> df2
Source: local data frame [3 x 4]

   company    minDate    maxDate num_users
     (chr)     (date)     (date)     (int)
 1  Google 2015-01-25 2015-01-25         1
 2   Intel 2015-01-04 2015-01-04         2
 3  Nvidia 2015-01-19 2015-01-20         3

df2$result = difftime(df2$maxDate, df2$minDate, units = "days")/df2$num_users

> df2
Source: local data frame [3 x 5]

  company    minDate    maxDate num_users     result
    (chr)     (date)     (date)     (int)     (dfft)
1  Google 2015-01-25 2015-01-25         1     0 days
2   Intel 2015-01-04 2015-01-04         2     0 days
3  Nvidia 2015-01-19 2015-01-20         3 0.3333333 days
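
The two steps above (summarize, then a separate `df2$result` assignment) can also be folded into a single `summarize()` call, since later expressions in that call may refer to columns created earlier in it. This is a minimal sketch rather than the answerer's code; `summary_tbl` and `result_days` are illustrative names, and the result is stored as a plain number of days instead of a difftime.

```r
library(dplyr)

df <- data.frame(user = c("Tia", "Sam", "Matt", "Brandy", "Joe", "Nariko"),
                 company = c("Intel", "Intel", "Nvidia", "Nvidia", "Nvidia", "Google"),
                 registrationDate = as.Date(c("2015-01-04", "2015-01-04", "2015-01-19",
                                              "2015-01-20", "2015-01-20", "2015-01-25")),
                 stringsAsFactors = FALSE)

summary_tbl <- df %>%
  group_by(company) %>%
  summarize(minDate = min(registrationDate),
            maxDate = max(registrationDate),
            num_users = n(),
            # uses minDate, maxDate and num_users defined just above
            result_days = as.numeric(difftime(maxDate, minDate, units = "days")) / num_users)

print(summary_tbl)
```

This yields one row per company, which is usually easier to read than the mutate version when the per-user rows are not needed.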