Warnings when applying a custom function to each row of a table with dplyr?

I am trying to replicate something similar with a custom function, but I get errors. I have the following data frame:

> dd
   datetimeofdeath injurydatetime
1                   2/10/05 17:30
2                   2/13/05 19:15
3                    2/15/05 1:10
4    2/24/05 21:00  2/16/05 20:36
5                    3/11/05 0:45
6                   3/19/05 23:05
7                   3/19/05 23:13
8                   3/23/05 20:51
9                   3/31/05 11:30
10                    4/9/05 3:07

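Both columns have type integer, but for some reason they carry levels as if they were factors. This may be the root of my problem, but I am not sure.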

> typeof(dd$datetimeofdeath)
[1] "integer"
> typeof(dd$injurydatetime)
[1] "integer"
> dd$injurydatetime
 [1] 2/10/05 17:30 2/13/05 19:15 2/15/05 1:10  2/16/05 20:36 3/11/05 0:45  3/19/05 23:05 3/19/05 23:13 3/23/05 20:51 3/31/05 11:30
[10] 4/9/05 3:07  
549 Levels:  1/1/07 18:52 1/1/07 20:51 1/1/08 17:55 1/1/11 15:25 1/1/12 0:22 1/1/12 22:58 1/11/06 23:50 1/11/07 6:26 ... 9/9/10 8:15
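
A note on the output above: this combination is what R reports for any factor. A factor is stored as integer codes plus a levels attribute, so typeof() returns "integer" while class() returns "factor". A minimal sketch, using a made-up vector x rather than the original dd:

```r
# Hypothetical example vector; not taken from the original data set
x <- factor(c("2/10/05 17:30", "", "2/24/05 21:00"))

typeof(x)        # storage type of the underlying codes: "integer"
class(x)         # the class that makes it print with levels: "factor"
as.integer(x)    # the internal codes, which index into levels(x)
as.character(x)  # the labels, which is what date parsing needs
```

Data frames built with stringsAsFactors = TRUE (the default before R 4.0.0) convert character columns to factors, which is a common way to end up with columns like these.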

Now I want to apply the following function row by row:

library(lubridate)
library(dplyr)
get_time_alive = function(datetimeofdeath, injurydatetime)
{
  if(as.character(datetimeofdeath) == "" | as.character(injurydatetime) == "") return(NA)

  time_of_death = parse_date_time(as.character(datetimeofdeath), "%m/%d/%y %H:%M")
  time_of_injury = parse_date_time(as.character(injurydatetime), "%m/%d/%y %H:%M")

  time_alive = as.duration(new_interval(time_of_injury,time_of_death))
  time_alive_hours = as.numeric(time_alive) / (60*60)

  return(time_alive_hours)
}

This works when called on a single row, but not when applied row-wise:

> get_time_alive(dd$datetimeofdeath[1], dd$injurydatetime[1])
[1] NA
> get_time_alive(dd$datetimeofdeath[4], dd$injurydatetime[4])
[1] 192.4
> dd = dd %>% rowwise() %>% dplyr::mutate(time_alive_hours=get_time_alive(datetimeofdeath, injurydatetime))
There were 20 warnings (use warnings() to see them)
> dd
Source: local data frame [10 x 3]
Groups: 

   datetimeofdeath injurydatetime time_alive_hours
1                   2/10/05 17:30               NA
2                   2/13/05 19:15               NA
3                    2/15/05 1:10               NA
4    2/24/05 21:00  2/16/05 20:36               NA
5                    3/11/05 0:45               NA
6                   3/19/05 23:05               NA
7                   3/19/05 23:13               NA
8                   3/23/05 20:51               NA
9                   3/31/05 11:30               NA
10                    4/9/05 3:07               NA

As you can see, the fourth element is NA, even though I get 192.4 when I apply the custom function to that row on its own. Why does my custom function fail here?

I think you can simplify your code considerably by using something like this:

dd %>% 
  mutate_each(funs(as.POSIXct(as.character(.), format = "%m/%d/%y %H:%M"))) %>% 
  mutate(time_alive = datetimeofdeath - injurydatetime)
#      datetimeofdeath      injurydatetime    time_alive
#1                <NA> 2005-02-15 01:10:00       NA days
#2 2005-02-24 21:00:00 2005-02-16 20:36:00 8.016667 days
#3                <NA> 2005-03-11 00:45:00       NA days
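
The mutate_each() and funs() helpers used above have since been deprecated in dplyr. As a hedged sketch of the same idea for dplyr 1.0.0 or later, across() can do the conversion. The three-row dd below is rebuilt by hand from the rows shown in the output, and the tz = "UTC" argument is an assumption added so the result does not depend on the local time zone:

```r
library(dplyr)

# Three rows rebuilt by hand from the output above; factors mimic the original columns
dd <- data.frame(
  datetimeofdeath = c("", "2/24/05 21:00", ""),
  injurydatetime  = c("2/15/05 1:10", "2/16/05 20:36", "3/11/05 0:45"),
  stringsAsFactors = TRUE
)

res <- dd %>%
  # Convert every column from factor to POSIXct; empty strings become NA
  mutate(across(
    everything(),
    ~ as.POSIXct(as.character(.x), format = "%m/%d/%y %H:%M", tz = "UTC")
  )) %>%
  # Vectorised subtraction, so no rowwise() is involved
  mutate(time_alive = datetimeofdeath - injurydatetime)

res
```

The second row should come out at roughly 8.02 days, in line with the output shown above.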

Side notes:

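I shortened your input data because it was not easy to reproduce; I only used the three rows you see in my answer.

If you want the time in hours, just use mutate(time_alive = (datetimeofdeath - injurydatetime) * 24) in the last mutate.

With this code you do not need rowwise(), which I expect also makes it faster.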
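
One caveat on converting to hours: subtracting two date-times lets R choose the units automatically from the size of the differences, so multiplying by 24 is only correct when the result happens to be in days, and the result keeps its "days" label. A minimal sketch using base R's difftime() with explicit units, on the single complete row from the question (tz = "UTC" is an assumption):

```r
# The one complete row from the question, parsed with an assumed UTC time zone
death  <- as.POSIXct("2/24/05 21:00", format = "%m/%d/%y %H:%M", tz = "UTC")
injury <- as.POSIXct("2/16/05 20:36", format = "%m/%d/%y %H:%M", tz = "UTC")

# Plain subtraction: R picks the units itself
auto <- death - injury
units(auto)

# Explicit units, then drop the difftime class to get a plain number of hours
hours <- as.numeric(difftime(death, injury, units = "hours"))
hours
```

Inside the pipeline, the same difftime() call can replace the subtraction in the final mutate().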