R: summing a column based on conditions

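I have a data frame with three columns: x, ID, and date_time. The "x" column holds the recordings of a variable x, ID identifies what was recorded, and date_time is the time of the recording. See the snippet of the data frame below.

From this data frame I want to compute a new data frame with seven columns: "Measurement", "ID", "Date", "x_4_10_day", "Day_total", "x_4_10_night", and "Night_total".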

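  • "Measurement": the number of the measurement for a given ID. A measurement starts at 23:00:00 and runs until 22:59:59 the next day. However, recording starts at a random time, so the first measurement does not last 24 hours, and neither does the last one.
  • "ID": the ID of the given measurement.
  • "Date": the date of the last recording in the given measurement, in the format yyyy.mm.dd.
  • "x_4_10_day": each measurement is split into day (7:00:00-22:59:59) and night (23:00:00-6:59:59). This column should show the total time (in minutes) during the day in which x was between 4 and 10 (both inclusive). One recording with x between 4 and 10 counts as 5 minutes in that range, because there are 5 minutes between recordings.
  • "Day_total": the total time (in minutes) during the day in which x was measured. Missing x values are left empty and should be subtracted: 5 minutes from the total for each missing recording. Also, some measurements start later than 7:00.
  • "x_4_10_night": the total time (in minutes) during the night of the given measurement in which x was between 4 and 10 (both inclusive).
  • "Night_total": the total time (in minutes) during the night in which x was measured. Missing x values are left empty and should be subtracted: 5 minutes from the total for each missing recording.
  • Each unique measurement should have one row.

So far I have code that correctly returns the columns "Measurement", "ID", and "Date":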

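    library(dplyr)

    # Assumes date_time is already POSIXct, so as.numeric() gives seconds since the epoch
    df1$mydate = as.Date(df1$date_time, format = "%Y.%m.%d %H:%M:%S")
    df1$tm <- as.numeric(df1$date_time)
    df1$dts <- 86400*as.numeric(df1$mydate)   # midnight of that date, in seconds
    df2 <- df1 %>% 
      group_by(ID,mydate) %>% 
      # recordings from 23:00:00 (dts + 82800 s) onward count toward the next day's measurement
      transform(date = case_when(((dts-3600)<tm & tm<(dts+82800)) ~paste0(mydate), ((dts+82800)<=tm) ~paste0(mydate+1) )) %>% 
      select(ID,date) %>%   
      unique() %>% 
      group_by(ID) %>% 
      mutate(measurement = row_number())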
    

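I have added id = 14 to your data frame; it contains only night values. This may be what you are looking for. Note that your expected values do not fully match your requirements.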

    df11 <- structure(list(date_time = c("2020.03.02 22:00:17", "2020.03.02 22:05:17", 
                                 "2020.03.02 22:10:17", "2020.03.02 22:35:17", "2020.03.02 22:40:17", 
                                 "2020.03.02 22:45:17", "2020.03.02 22:50:17", "2020.03.02 22:55:17", 
                                 "2020.03.02 23:00:17", "2020.03.02 23:05:17", "2020.03.02 23:10:17", 
                                 "2020.03.02 23:15:17", "2020.03.02 23:20:17", "2020.03.02 23:25:17", 
                                 "2020.03.02 23:30:17", "2020.03.02 23:35:17", "2020.03.02 23:40:17", 
                                 "2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
                                 "2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17", 
                                 "2020.03.03 01:05:17", "2020.03.03 01:10:17", "2020.03.03 01:15:17", 
                                 "2020.03.03 01:20:17", "2020.03.03 01:25:17", "2020.05.09 08:39:32", 
                                 "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                                 "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                                 "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                                 "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                                 "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                                 "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                                 "2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
                                 "2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17" 
                                 ), 
                          x = c("7.55", "4.55", "4.55", "12", 
                                "12", "10", "10", "4.3", "", "", "4.3", "4.3", "4.3", "", "4.3", 
                                "12", "12", "12", "2", "12", "12", "", "8", "3", "3", "2", "2", 
                                "", "12", "10", "10", "4.3", "4.3", "4.3", "4.3", "4.3", "4.3", 
                                "4.3", "4.3", "12", "12", "12", "12", "12", "12", "12",
                                "12", "10", "10", "4.3", "4.3", "4.3"),
                   id = c(12L, 12L, 
                          12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                          12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                          13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
                          13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L)), 
                   row.names = c(NA, 52L), class = "data.frame")
    
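    library(dplyr)      # group_by, summarise, mutate, select, case_when
    library(tidyr)      # pivot_wider, replace_na
    library(lubridate)  # as_datetime

    df11$xn <- as.numeric(df11$x)   # empty strings become NA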
    df1 <- df11 %>% transform(xmin = ifelse((xn<4 | xn>10 | is.na(xn)),0,5 ),
                              xmint = ifelse(is.na(xn),-5,5 ))
    df1$dateTime = as_datetime(df1$date_time, format = "%Y.%m.%d %H:%M:%S")
    df1$mydate = as.Date(df1$date_time, format = "%Y.%m.%d %H:%M:%S")
    
    df1$tm <- as.numeric(df1$dateTime)
    df1$dts <- 86400*as.numeric(df1$mydate)
    
    df2 <- df1 %>% group_by(id,mydate) %>% 
             transform(date = case_when(((dts-3600)<tm & tm<(dts+82800) )~paste0(mydate),((dts+82800)<=tm)~paste0(mydate+1) )) %>%
             transform(dayrnight = ifelse((tm>=(dts+25200) & tm<(dts+82800) ),'day','night' ) ) %>% 
             group_by(id,date,dayrnight) %>% 
             dplyr::summarise(x_4_10 = sum(xmin), total = sum(xmint)) %>% 
             pivot_wider(id_cols = c(id,date), names_from = dayrnight, values_from = c("x_4_10", "total")) %>% 
             mutate_if(is.numeric , replace_na, replace = 0) %>% 
             group_by(id) %>% mutate(measurement = row_number()) %>% 
             select(id,date,measurement,x_4_10_day,total_day,x_4_10_night,total_night)
    
    > df2
    # A tibble: 4 x 7
    # Groups:   id [3]
         id date       measurement x_4_10_day total_day x_4_10_night total_night
      <int> <chr>            <int>      <dbl>     <dbl>        <dbl>       <dbl>
    1    12 2020-03-02           1         30        40            0           0
    2    12 2020-03-03           2          0         0           25          50
    3    13 2020-05-09           1         50        90            0           0
    4    14 2020-03-03           1          0         0           25          30
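
The thresholds in the pipeline above are second offsets from midnight: 25200 s is 07:00:00 and 82800 s is 23:00:00. The following base R sketch applies the same two rules to a few timestamps. The helper name `classify` and the sample timestamps are made up for illustration, and UTC is assumed throughout.

```r
# Second offsets from midnight matching the thresholds used above
day_start   <- 7 * 3600    # 25200 s = 07:00:00
night_start <- 23 * 3600   # 82800 s = 23:00:00

# Hypothetical helper: label each timestamp as day/night and assign the
# measurement date (records from 23:00:00 onward belong to the next date)
classify <- function(date_time) {
  t   <- as.POSIXct(date_time, format = "%Y.%m.%d %H:%M:%S", tz = "UTC")
  d   <- as.Date(t)
  sec <- as.numeric(t) - 86400 * as.numeric(d)   # seconds since midnight (UTC)
  data.frame(
    date_time = date_time,
    period    = ifelse(sec >= day_start & sec < night_start, "day", "night"),
    date      = as.character(d + as.integer(sec >= night_start)),
    stringsAsFactors = FALSE
  )
}

res <- classify(c("2020.03.02 22:55:17", "2020.03.02 23:00:17",
                  "2020.03.03 06:59:59", "2020.03.03 07:00:00"))
print(res)
```

The record at 23:00:17 on 2020-03-02 is labeled as night and assigned to 2020-03-03, which is why id 14 shows up under 2020-03-03 in the output above.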
    
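It took me a while, but this may be what you want.

The sample data is the same, except that the date/time values for id 13 are changed slightly.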

    df <- structure(list(date_time = c("2020.03.02 22:00:17", "2020.03.02 22:05:17", 
                                 "2020.03.02 22:10:17", "2020.03.02 22:35:17", "2020.03.02 22:40:17", 
                                 "2020.03.02 22:45:17", "2020.03.02 22:50:17", "2020.03.02 22:55:17", 
                                 "2020.03.02 23:00:17", "2020.03.02 23:05:17", "2020.03.02 23:10:17", 
                                 "2020.03.02 23:15:17", "2020.03.02 23:20:17", "2020.03.02 23:25:17", 
                                 "2020.03.02 23:30:17", "2020.03.02 23:35:17", "2020.03.02 23:40:17", 
                                 "2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
                                 "2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17", 
                                 "2020.03.03 01:05:17", "2020.03.03 01:10:17", "2020.03.03 01:15:17", 
                                 "2020.03.03 01:20:17", "2020.03.03 01:25:17", "2020.05.09 08:39:32", 
                                 "2020.05.09 08:44:32", "2020.05.09 08:49:32", "2020.05.09 08:54:32", 
                                 "2020.05.09 08:59:32", "2020.05.09 09:39:32", "2020.05.09 09:44:32", 
                                 "2020.05.09 09:49:32", "2020.05.09 09:59:32", "2020.05.09 10:39:32", 
                                 "2020.05.09 11:39:32", "2020.05.09 12:39:32", "2020.05.09 13:39:32", 
                                 "2020.05.09 14:39:32", "2020.05.09 15:39:32", "2020.05.09 16:39:32", 
                                 "2020.05.09 17:39:32", "2020.05.09 18:39:32"), id = c(12L, 12L, 
                                                                                       12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                                                                                       12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                                                                                       13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
                                                                                       13L, 13L, 13L, 13L, 13L), x = c("7.55", "4.55", "4.55", "12", 
                                                                                                                       "12", "10", "10", "4.3", "", "", "4.3", "4.3", "4.3", "", "4.3", 
                                                                                                                       "12", "12", "12", "2", "12", "12", "", "8", "3", "3", "2", "2", 
                                                                                                                       "", "12", "10", "10", "4.3", "4.3", "4.3", "4.3", "4.3", "4.3", 
                                                                                                                       "4.3", "4.3", "12", "12", "12", "12", "12", "12", "12")), row.names = c(NA, 
                                                                                                                                                                                               46L), class = "data.frame")
    
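    library(dplyr)      # mutate, group_by, summarise, lead, ...
    library(tidyr)      # pivot_wider
    library(lubridate)  # as_datetime, hour

    df %>% as_tibble() %>%
      transform(x = as.numeric(x),
                date_time = as_datetime(date_time, format = "%Y.%m.%d %H:%M:%S"),
                id = as.character(id)) %>%
      # This mutate() is truncated in the source; it is reconstructed here from the
      # question's definitions (day = 07:00:00-22:59:59, valid = 4 <= x <= 10)
      mutate(d_n = ifelse(hour(date_time) >= 7 & hour(date_time) < 23, "day", "night"),
             Date = as.Date(date_time),
             valid_m = ifelse(x >= 4 & x <= 10, 1, 0)) %>%
      mutate(valid_m = ifelse(is.na(valid_m), 0, valid_m)) %>%  # valid measure
      arrange(id, date_time) %>%
      group_by(id) %>%
      # minutes until the next record; the last record of each id gets NA
      mutate(validm_d = as.numeric(difftime(lead(date_time), date_time, units = "mins"))) %>%
      filter(!is.na(validm_d)) %>%
      group_by(id, Date, d_n, valid_m) %>%
      summarise(x_tm = sum(validm_d)) %>%
      ungroup() %>%
      pivot_wider(names_from = d_n, values_from = x_tm, values_fill = 0) %>%
      group_by(id, Date) %>%
      mutate(day_t = sum(day), night_t = sum(night)) %>%  # totals per id and date
      filter(valid_m != 0) %>%
      group_by(id) %>%
      mutate(measurement = row_number()) %>%
      select(id, measurement, Date, x_4_10_day = day, x_4_10_total = day_t,
             x_4_10_night = night, x_4_10_totaln = night_t)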
    
Desired result

    id    measurement Date       x_4_10_day x_4_10_total x_4_10_night x_4_10_totaln
      <chr>       <int> <date>          <dbl>        <dbl>        <dbl>         <dbl>
    1 12              1 2020-03-02         50           60           20            60
    2 12              2 2020-03-03          0            0            5            85
    3 13              1 2020-05-09        235          600            0             0
    

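In this solution I dropped the last value of each measurement, because I was not sure how long that recording should be counted for. You can change the code accordingly. The last "day" recording effectively ends at 23:00, so the result in the first row should be 17 seconds less than shown.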
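
To illustrate the two caveats just mentioned, here is a minimal base R sketch on three made-up timestamps. It is not part of the answer's code. It assumes that the last record should count for the nominal 5 minutes stated in the question, and that day time is clipped at 23:00:00.

```r
# Three hypothetical records at the end of a day, 5 minutes apart (UTC assumed)
t <- as.POSIXct(c("2020-03-02 22:45:17", "2020-03-02 22:50:17",
                  "2020-03-02 22:55:17"), tz = "UTC")

# Duration of each record = gap to the next record, in minutes.
# The last record has no successor, so its duration is NA.
dur <- c(as.numeric(diff(t), units = "mins"), NA)

# Assumption: give the last record the nominal 5-minute duration
dur_filled <- ifelse(is.na(dur), 5, dur)

# Clip each record's end at the 23:00:00 day/night boundary (work in seconds)
boundary <- as.numeric(as.POSIXct("2020-03-02 23:00:00", tz = "UTC"))
day_secs <- pmin(as.numeric(t) + dur_filled * 60, boundary) - as.numeric(t)

# Seconds that spill over into the night period
overshoot <- sum(dur_filled) * 60 - sum(day_secs)
print(overshoot)
```

With these timestamps the record at 22:55:17 runs 17 seconds past 23:00:00, which is the 17-second difference described above.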
