Get the max date based on multiple columns of a data frame with R dplyr/tidyverse

From a csv file like the one shown below:

Date       Timestamp           Units Name Condition Obj Param Attrib1 Atrrib2 Result
2019-07-31 2019-08-01 01:16:09 m3    n01  a1        o1  Nap   TP      IN      34937
2019-07-31 2019-08-01 01:16:10 m3    n01  a2        o2  Nap   TP      OUT     36673.09
2019-11-06 2019-11-18 20:21:06 mg/l  n01  a3        o3  NO3   TP      OUT     1
2019-11-06 2019-11-18 20:21:06 mg/l  n01  z5        o4  BOD   IO      IN      220
2019-11-06 2019-11-18 20:21:06 mg/l  n01  z5        o4  BOD   TP      IN      220
2019-11-06 2019-11-18 20:21:06 mg/l  n01  z6        o1  NO2   TP      OUT     0.31
2019-11-06 2019-11-18 20:21:13 mg/l  n01  a11       o4  Ntot  IO      IN      47
2019-11-06 2019-11-18 20:21:13 mg/l  n01  a11       o4  Ntot  TP      IN      47
2021-01-06 2021-01-07 02:15:06 m3    n01  a1        o1  Nap   TP      IN      17909
2021-01-06 2021-01-07 02:15:07 m3    n01  a2        o2  Nap   TP      OUT     19216.19

Try the following:

library(dplyr)

# Read the CSV; to use the df object from the Data section instead, start the pipe with `df %>%`.
read.csv("./Example.csv") %>%
  mutate(Date = as.Date(Date), 
        Timestamp = as.POSIXct(Timestamp, format = "%Y-%m-%d %H:%M:%S")) %>%
  distinct(Date, Condition, Result, .keep_all = TRUE) -> result

result

#        Date           Timestamp Units Name Condition Obj Param Attrib1 Atrrib2   Result
#1 2019-07-31 2019-08-01 01:16:09    m3  n01        a1  o1   Nap      TP      IN 34937.00
#2 2019-07-31 2019-08-01 01:16:10    m3  n01        a2  o2   Nap      TP     OUT 36673.09
#3 2019-11-06 2019-11-18 20:21:06  mg/l  n01        a3  o3   NO3      TP     OUT     1.00
#4 2019-11-06 2019-11-18 20:21:06  mg/l  n01        z5  o4   BOD      IO      IN   220.00
#5 2019-11-06 2019-11-18 20:21:06  mg/l  n01        z6  o1   NO2      TP     OUT     0.31
#6 2019-11-06 2019-11-18 20:21:13  mg/l  n01       a11  o4  Ntot      IO      IN    47.00
#7 2021-01-06 2021-01-07 02:15:06    m3  n01        a1  o1   Nap      TP      IN 17909.00
#8 2021-01-06 2021-01-07 02:15:07    m3  n01        a2  o2   Nap      TP     OUT 19216.19
Data

df <- structure(list(Date = c("2019-07-31", "2019-07-31", "2019-11-06", 
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", 
"2021-01-06", "2021-01-06"), Timestamp = c("2019-08-01 01:16:09", 
"2019-08-01 01:16:10", "2019-11-18 20:21:06", "2019-11-18 20:21:06", 
"2019-11-18 20:21:06", "2019-11-18 20:21:06", "2019-11-18 20:21:13", 
"2019-11-18 20:21:13", "2021-01-07 02:15:06", "2021-01-07 02:15:07"
), Units = c("m3", "m3", "mg/l", "mg/l", "mg/l", "mg/l", "mg/l", 
"mg/l", "m3", "m3"), Name = c("n01", "n01", "n01", "n01", "n01", 
"n01", "n01", "n01", "n01", "n01"), Condition = c("a1", "a2", 
"a3", "z5", "z5", "z6", "a11", "a11", "a1", "a2"), Obj = c("o1", 
"o2", "o3", "o4", "o4", "o1", "o4", "o4", "o1", "o2"), Param = c("Nap", 
"Nap", "NO3", "BOD", "BOD", "NO2", "Ntot", "Ntot", "Nap", "Nap"
), Attrib1 = c("TP", "TP", "TP", "IO", "TP", "TP", "IO", "TP", 
"TP", "TP"), Atrrib2 = c("IN", "OUT", "OUT", "IN", "IN", "OUT", 
"IN", "IN", "IN", "OUT"), Result = c(34937, 36673.09, 1, 220, 
220, 0.31, 47, 47, 17909, 19216.19)),class = "data.frame",row.names = c(NA,-10L))
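
As a minimal sketch of what distinct(..., .keep_all = TRUE) does in the pipeline above, the example below uses a three-row subset that mirrors the data in this thread. The object names toy and deduped are illustrative only and do not come from the original answer.

```r
library(dplyr)

# Three rows mirroring the thread's data: rows 1 and 2 share Date,
# Condition and Result and differ only in Attrib1.
toy <- data.frame(
  Date      = c("2019-11-06", "2019-11-06", "2019-11-06"),
  Condition = c("z5", "z5", "z6"),
  Attrib1   = c("IO", "TP", "TP"),
  Result    = c(220, 220, 0.31)
)

# Keep one row per Date/Condition/Result combination. With .keep_all = TRUE
# the remaining columns are taken from the first row of each combination.
deduped <- toy %>%
  distinct(Date, Condition, Result, .keep_all = TRUE)

print(deduped)
```

Note that this approach depends on row order, not on Timestamp: the row that survives is simply the first one encountered for each combination.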

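You have multiple values at max(Timestamp) within a group. To resolve this, I suggest using dplyr::slice_max and setting with_ties = FALSE.

Here is some code to get what you want:

library(dplyr)

df %>% 
  mutate(Date = as.POSIXct(Date, format = "%Y-%m-%d")) %>%
  mutate(Timestamp = as.POSIXct(Timestamp, format = "%Y-%m-%d %H:%M:%S")) %>%
  # with_ties = FALSE returns a single row per Date/Condition group even when Timestamps tie
  group_by(Date, Condition) %>%
  slice_max(order_by = Timestamp, n = 1, with_ties = FALSE)

However, depending on your application, you may want to be explicit about how those ties are resolved by supplying additional variables to the order_by argument.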
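
One way to make the tie-breaking explicit is sketched below. It swaps in a different technique from the answer: instead of passing extra variables to order_by, it sorts with arrange() and then takes the first row of each group with slice_head(). The tie-break rule (prefer the larger Attrib1 value, so "TP" over "IO") is purely an assumption for illustration, as are the names toy and latest.

```r
library(dplyr)

# Three rows mirroring the thread's data: the two z5 rows tie on Timestamp
# and differ only in Attrib1.
toy <- data.frame(
  Date      = c("2019-11-06", "2019-11-06", "2019-11-06"),
  Timestamp = c("2019-11-18 20:21:06", "2019-11-18 20:21:06", "2019-11-18 20:21:06"),
  Condition = c("z5", "z5", "z6"),
  Attrib1   = c("IO", "TP", "TP"),
  Result    = c(220, 220, 0.31)
)

latest <- toy %>%
  mutate(Timestamp = as.POSIXct(Timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")) %>%
  # Sort so the preferred row comes first: latest Timestamp, then
  # (assumed rule) Attrib1 in descending order to break ties.
  arrange(desc(Timestamp), desc(Attrib1)) %>%
  group_by(Date, Condition) %>%
  # Take the first row of each Date/Condition group in that sorted order.
  slice_head(n = 1) %>%
  ungroup()

print(latest)
```

Changing the second argument of arrange() changes which of the tied rows is kept, which makes the choice visible in the code rather than dependent on the original row order.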

Please use dput to show sample data rather than images. If you don't want repeated rows, use slice_max(Timestamp, n = 1) instead of filter.

Thanks Dan Adams and Ronak Shah, both answers work. I like the approach Dan suggested because it may be more general in other cases where extra conditions are needed.

@eliasmaxil - Glad it helped. Feel free to accept whichever answer was most helpful.
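
To make the filter versus slice_max comparison from the comments concrete, here is a small sketch on toy data (the object names are illustrative, not from the thread). Note that slice_max() keeps ties by default, so it is with_ties = FALSE, as used in the second answer, that actually drops the repeated rows.

```r
library(dplyr)

# The two z5 rows share the same Timestamp; z6 has a single row.
toy <- data.frame(
  Condition = c("z5", "z5", "z6"),
  Attrib1   = c("IO", "TP", "TP"),
  Timestamp = as.POSIXct(
    c("2019-11-18 20:21:06", "2019-11-18 20:21:06", "2019-11-18 20:21:06"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  )
)

by_cond <- toy %>% group_by(Condition)

# filter() keeps every row equal to the group maximum, so tied rows survive.
with_filter <- by_cond %>% filter(Timestamp == max(Timestamp))

# slice_max() with its default with_ties = TRUE also keeps tied rows.
with_ties_default <- by_cond %>% slice_max(Timestamp, n = 1)

# with_ties = FALSE returns exactly one row per group.
one_per_group <- by_cond %>% slice_max(Timestamp, n = 1, with_ties = FALSE)

# Row counts for the three approaches: 3, 3 and 2.
c(nrow(with_filter), nrow(with_ties_default), nrow(one_per_group))
```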