R 如何从数据帧中获取最低的行(或最近的日期)?
我想为每个R 如何从数据帧中获取最低的行(或最近的日期)?,r,dataframe,R,Dataframe,我想为每个县(也称为最新数据,因为数据是按日期组织的)取最低的一行。我想删除所有旧数据。一些县的最新数据是在1月份,一些是在3月份(未显示)。我的df1大约有15000行长。如何做到这一点?输出应为: date county state cases deaths FIPS 1: 2020-01-21 Snohomish Washington 1 0 53061 2: 2020-01-22 Snohomish Washingt
县
(也称为最新数据,因为数据是按日期组织的)取最低的一行。我想删除所有旧数据。一些县的最新数据是在1月份,一些是在3月份(未显示)。我的df1大约有15000行长。如何做到这一点?输出应为:
date county state cases deaths FIPS
1: 2020-01-21 Snohomish Washington 1 0 53061
2: 2020-01-22 Snohomish Washington 3 0 53061
3: 2020-01-23 Snohomish Washington 5 1 53061
4: 2020-01-24 Cook Illinois 1 0 17031
5: 2020-01-24 Snohomish Washington 5 1 53061
6: 2020-01-25 Orange California 1 0 6059
7: 2020-01-25 Los Angeles California 2 0 6037
8: 2020-01-25 Snohomish Washington 5 2 53061
9: 2020-01-26 Maricopa Arizona 1 0 4013
10: 2020-01-26 Los Angeles California 17 0 6037
11: 2020-01-27 Maricopa Arizona 3 1 4013
12: 2020-01-28 Cook Illinois 2 2 17031
这里有一个带有
数据表的选项。按“县”、“州”分组,获取max“date”的索引以子集data.table
date county state cases deaths FIPS
6: 2020-01-25 Orange California 1 0 6059
8: 2020-01-25 Snohomish Washington 5 2 53061
10: 2020-01-26 Los Angeles California 17 0 6037
11: 2020-01-27 Maricopa Arizona 3 1 4013
12: 2020-01-28 Cook Illinois 2 2 17031
或者使用.I
library(data.table)
setDT(df1)[, .SD[which.max(date)], .(county, state)]
# county state date cases deaths FIPS
#1: Snohomish Washington 2020-01-25 5 2 53061
#2: Cook Illinois 2020-01-28 2 2 17031
#3: Orange California 2020-01-25 1 0 6059
#4: Los Angeles California 2020-01-26 17 0 6037
#5: Maricopa Arizona 2020-01-27 3 1 4013
或使用切片
setDT(df1)[df1[, .I[which.max(date)], .(county, state)]$V1]
或者,按“县”、“州”和“日期”排列
,然后获得不同的
行
library(dplyr)
df1 %>%
group_by(county, state) %>%
slice(which.max(date))
或使用
base R
df1 %>%
arrange(county, state, desc(date)) %>%
distinct(state, county, .keep_all = TRUE)
或者另一种选择是订购
,然后使用复制
subset(df1, ave(date, county, state, FUN = max) == date)
df2
df2 <- df1[with(df1, order(county, state, -date)),]
df2[!duplicated(df2[c('county', 'state)]),]
df1 <- structure(list(date = structure(c(18282, 18283, 18284, 18285,
18285, 18286, 18286, 18286, 18287, 18287, 18288, 18289), class = "Date"),
county = c("Snohomish", "Snohomish", "Snohomish", "Cook",
"Snohomish", "Orange", "Los Angeles", "Snohomish", "Maricopa",
"Los Angeles", "Maricopa", "Cook"), state = c("Washington",
"Washington", "Washington", "Illinois", "Washington", "California",
"California", "Washington", "Arizona", "California", "Arizona",
"Illinois"), cases = c(1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 1L,
17L, 3L, 2L), deaths = c(0L, 0L, 1L, 0L, 1L, 0L, 0L, 2L,
0L, 0L, 1L, 2L), FIPS = c(53061L, 53061L, 53061L, 17031L,
53061L, 6059L, 6037L, 53061L, 4013L, 6037L, 4013L, 17031L
)), row.names = c("1:", "2:", "3:", "4:", "5:", "6:", "7:",
"8:", "9:", "10:", "11:", "12:"), class = "data.frame")