How to use reshape2's dcast to select only one of several observations (values)
I have the following dataset:
> dataset2
ID ATCcode date
1 1 N06AA 2001-01-01
2 1 N06AB 2001-04-01
3 1 N06AB 2001-03-01
4 1 N06AB 2001-02-01
5 1 N06AC 2001-01-01
6 2 N06AA 2001-01-01
7 2 N06AA 2001-02-01
8 2 N06AA 2001-03-01
9 3 N06AB 2001-01-01
10 4 N06AA 2001-02-01
11 4 N06AB 2001-03-01
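It is in long format, and I would like it in wide format. However, I only want the earliest date for each ATCcode, not any of the later dates. So this is where I want to end up: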
> datasetLong
ID N06AA N06AB N06AC
1 1 2001-01-01 2001-02-01 2001-01-01
2 2 2001-01-01 <NA> <NA>
3 3 <NA> 2001-01-01 <NA>
4 4 2001-02-01 2001-03-01 <NA>
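Rather than the length (the count of observations), I want a single value per cell, and that value should be the minimum, i.e. the earliest date.
I found that a similar question has been asked before, but I could not adapt it in any way without running into various errors. I did not use melt in my attempt above; might that be necessary?
Thank you for your help.

This answer uses a tidyverse approach.
One way is to select the minimum (earliest) date for each combination of ID and ATCcode, and then convert the data to wide format: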
library(dplyr)

df %>%
  mutate(date = as.Date(date)) %>%    # the date column is stored as a factor in df
  group_by(ID, ATCcode) %>%
  slice(which.min(date)) %>%          # keep only the earliest row per ID/ATCcode
  tidyr::pivot_wider(names_from = ATCcode, values_from = date)
# ID N06AA N06AB N06AC
# <int> <date> <date> <date>
#1 1 2001-01-01 2001-02-01 2001-01-01
#2 2 2001-01-01 NA NA
#3 3 NA 2001-01-01 NA
#4 4 2001-02-01 2001-03-01 NA
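Since the question asks about reshape2 specifically, here is a rough sketch of a dcast variant. It is not the answerer's method. It assumes the data look like dataset2 from the question (retyped here), and the names sorted, earliest and wide are only illustrative. The idea is to reduce the data to one row per ID/ATCcode pair first, so that dcast has nothing left to aggregate and does not fall back to length. The dates are stored as ISO strings before casting because, as far as I know, dcast does not reliably preserve the Date class.

```r
library(reshape2)

# Data retyped from dataset2 in the question
dataset2 <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4),
  ATCcode = c("N06AA", "N06AB", "N06AB", "N06AB", "N06AC", "N06AA",
              "N06AA", "N06AA", "N06AB", "N06AA", "N06AB"),
  date = as.Date(c("2001-01-01", "2001-04-01", "2001-03-01", "2001-02-01",
                   "2001-01-01", "2001-01-01", "2001-02-01", "2001-03-01",
                   "2001-01-01", "2001-02-01", "2001-03-01")),
  stringsAsFactors = FALSE
)

# Sort so the earliest date comes first within each ID/ATCcode pair ...
sorted <- dataset2[order(dataset2$ID, dataset2$ATCcode, dataset2$date), ]
# ... then keep only the first (earliest) row of each pair
earliest <- sorted[!duplicated(sorted[c("ID", "ATCcode")]), ]

# Assumption: dcast may drop the Date class, so cast ISO date strings instead
earliest$date <- format(earliest$date)

# Each ID/ATCcode cell now holds at most one value, so no aggregation is needed
wide <- dcast(earliest, ID ~ ATCcode, value.var = "date")
wide
```

The data are already in long format, so melt should not be needed here. The date columns in wide are character; each can be converted back with as.Date() if real dates are required.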
Are you looking for an answer? Yes, the author suggests using tidyr. Thank you very much, this is exactly what I wanted to do.
For reference, the attempt mentioned in the question returned the number of observations (the length) in each cell rather than a date:
> dataset3
ID N06AA N06AB N06AC
1 1 1 3 1
2 2 3 0 0
3 3 0 1 0
4 4 1 1 0
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L,
4L), ATCcode = structure(c(1L, 2L, 2L, 2L, 3L, 1L, 1L, 1L, 2L,
1L, 2L), .Label = c("N06AA", "N06AB", "N06AC"), class = "factor"),
date = structure(c(1L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 1L, 2L,
3L), .Label = c("2001-01-01", "2001-02-01", "2001-03-01",
"2001-04-01"), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))