R: simplifying data from conflicting sources
I have data from sources of questionable reliability:
date | value | source
===================================
2011-09-30 | 10.9910 | best
2011-12-31 | 11.5000 | ok
2011-12-31 | 11.5290 | best
2012-03-31 | 12.8477 | ok
2012-03-31 | 12.4677 | worst
2012-06-30 | -1.5 | unacceptable
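I want to reduce it to a simple time series with one value per date, chosen by a preference order over the sources: "best" beats "ok", "ok" beats "worst", and "unacceptable" is thrown away. For my example: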
date | value
========================
2011-09-30 | 10.9910
2011-12-31 | 11.5290
2012-03-31 | 12.8477
2012-06-30 | NA # or just skip this line
Is there a good way to do this? The dput of my sample data is:
df = structure(list(date = structure(c(15247, 15339, 15339, 15430, 15430, 15491, 15613, 15613, 15705, 15795, 15795, 15886, 15978, 15978, 15978, 16070, 16070, 16070, 16160, 16160), class = "Date"),
value = c(10.991, 11.500, 11.529, 12.8477, 12.4677, 11.542, 12.1203, 12.1146, 12.5053, 13.3556, 13.3628, 13.3372, 13.844, 13.844, 13.8419, 15.3403, 15.3403, 15.3306, 15.202, 15.202 ),
source = c("best", "ok", "best", "ok", "worst", "ok", "ok", "worst", "ok", "ok", "worst", "unacceptable", "ok", "best", "worst", "ok", "best", "worst", "ok", "best")),
row.names = c(NA, 20L),
.Names = c("date", "value", "source"),
class = "data.frame")
You can convert source to a factor whose level order encodes the preference, then compare on it:
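library(dplyr)

df %>%
  # Level order encodes preference; "unacceptable" is not a level, so it becomes NA
  mutate(source = factor(source, levels = c("best", "ok", "worst"))) %>%
  group_by(date) %>%
  # Keep the row with the lowest (most preferred) level within each date
  top_n(-1, source) %>%
  ungroup()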
# A tibble: 10 x 3
         date   value source
       <date>   <dbl> <fctr>
 1 2011-09-30 10.9910   best
 2 2011-12-31 11.5290   best
 3 2012-03-31 12.8477     ok
 4 2012-05-31 11.5420     ok
 5 2012-09-30 12.1203     ok
 6 2012-12-31 12.5053     ok
 7 2013-03-31 13.3556     ok
 8 2013-09-30 13.8440   best
 9 2013-12-31 15.3403   best
10 2014-03-31 15.2020   best
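If you would rather keep a row with NA for dates that only have an "unacceptable" source (the other option mentioned in the question), the same idea can be sketched in base R. This is only an illustration, not part of the answer above: the df below is rebuilt from the six-row table in the question, not from the dput, and the names pref and pref_rank are made up for this example.

```r
# Small sample rebuilt from the table in the question (not the full dput)
df <- data.frame(
  date = as.Date(c("2011-09-30", "2011-12-31", "2011-12-31",
                   "2012-03-31", "2012-03-31", "2012-06-30")),
  value = c(10.9910, 11.5000, 11.5290, 12.8477, 12.4677, -1.5),
  source = c("best", "ok", "best", "ok", "worst", "unacceptable"),
  stringsAsFactors = FALSE
)

# Preference order (hypothetical helper names); sources not listed get NA
pref <- c("best", "ok", "worst")
pref_rank <- match(df$source, pref)

# Sort by date, then by preference; order() puts NA ranks last by default
ord <- order(df$date, pref_rank)
sorted <- df[ord, ]

# Blank out values that come from unacceptable sources
sorted$value[is.na(pref_rank[ord])] <- NA

# Keep only the first (most preferred) row of each date
result <- sorted[!duplicated(sorted$date), c("date", "value")]
rownames(result) <- NULL
result
```

On this sample, result has one row per date and matches the desired series in the question, with NA for 2012-06-30. To drop that row instead, filter with result[!is.na(result$value), ].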