R: Simplifying data from conflicting sources

I have data from sources of questionable reliability:

 date        | value      |  source
 ===================================
 2011-09-30  |  10.9910   |  best
 2011-12-31  |  11.5000   |  ok
 2011-12-31  |  11.5290   |  best
 2012-03-31  |  12.8477   |  ok
 2012-03-31  |  12.4677   |  worst
 2012-06-30  |  -1.5      |  unacceptable
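I want to collapse this into a simple time series with one value per date, chosen by source preference: "best" over "ok" over "worst", discarding "unacceptable". For my example: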

 date        | value 
 ========================
 2011-09-30  | 10.9910
 2011-12-31  | 11.5290
 2012-03-31  | 12.8477
 2012-06-30  | NA           # or just skip this line
Is there a good way to do this? The dput of my sample data is:

df = structure(list(date = structure(c(15247, 15339, 15339, 15430, 15430, 15491, 15613, 15613, 15705, 15795, 15795, 15886, 15978, 15978, 15978, 16070, 16070, 16070, 16160, 16160), class = "Date"),     
    value = c(10.991, 11.500, 11.529, 12.8477, 12.4677, 11.542, 12.1203, 12.1146, 12.5053, 13.3556, 13.3628, 13.3372, 13.844, 13.844, 13.8419, 15.3403, 15.3403, 15.3306, 15.202, 15.202    ), 
    source = c("best", "ok", "best", "ok", "worst", "ok", "ok", "worst", "ok", "ok", "worst", "unacceptable", "ok", "best", "worst", "ok", "best", "worst", "ok", "best")), 
    row.names = c(NA, 20L), 
    .Names = c("date", "value", "source"), 
    class = "data.frame")

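You can convert source to a factor whose levels are in preference order, then keep the most preferred row for each date: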

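library(dplyr)
df %>%
  # levels in preference order; "unacceptable" is not a level, so it becomes NA
  mutate(source = factor(source, levels = c("best", "ok", "worst"))) %>%
  group_by(date) %>%
  # keep the row with the lowest (most preferred) level per date; NA rows are dropped
  top_n(-1, source) %>%
  ungroup()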

# A tibble: 10 x 3
         date   value source
       <date>   <dbl> <fctr>
 1 2011-09-30 10.9910   best
 2 2011-12-31 11.5290   best
 3 2012-03-31 12.8477     ok
 4 2012-05-31 11.5420     ok
 5 2012-09-30 12.1203     ok
 6 2012-12-31 12.5053     ok
 7 2013-03-31 13.3556     ok
 8 2013-09-30 13.8440   best
 9 2013-12-31 15.3403   best
10 2014-03-31 15.2020   best
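
The result above skips any date whose only observation is "unacceptable". If you would rather keep such a date with an NA value (the other option mentioned in the question), the following is a minimal base R sketch. It uses the six-row example table from the question rather than the dput data, and the names pref, pref_rank, ordered, first and result are purely illustrative.

```r
# Example table from the question; the last date has only an
# "unacceptable" observation.
df <- data.frame(
  date   = as.Date(c("2011-09-30", "2011-12-31", "2011-12-31",
                     "2012-03-31", "2012-03-31", "2012-06-30")),
  value  = c(10.991, 11.5, 11.529, 12.8477, 12.4677, -1.5),
  source = c("best", "ok", "best", "ok", "worst", "unacceptable"),
  stringsAsFactors = FALSE
)

# Preference order, most trusted first. Sources not listed get rank NA.
pref <- c("best", "ok", "worst")
df$pref_rank <- match(df$source, pref)

# Sort by date, then by rank; NA ranks sort last within each date.
ordered <- df[order(df$date, df$pref_rank), ]

# Keep the first (most preferred) row for each date.
first <- ordered[!duplicated(ordered$date), ]

# If the best available row is still unranked, blank out its value.
first$value[is.na(first$pref_rank)] <- NA

result <- first[, c("date", "value")]
rownames(result) <- NULL
print(result)
```

Using match() against a preference vector plays the same role as the factor levels in the dplyr answer; the only difference is that unranked rows are kept as NA instead of being filtered out.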