How to filter a dataset in R with respect to one column? (tags: r, filter)


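I have this dataset in R:

I want to filter out repeated values in "weather_description" whenever they occur back to back. However, if a value appears again later in the dataset, it should not be removed; I only want to drop rows each time this column repeats the value of the row before it. The output should look like this:

2015-01-01 01:00:00 sky is clear 1420070400
2015-01-01 02:00:00 scattered clouds 1420074000
2015-01-01 04:00:00 sky is clear 1420081200

Is there a simple way to do this in R?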

Here is a base R solution using aggregate:

aggregate(Time ~ .,df,head,1)
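A small check of what the formula does may help here. In `Time ~ .`, the dot stands for every other column, so `aggregate` returns the first `Time` for each distinct combination of those columns across the whole data frame, not for each consecutive run. The data below is hypothetical (it is not the asker's dataset); on it, the later "sky is clear" row is merged into the first one, which may not be what the question asks for.

```r
# Hypothetical sample: "sky is clear" re-occurs after a different value
df <- data.frame(
  Time = 1:4,
  weather_description = c("sky is clear", "scattered clouds",
                          "scattered clouds", "sky is clear"),
  stringsAsFactors = FALSE
)

# First Time for each distinct weather_description (the only other column here)
out <- aggregate(Time ~ ., df, head, 1)
out
nrow(out)  # 2: one row per distinct value, so the re-occurrence is not kept
```

If the data also carries a column that is unique per row (such as a timestamp), the dot would include it in the grouping as well, so it is worth checking the result against the expected output.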

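Per @CodeMonkey, using dplyr:

library(dplyr)

# Column names (time, weather_description, timestamps) are assumed from the question's dataset
df %>%
  # start a new group whenever the description differs from the previous row
  mutate(grouper = cumsum(weather_description != lag(weather_description, default = first(weather_description)))) %>%
  group_by(grouper) %>%
  summarise(Time = first(time),
            weather_description = first(weather_description),
            timestamps = first(timestamps))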
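The run-id idea behind `grouper` can be sketched in base R without dplyr, which makes it easier to inspect the intermediate values. The vector `wd` is hypothetical sample data standing in for the `weather_description` column, and the names `changed` and `keep` are made up for this illustration.

```r
# Hypothetical sample values standing in for weather_description
wd <- c("sky is clear", "scattered clouds", "scattered clouds", "sky is clear")

# TRUE wherever the value differs from the previous row;
# the first row is never counted as a change
changed <- c(FALSE, wd[-1] != wd[-length(wd)])

# Running count of changes: rows in the same consecutive run share an id
grouper <- cumsum(changed)
grouper
# 0 1 1 2

# Keep only the first row of every run
keep <- !duplicated(grouper)
wd[keep]
# "sky is clear" "scattered clouds" "sky is clear"
```

Because the id only increases when the value changes, a value that comes back later gets a new id and is kept.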

Let me know whether this base R solution works for you:

Data

df <- data.frame(Time = c(as.Date(16436),as.Date(16437),as.Date(16437),as.Date(16437),
                          as.Date(16437),as.Date(16438),as.Date(16438),as.Date(16438),
                          as.Date(16438),as.Date(16439),as.Date(16439),as.Date(16439)), 
                 weather_description = c("sky is clear",
                                         "scattered clouds")[c(1,2,2,2,2,2,2,2,2,1,1,1)])
df
#         Time weather_description
#1  2015-01-01        sky is clear
#2  2015-01-02    scattered clouds
#3  2015-01-02    scattered clouds
#4  2015-01-02    scattered clouds
#5  2015-01-02    scattered clouds
#6  2015-01-03    scattered clouds
#7  2015-01-03    scattered clouds
#8  2015-01-03    scattered clouds
#9  2015-01-03    scattered clouds
#10 2015-01-04        sky is clear
#11 2015-01-04        sky is clear
#12 2015-01-04        sky is clear

Don't you just need a simple group by and distinct across the 3 columns? Look up dplyr.
weather_changes <- function(dat){
  # split by weather description
  splitted <- split(dat, dat[,2])
  # for each, return only the first dates of a sequence
  byweather <- lapply(splitted, function(x) x[c(TRUE, diff(x[,1]) >= 2), ])
  # combine to a single data.frame
  newdf <- do.call(rbind, byweather)
  # order by date
  newdf <- newdf[order(newdf[,1]),]
  # remove the messy row names
  rownames(newdf) <- NULL
  newdf
}
weather_changes(df)
#        Time weather_description
#1 2015-01-01        sky is clear
#2 2015-01-02    scattered clouds
#3 2015-01-04        sky is clear
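As an alternative sketch, `rle()` from base R can find the runs directly from row order instead of from gaps between dates. The data frame below is hypothetical, built only to mirror the shape of the answer's `df`, and `starts` and `result` are names chosen for this illustration.

```r
# Hypothetical data in the shape of the answer's df
df <- data.frame(
  Time = as.Date("2015-01-01") + c(0, 1, 1, 2, 3, 3),
  weather_description = c("sky is clear", "scattered clouds", "scattered clouds",
                          "scattered clouds", "sky is clear", "sky is clear"),
  stringsAsFactors = FALSE
)

# rle() encodes the column as consecutive runs (values plus run lengths)
runs <- rle(df$weather_description)

# Index of the first row of each run: 1, then 1 plus the lengths of all earlier runs
starts <- cumsum(c(1, head(runs$lengths, -1)))

result <- df[starts, ]
rownames(result) <- NULL
result
```

This variant depends only on the order of the rows, so the data should be sorted by time first. Unlike `weather_changes`, it does not treat a gap of two or more days within the same description as the start of a new sequence.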