How to filter rows in R based on one column?
I have this dataset in R:

I want to filter out repeated values in "weather_description" each time they occur. However, a value should not be removed just because it appears again later in the dataset; I only want to drop a row each time this column repeats the value of the row directly before it. The output should look like this:
2015-01-01 01:00:00  sky is clear      1420070400
2015-01-01 02:00:00  scattered clouds  1420074000
2015-01-01 04:00:00  sky is clear      1420081200
Is there a simple way to do this in R?

Here is a base R solution with aggregate:
aggregate(Time ~ .,df,head,1)
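Note that `aggregate()` groups rows by value rather than by position, so it puts both separate runs of "sky is clear" into a single group. A minimal base R sketch of the consecutive-duplicate filter instead compares each row with the row directly above it. It assumes the rows are already sorted by time and that the column has no `NA`s; the sample data is the daily example from the base R answer below.

```r
# Sample data: the same daily example as in the base R answer below
df <- data.frame(
  Time = as.Date(16436 + c(0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3), origin = "1970-01-01"),
  weather_description = c("sky is clear", "scattered clouds")[c(1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1)],
  stringsAsFactors = FALSE
)

x <- df$weather_description
# Keep the first row, plus every row whose description differs from the row
# directly above it (assumes rows are sorted by time and there are no NAs)
keep <- c(TRUE, x[-1] != x[-length(x)])

result <- df[keep, ]
rownames(result) <- NULL
result
```

On the sample data this keeps the rows for 2015-01-01, 2015-01-02 and 2015-01-04.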
Per @CodeMonkey, using dplyr:
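library(dplyr)

# grouper increases by 1 every time weather_description differs from the previous row
df %>%
  mutate(grouper = cumsum(weather_description != lag(weather_description, default = first(weather_description)))) %>%
  group_by(grouper) %>%
  # keep the first row of each run of identical descriptions
  summarise(Time = first(Time),
            weather_description = first(weather_description),
            timestamps = first(timestamps))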
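The `cumsum()` trick assigns a run id that goes up by one whenever the description changes, and `group_by()` + `summarise(first(...))` then keeps the first row per run. A sketch of the same split-apply-combine idea in base R, without dplyr, could look like this; `run_id` is a hypothetical name, and the data is again the daily sample from the base R answer below.

```r
df <- data.frame(
  Time = as.Date(16436 + c(0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3), origin = "1970-01-01"),
  weather_description = c("sky is clear", "scattered clouds")[c(1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1)],
  stringsAsFactors = FALSE
)

x <- df$weather_description
# Run id: starts at 1 and increases by 1 whenever the description changes
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))
run_id
# [1] 1 2 2 2 2 2 2 2 2 3 3 3

# Split rows by run id, take the first row of each run, bind them back together
runs <- split(df, run_id)
result <- do.call(rbind, lapply(runs, function(run) run[1, ]))
rownames(result) <- NULL
result
```

Binding the per-run data frames with `rbind` keeps the `Date` class of `Time`, which is why the rows are combined as data frames rather than as plain vectors.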
Please let me know whether this base R solution works for you:

Data:
# 16436 days after 1970-01-01 is 2015-01-01; older R versions require origin for numeric dates
df <- data.frame(Time = as.Date(16436 + c(0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3), origin = "1970-01-01"),
                 weather_description = c("sky is clear",
                                         "scattered clouds")[c(1,2,2,2,2,2,2,2,2,1,1,1)])
df
# Time weather_description
#1 2015-01-01 sky is clear
#2 2015-01-02 scattered clouds
#3 2015-01-02 scattered clouds
#4 2015-01-02 scattered clouds
#5 2015-01-02 scattered clouds
#6 2015-01-03 scattered clouds
#7 2015-01-03 scattered clouds
#8 2015-01-03 scattered clouds
#9 2015-01-03 scattered clouds
#10 2015-01-04 sky is clear
#11 2015-01-04 sky is clear
#12 2015-01-04 sky is clear
Don't you just need a simple group-by, or a distinct across the 3 columns? Google dplyr.
weather_changes <- function(dat){
# split by weather description
splitted <- split(dat, dat[,2])
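  # for each description, keep the first row plus any row that is 2 or more
  # days after the previous row with the same description
  # (a logical index avoids x[-which(...), ] dropping every row when nothing matches)
  byweather <- lapply(splitted, function(x) x[c(TRUE, diff(x[,1]) >= 2), ])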
# combine to a single data.frame
newdf <- do.call(rbind, byweather)
# order by date
newdf <- newdf[order(newdf[,1]),]
# remove the messy row names
rownames(newdf) <- NULL
newdf
}
weather_changes(df)
# Time weather_description
#1 2015-01-01 sky is clear
#2 2015-01-02 scattered clouds
#3 2015-01-04 sky is clear
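`weather_changes()` decides whether two rows belong to the same run from the gap between consecutive times within each description (a threshold of 2, in whatever units `diff()` returns) rather than from whether the rows are adjacent. That fits this daily example but not necessarily the hourly data in the question. A sketch that relies on row order instead, using base R's `rle()` (run-length encoding), follows; `first_of_runs()` is a hypothetical helper name, not part of the answer above.

```r
# Hypothetical helper: keep the first row of each run of consecutive equal values in `col`
first_of_runs <- function(dat, col) {
  # rle() returns the length of every run of consecutive identical values
  runs <- rle(as.character(dat[[col]]))
  # Each run starts right after the previous runs end: 1, 1 + len1, 1 + len1 + len2, ...
  starts <- cumsum(c(1, head(runs$lengths, -1)))
  out <- dat[starts, , drop = FALSE]
  rownames(out) <- NULL
  out
}

df <- data.frame(
  Time = as.Date(16436 + c(0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3), origin = "1970-01-01"),
  weather_description = c("sky is clear", "scattered clouds")[c(1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1)],
  stringsAsFactors = FALSE
)

result <- first_of_runs(df, "weather_description")
result
```

Because it only looks at row order, the data should be sorted by time first, for example with `df[order(df$Time), ]`.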