How to group and merge columns in R
I have this data frame, d. I need to put the dates into a single comma-separated cell, grouped by Product, Day and Month. For example, Server1,Serve2,Server4 appears in January on 2015-01-06, 2015-01-14, 2015-01-15 and 2015-01-20. My new df needs to look like this:
Product Day Date Month Day_list
Server1,Serve2,Server4 Tues 2015-01-06 Jan 2015-01-06,2015-01-13,2015-01-20
Is there a package that can help me do this in R?
I tried using the data.table package:
d[,d:=paste(Date,Date), c("Product","Day","Month")]
It doesn't work.

Here is a solution using dplyr:
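library(dplyr)

d %>%
  mutate(
    # strip stray spaces so identical values fall into the same group
    Product = gsub("[ ]", "", Product),
    Day = gsub("[ ]", "", Day)
  ) %>%
  group_by(Product, Month) %>%
  mutate(
    # join every date in the group into one space-separated string
    Day_list = paste(Date, collapse = " ")
  )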
Product Day Date Month Day_list
1 Server1,Serve2,Server4 Tue 2015-01-06 Jan 2015-01-06 2015-01-14 2015-01-15
2 App_Servers Wed 2015-01-07 Jan 2015-01-07
3 Db_servers,application Tue 2015-01-13 Jan 2015-01-13
4 Server1,Serve2,Server4 Wed 2015-01-14 Jan 2015-01-06 2015-01-14 2015-01-15
5 Server1,Serve2,Server4 Thu 2015-01-15 Jan 2015-01-06 2015-01-14 2015-01-15
6 Server1,Serve2,Sever4 Tue 2015-01-20 Jan 2015-01-20
7 Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16 2015-02-16
8 Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16 2015-02-16
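There are a couple of things here. First, the columns contain extra spaces. You have to remove them so that matching values end up in the same group: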
require(data.table)
setDT(d)[, `:=`(Product = gsub("[ ]", "", Product),
Date = gsub("[ ]", "", Date))]
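To see why the stray spaces matter, here is a minimal base R sketch. The three product labels are hypothetical (the original data with its spaces is not shown in the thread); they only illustrate that values differing by whitespace are treated as different groups until they are cleaned.

```r
# Hypothetical labels: the same product typed with and without extra spaces
product <- c("Server1,Serve2,Server4",
             "Server1, Serve2, Server4",
             " Server1,Serve2,Server4")

# Before cleaning, every spelling counts as its own group
n_before <- length(unique(product))

# gsub("[ ]", "", x) deletes every space character in x
cleaned <- gsub("[ ]", "", product)
n_after <- length(unique(cleaned))

print(c(before = n_before, after = n_after))
```

If the real data also contains tabs or other whitespace, a pattern such as "\\s" may be a better fit than "[ ]".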
Second, you used paste() and := incorrectly. Use collapse inside paste() and pass the grouping columns through by:
d[, Date_list := paste(Date, collapse=","), by=c("Product", "Month")]
d
# Product Day Date Month Date_list
# 1: Server1,Serve2,Server4 Tue 2015-01-06 Jan 2015-01-06,2015-01-14,2015-01-15
# 2: App_Servers Wed 2015-01-07 Jan 2015-01-07
# 3: Db_servers,application Tue 2015-01-13 Jan 2015-01-13
# 4: Server1,Serve2,Server4 Wed 2015-01-14 Jan 2015-01-06,2015-01-14,2015-01-15
# 5: Server1,Serve2,Server4 Thu 2015-01-15 Jan 2015-01-06,2015-01-14,2015-01-15
# 6: Server1,Serve2,Sever4 Tue 2015-01-20 Jan 2015-01-20
# 7: Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16,2015-02-16
# 8: Server1,Serve2,Server4 Mon 2015-02-16 Feb 2015-02-16,2015-02-16
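The difference between the failed attempt and the working call can be sketched as follows. The sample data is rebuilt from the output printed above and assumes the spaces have already been removed. The last step is an assumption about what the asker ultimately wants: one row per group rather than the list repeated on every row.

```r
library(data.table)

# Sample data rebuilt from the printed output (spaces already stripped)
d <- data.table(
  Product = c("Server1,Serve2,Server4", "App_Servers", "Db_servers,application",
              "Server1,Serve2,Server4", "Server1,Serve2,Server4",
              "Server1,Serve2,Sever4", "Server1,Serve2,Server4",
              "Server1,Serve2,Server4"),
  Day   = c("Tue", "Wed", "Tue", "Wed", "Thu", "Tue", "Mon", "Mon"),
  Date  = c("2015-01-06", "2015-01-07", "2015-01-13", "2015-01-14",
            "2015-01-15", "2015-01-20", "2015-02-16", "2015-02-16"),
  Month = c("Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Feb", "Feb")
)

# paste(x, x) works element-wise: one string per row, each date pasted to itself
elementwise <- paste(d$Date[1:2], d$Date[1:2])

# collapse joins all elements of the vector into a single string
collapsed <- paste(d$Date[1:2], collapse = ",")

# Returning a list in j (instead of using :=) yields one row per group
res <- d[, .(Date_list = paste(Date, collapse = ",")), by = .(Product, Month)]
print(res)
```

Using := keeps all eight rows and adds a column by reference; the list form aggregates. To avoid repeated dates such as the two 2015-02-16 entries, paste(unique(Date), collapse = ",") is one option.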
Have a look at the data.table documentation and vignettes.
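Edit: I just realized that row 6 has a typo in Product. If you are wondering why the Day_list column in the new df is not 2015-01-06,2015-01-14,2015-01-15,2015-01-20 (based on the data above): the last January entry is Server1,Serve2,Sever4, not Server1,Serve2,Server4, so it ends up in a group of its own. That is hard to spot just by looking at the code :-).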
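One way to catch near-duplicate labels like this is to compare the distinct Product values by edit distance. This is only a sketch: it uses adist() from the utils package that ships with R, and the threshold of 2 edits is an arbitrary assumption that would need tuning for other data.

```r
# Product column as it appears in the printed output
product <- c("Server1,Serve2,Server4", "App_Servers", "Db_servers,application",
             "Server1,Serve2,Server4", "Server1,Serve2,Server4",
             "Server1,Serve2,Sever4", "Server1,Serve2,Server4",
             "Server1,Serve2,Server4")

u <- unique(product)

# Generalized Levenshtein distance between every pair of distinct labels
dist <- adist(u)

# Keep each pair once (upper triangle) when the labels differ by at most 2 edits
close <- which(dist > 0 & dist <= 2 & upper.tri(dist), arr.ind = TRUE)
pairs <- data.frame(a = u[close[, "row"]], b = u[close[, "col"]])

print(pairs)
```

Any pair listed here is a candidate typo to review by hand before grouping.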