Grouping data in R comparable to pandas groupby with Grouper
I have a dataset of hierarchical events, with one row per event:
TIME level1 level2 Occurrence
29/11/2019 00:05 A a 1
29/11/2019 00:05 B a 1
29/11/2019 00:07 B b 1
29/11/2019 00:20 B b 1
29/11/2019 00:05 B c 1
29/11/2019 01:20 A a 1
29/11/2019 01:25 A a 1
29/11/2019 02:00 A a 2
29/11/2019 02:00 B a 1
29/11/2019 02:00 B b 1
29/11/2019 02:35 B b 1
29/11/2019 02:49 B c 1
I aggregate it with pandas groupby and Grouper:
df_agg = df.groupby([pd.Grouper(freq='H'), 'level1', pd.Grouper('level2')])
df_agg.count()
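To make the pandas semantics concrete: `pd.Grouper(freq='H')` bins the DatetimeIndex into hourly intervals, so each timestamp lands in the hour that contains it, and `count()` then reports the number of non-null entries per group (for a fully populated column such as Occurrence, that is the number of rows). A minimal stdlib-only sketch of the same logic, assuming the timestamps are already parsed; `hourly_counts` is a hypothetical helper, not part of pandas:

```python
from collections import Counter
from datetime import datetime


def hourly_counts(rows):
    """Count rows per (hour, level1, level2), flooring each timestamp to the hour.

    `rows` is an iterable of (timestamp, level1, level2) tuples.
    Hypothetical illustration only, not a pandas API.
    """
    counts = Counter()
    for ts, level1, level2 in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        counts[(hour, level1, level2)] += 1
    return counts


fmt = "%d/%m/%Y %H:%M"  # day-first, as in the question's data
rows = [
    (datetime.strptime(t, fmt), l1, l2)
    for t, l1, l2 in [
        ("29/11/2019 00:05", "A", "a"), ("29/11/2019 00:05", "B", "a"),
        ("29/11/2019 00:07", "B", "b"), ("29/11/2019 00:20", "B", "b"),
        ("29/11/2019 00:05", "B", "c"), ("29/11/2019 01:20", "A", "a"),
        ("29/11/2019 01:25", "A", "a"), ("29/11/2019 02:00", "A", "a"),
        ("29/11/2019 02:00", "B", "a"), ("29/11/2019 02:00", "B", "b"),
        ("29/11/2019 02:35", "B", "b"), ("29/11/2019 02:49", "B", "c"),
    ]
]

# Print one line per group, ordered by hour, then level1, then level2
for (hour, level1, level2), n in sorted(hourly_counts(rows).items()):
    print(hour, level1, level2, n)
```

Each printed line corresponds to one row of the grouped count.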
Can I achieve something similar in R?
Here is a command that creates a dataset similar to the one I'm working with:
import pandas as pd

# Named `data` rather than `dict` to avoid shadowing the built-in
data = {"TIME": ['29/11/2019 00:05:00', '29/11/2019 00:05:00', '29/11/2019 00:07:00', '29/11/2019 00:20:00',
                 '29/11/2019 00:05:00', '29/11/2019 01:20:00', '29/11/2019 01:25:00', '29/11/2019 02:00:00',
                 '29/11/2019 02:00:00', '29/11/2019 02:00:00', '29/11/2019 02:35:00', '29/11/2019 02:49:00'],
        "level1": ["A", "B", "B", "B", "B", "A", "A", "A", "B", "B", "B", "B"],
        "level2": ["a", "a", "b", "b", "c", "a", "a", "a", "a", "b", "b", "c"]}
tmp_df = pd.DataFrame(data)
tmp_df = tmp_df.set_index('TIME')
# The timestamps are day-first, so give the format explicitly
tmp_df.index = pd.to_datetime(tmp_df.index, format='%d/%m/%Y %H:%M:%S')
Using lubridate and dplyr, you can do:
library(dplyr)
library(lubridate)
df %>%
mutate(TIME = floor_date(dmy_hm(TIME), "hour")) %>%
count(TIME, level1, level2)
# A tibble: 9 x 4
# TIME level1 level2 n
# <dttm> <fct> <fct> <int>
#1 2019-11-29 00:00:00 A a 1
#2 2019-11-29 00:00:00 B a 1
#3 2019-11-29 00:00:00 B b 2
#4 2019-11-29 00:00:00 B c 1
#5 2019-11-29 01:00:00 A a 2
#6 2019-11-29 02:00:00 A a 1
#7 2019-11-29 02:00:00 B a 1
#8 2019-11-29 02:00:00 B b 2
#9 2019-11-29 02:00:00 B c 1
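Note that `count()` tallies rows, so the 02:00 A/a row, whose Occurrence is 2, is counted once (row 6 above). If the Occurrence values should be added up instead, dplyr's `count()` accepts a `wt` argument, e.g. `count(TIME, level1, level2, wt = Occurrence)`. A small Python sketch contrasting the two, using a subset of the question's rows; `hour_key` is a hypothetical helper:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Subset of the question's rows: (timestamp, level1, level2, Occurrence)
rows = [
    ("29/11/2019 01:20", "A", "a", 1),
    ("29/11/2019 01:25", "A", "a", 1),
    ("29/11/2019 02:00", "A", "a", 2),
    ("29/11/2019 02:00", "B", "a", 1),
]


def hour_key(ts, level1, level2):
    """Parse a day-first timestamp, floor it to the hour, and build the group key."""
    hour = datetime.strptime(ts, "%d/%m/%Y %H:%M").replace(minute=0)
    return (hour, level1, level2)


row_counts = Counter()       # analogous to count(TIME, level1, level2)
weighted = defaultdict(int)  # analogous to count(..., wt = Occurrence)
for ts, l1, l2, occ in rows:
    key = hour_key(ts, l1, l2)
    row_counts[key] += 1
    weighted[key] += occ
```

For the 02:00 A/a group, the row count is 1 while the weighted total is 2.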
We can use the dplyr package:
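library(dplyr)
dat %>%
  # Bucket by hour: keep day and hour, replace minutes and seconds with literal zeros
  group_by(TIME = format(TIME, format = "%d/%m/%Y %H:00:00"), level1, level2) %>%
  count(name = "count")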
#> # A tibble: 9 x 4
#> # Groups:   TIME, level1, level2 [9]
#>   TIME                level1 level2 count
#>   <chr>               <chr>  <chr>  <int>
#> 1 29/11/2019 00:00:00 A      a          1
#> 2 29/11/2019 00:00:00 B      a          1
#> 3 29/11/2019 00:00:00 B      b          2
#> 4 29/11/2019 00:00:00 B      c          1
#> 5 29/11/2019 01:00:00 A      a          2
#> 6 29/11/2019 02:00:00 A      a          1
#> 7 29/11/2019 02:00:00 B      a          1
#> 8 29/11/2019 02:00:00 B      b          2
#> 9 29/11/2019 02:00:00 B      c          1
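This variant buckets by formatting each timestamp with the minutes and seconds replaced by literal zeros, so the resulting TIME column is character rather than POSIXct. One consequence worth keeping in mind: day-first strings sort as text, not chronologically, once the data spans more than one month, and grouped output is ordered by that string. A small Python illustration, using `strftime` as a stand-in for R's `format()` and hypothetical timestamps that cross a month boundary:

```python
from datetime import datetime

# Hypothetical timestamps spanning the end of November
stamps = [
    datetime(2019, 11, 29, 0, 5),
    datetime(2019, 12, 1, 2, 49),
    datetime(2019, 11, 30, 1, 20),
]

# Same idea as format(TIME, "%d/%m/%Y %H:00:00"): literal zeros replace minutes/seconds
labels = [ts.strftime("%d/%m/%Y %H:00:00") for ts in stamps]

# As strings, "01/12/2019 ..." sorts before "29/11/2019 ..."
print(sorted(labels))

# Flooring to a real datetime keeps chronological order
floored = sorted(ts.replace(minute=0, second=0, microsecond=0) for ts in stamps)
print(floored)
```

Keeping a date-time type (as `floor_date()` does) avoids this ordering issue.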
Data: this is the data I used. Please use dput(dat) instead of copy/paste to share your data.
structure(list(TIME = structure(c(1574985900, 1574985900, 1574986020,
1574986800, 1574985900, 1574990400, 1574990700, 1574992800, 1574992800,
1574992800, 1574994900, 1574995740), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), level1 = c("A", "B", "B", "B", "B", "A", "A",
"A", "B", "B", "B", "B"), level2 = c("a", "a", "b", "b", "c",
"a", "a", "a", "a", "b", "b", "c"), Occurrence = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), spec = structure(list(
cols = list(TIME = structure(list(format = "%d/%m/%Y %H:%M"), class = c("collector_datetime",
"collector")), level1 = structure(list(), class = c("collector_character",
"collector")), level2 = structure(list(), class = c("collector_character",
"collector")), Occurrence = structure(list(), class = c("collector_integer",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
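Thanks M--, with the command you provided the time aggregation doesn't happen. I'm checking how to attach a sample dataset.
@Praveen That's because your Time column is not of class POSIXct. Convert it with as.POSIXct or lubridate::dmy_hm and it will work.
Thanks Ronak. With the command you provided I get the warning message "All formats failed to parse. No formats found." TIME evaluates to NA, so I guess the date parsing went wrong. Can you suggest the time format?
@Praveen In the data you shared earlier you had hours and minutes, but in the update you have hours, minutes, and seconds. So use dmy_hms instead. Try df %>% mutate(TIME = floor_date(dmy_hms(TIME), "hour")) %>% count(TIME, level1, level2)
@Rohan, my mistake, sorry. The new command works and produces the output I expected. Thanks!
Not python.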
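The parse failure in these comments comes from a format mismatch: `dmy_hm()` expects day/month/year hour:minute, while the updated sample timestamps (e.g. `29/11/2019 00:05:00`) also carry seconds, which `dmy_hms()` handles. Python's `strptime` is stricter than lubridate's parsers, but it shows the same mismatch minimally; `try_parse` is a hypothetical helper that returns `None` where R would give `NA`:

```python
from datetime import datetime

stamp = "29/11/2019 00:05:00"  # sample value from the question's dataset


def try_parse(value, fmt):
    """Return a datetime, or None when the format does not match (like NA in R)."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


hm = try_parse(stamp, "%d/%m/%Y %H:%M")      # hour:minute only, like dmy_hm
hms = try_parse(stamp, "%d/%m/%Y %H:%M:%S")  # includes seconds, like dmy_hms
print(hm, hms)
```

Only the format that includes seconds parses this value.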