R data grouping comparable to pandas groupby with Grouper


I have a dataset of hierarchical events, with one row per event:

TIME               level1   level2  Occurrence
29/11/2019 00:05    A       a       1
29/11/2019 00:05    B       a       1
29/11/2019 00:07    B       b       1
29/11/2019 00:20    B       b       1
29/11/2019 00:05    B       c       1
29/11/2019 01:20    A       a       1
29/11/2019 01:25    A       a       1
29/11/2019 02:00    A       a       2
29/11/2019 02:00    B       a       1
29/11/2019 02:00    B       b       1
29/11/2019 02:35    B       b       1
29/11/2019 02:49    B       c       1
I aggregate it with pandas groupby and Grouper like this:

df_agg = df.groupby([pd.Grouper(freq='H'), 'level1', pd.Grouper('level2')])
df_agg.count()
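To make explicit what the grouping computes, here is a minimal stdlib-only Python sketch. It assumes the goal is a row count per (hour, level1, level2): each timestamp is truncated to the start of its hour (the hourly bin that freq='H' produces) and the keys are tallied. The helper names hour_floor and count_by_hour are hypothetical, and the rows are the sample table above with the Occurrence column dropped.

```python
from collections import Counter
from datetime import datetime

# Rows from the sample table: (TIME, level1, level2).
# Occurrence is ignored because we count rows, not sum occurrences.
rows = [
    ("29/11/2019 00:05", "A", "a"),
    ("29/11/2019 00:05", "B", "a"),
    ("29/11/2019 00:07", "B", "b"),
    ("29/11/2019 00:20", "B", "b"),
    ("29/11/2019 00:05", "B", "c"),
    ("29/11/2019 01:20", "A", "a"),
    ("29/11/2019 01:25", "A", "a"),
    ("29/11/2019 02:00", "A", "a"),
    ("29/11/2019 02:00", "B", "a"),
    ("29/11/2019 02:00", "B", "b"),
    ("29/11/2019 02:35", "B", "b"),
    ("29/11/2019 02:49", "B", "c"),
]


def hour_floor(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (hypothetical helper)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def count_by_hour(rows):
    """Count rows per (hour bucket, level1, level2), sorted by key."""
    counts = Counter()
    for time_str, level1, level2 in rows:
        # Day-first timestamps: dd/mm/yyyy hh:mm
        ts = datetime.strptime(time_str, "%d/%m/%Y %H:%M")
        counts[(hour_floor(ts), level1, level2)] += 1
    return dict(sorted(counts.items()))


for (hour, l1, l2), n in count_by_hour(rows).items():
    print(hour, l1, l2, n)
```

Only combinations that actually occur appear in the result, which matches the row counts shown in the R answers below.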
Can I achieve something similar in R?

Here is code that creates a dataset similar to the one I am working with:

import pandas as pd

# Named `data` to avoid shadowing the built-in `dict`
data = {"TIME" : ['29/11/2019  00:05:00', '29/11/2019  00:05:00', '29/11/2019  00:07:00', '29/11/2019  00:20:00',
                  '29/11/2019  00:05:00', '29/11/2019  01:20:00', '29/11/2019  01:25:00', '29/11/2019  02:00:00',
                  '29/11/2019  02:00:00', '29/11/2019  02:00:00', '29/11/2019  02:35:00', '29/11/2019  02:49:00'],
        "level1" : ["A", "B", "B", "B", "B", "A", "A", "A", "B","B", "B", "B"],
        "level2" : ["a", "a", "b", "b", "c", "a", "a", "a", "a", "b", "b","c"]}

tmp_df = pd.DataFrame(data)
tmp_df = tmp_df.set_index('TIME')
# The timestamps are day-first (dd/mm/yyyy), so say so explicitly
tmp_df.index = pd.to_datetime(tmp_df.index, dayfirst=True)

Using lubridate and dplyr, you can do:

library(dplyr)
library(lubridate)
df %>%
  mutate(TIME = floor_date(dmy_hm(TIME), "hour")) %>%
  count(TIME, level1, level2)

# A tibble: 9 x 4
#  TIME                level1 level2     n
#  <dttm>              <fct>  <fct>  <int>
#1 2019-11-29 00:00:00 A      a          1
#2 2019-11-29 00:00:00 B      a          1
#3 2019-11-29 00:00:00 B      b          2
#4 2019-11-29 00:00:00 B      c          1
#5 2019-11-29 01:00:00 A      a          2
#6 2019-11-29 02:00:00 A      a          1
#7 2019-11-29 02:00:00 B      a          1
#8 2019-11-29 02:00:00 B      b          2
#9 2019-11-29 02:00:00 B      c          1

We can use the dplyr package:

library(dplyr)
dat %>%
  group_by(TIME = format(TIME, format = "%d/%m/%Y %H:00:00"), level1, level2) %>%
  count(name = "count")
#> # A tibble: 9 x 4
#> # Groups:   TIME, level1, level2 [9]
#>   TIME                level1 level2 count
#>   <chr>               <chr>  <chr>  <int>
#> 1 29/11/2019 00:00:00 A      a          1
#> 2 29/11/2019 00:00:00 B      a          1
#> 3 29/11/2019 00:00:00 B      b          2
#> 4 29/11/2019 00:00:00 B      c          1
#> 5 29/11/2019 01:00:00 A      a          2
#> 6 29/11/2019 02:00:00 A      a          1
#> 7 29/11/2019 02:00:00 B      a          1
#> 8 29/11/2019 02:00:00 B      b          2
#> 9 29/11/2019 02:00:00 B      c          1
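Because format() turns the hour key into a character string, TIME comes back as <chr>, and dplyr orders the groups by those string values rather than chronologically. That makes no difference for a single day, as in this data, but with day-first keys "01/12/..." would sort before "29/11/...". A minimal Python sketch of the difference, using two hypothetical timestamps (the second one is not in the question's data); hour_key_str and hour_key_dt are illustrative names, not part of either answer:

```python
from datetime import datetime

# Hypothetical timestamps spanning two months (the question's data covers one day)
stamps = [datetime(2019, 11, 29, 0, 5), datetime(2019, 12, 1, 2, 35)]


def hour_key_str(ts: datetime) -> str:
    """String bucket key, analogous to format(TIME, "%d/%m/%Y %H:00:00")."""
    return ts.strftime("%d/%m/%Y %H:00:00")


def hour_key_dt(ts: datetime) -> datetime:
    """Datetime bucket key, analogous to floor_date(TIME, "hour")."""
    return ts.replace(minute=0, second=0, microsecond=0)


str_keys = sorted(hour_key_str(ts) for ts in stamps)
dt_keys = sorted(hour_key_dt(ts) for ts in stamps)

print(str_keys)  # lexicographic: the 1 December key comes first
print(dt_keys)   # chronological: 29 November comes first
```

Keeping the key as a date-time (as floor_date does in the first answer) preserves chronological order; a year-first format such as "%Y-%m-%d %H:00:00" would also sort correctly as a string.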
Data: this is the data I used. Please share your data with dput(dat) rather than copying and pasting it.

structure(list(TIME = structure(c(1574985900, 1574985900, 1574986020, 
1574986800, 1574985900, 1574990400, 1574990700, 1574992800, 1574992800, 
1574992800, 1574994900, 1574995740), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), level1 = c("A", "B", "B", "B", "B", "A", "A", 
"A", "B", "B", "B", "B"), level2 = c("a", "a", "b", "b", "c", 
"a", "a", "a", "a", "b", "b", "c"), Occurrence = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), spec = structure(list(
    cols = list(TIME = structure(list(format = "%d/%m/%Y %H:%M"), class = c("collector_datetime", 
    "collector")), level1 = structure(list(), class = c("collector_character", 
    "collector")), level2 = structure(list(), class = c("collector_character", 
    "collector")), Occurrence = structure(list(), class = c("collector_integer", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

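Thanks M--, but with the command you provided the time aggregation doesn't happen. I'm checking how to attach a sample dataset.

@Praveen That is because your Time column is not of class POSIXct. Convert it with as.POSIXct or with lubridate::dmy_hm and it should work.

Thanks Ronak. With the command you provided I get the warning message "All formats failed to parse. No formats found." and TIME evaluates to NA. I guess the date parsing went wrong. Can you suggest the time format?

@Praveen In the data you shared earlier you had hours and minutes, but in the update you have hours, minutes and seconds, so use dmy_hms instead. Try df %>% mutate(TIME = floor_date(dmy_hms(TIME), "hour")) %>% count(TIME, level1, level2)

@Rohan, my mistake, sorry. The new command works and produces the output I expected. Thanks.

Not Python.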
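The parsing failure in the comments comes from a format mismatch: a pattern that stops at minutes cannot consume a trailing seconds field. A minimal Python sketch of the same idea with datetime.strptime, plus a hypothetical parse_dmy helper that tries the seconds-bearing format before the minutes-only one. This mirrors the dmy_hm / dmy_hms choice but is not how lubridate works internally.

```python
from datetime import datetime


def parse_dmy(ts: str) -> datetime:
    """Parse a day-first timestamp with or without seconds (hypothetical helper).

    Tries dd/mm/yyyy hh:mm:ss first, then dd/mm/yyyy hh:mm.
    """
    for fmt in ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M"):
        try:
            return datetime.strptime(ts, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"unrecognised timestamp: {ts!r}")


# A minutes-only format leaves the ":00" seconds unconsumed and fails
try:
    datetime.strptime("29/11/2019 00:05:00", "%d/%m/%Y %H:%M")
except ValueError as err:
    print("hh:mm-only format fails:", err)

print(parse_dmy("29/11/2019 00:05:00"))  # with seconds
print(parse_dmy("29/11/2019 00:05"))     # without seconds
```

In R, the analogous fix is simply to use the parser whose format matches the data, as suggested above.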