R: counting occurrences of cases across multiple columns
I have a data frame:
ID Date col1 col2
1 1606807860 LOY A
2 1606807860 LOY B
2 1606807860 LOY B
3 1606807860 LOY B
1 1606807860 LOY A
I want to count the occurrences of each unique combination of ID, Date, col1 and col2. So the desired result is:
ID Date event count
1 1606807860 loy-a 2
2 1606807860 loy-b 2
3 1606807860 loy-b 1
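How can I do this? Also, how can I convert the timestamp to a standard format instead of 1606807860? How do I change the Date type so that it looks like year-month-day?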
To count using only col1 and col2:
library(dplyr)
library(tidyr)  # provides unite()
df1 %>%
mutate(across(c(col1, col2), tolower)) %>%
count(col1, col2) %>%
unite(event, col1, col2, sep='-')
In this case, instead of specifying multiple columns one by one, we use across inside group_by, select a range of columns by name, and then summarise
library(dplyr)
library(stringr)
df1 %>%
group_by(across(names(.)[1:4])) %>%
summarise(count = n(), .groups = 'drop') %>%
mutate(event = tolower(str_c(col1, col2, sep="-"))) %>%
select(-col1, -col2)
-output
# A tibble: 3 x 4
# ID Date count event
# <int> <int> <int> <chr>
#1 1 1606807860 2 loy-a
#2 2 1606807860 2 loy-b
#3 3 1606807860 1 loy-b
# A tibble: 3 x 4
# ID Date event count
# <int> <date> <chr> <int>
#1 1 2020-12-01 loy-a 2
#2 2 2020-12-01 loy-b 2
#3 3 2020-12-01 loy-b 1
# ID Date event n
#1 1 1606807860 loy-a 2
#2 2 1606807860 loy-b 2
#3 3 1606807860 loy-b 1
data
df1 is the data frame shown in the question.
Try this:
library(dplyr)
#Code
new <- df %>% group_by(ID,Date,event=tolower(paste0(col1,'-',col2))) %>%
summarise(N=n()) %>% mutate(Date=as.Date(as.POSIXct(Date,origin = "1970-01-01")))
Output:
# A tibble: 3 x 4
# Groups: ID, Date [3]
ID Date event N
<int> <date> <chr> <int>
1 1 2020-12-01 loy-a 2
2 2 2020-12-01 loy-b 2
3 3 2020-12-01 loy-b 1
A base R option
aggregate(
n ~ .,
transform(
df,
event = tolower(paste(col1, col2, sep = "-")),
Date = as.Date(as.POSIXct(Date, origin = "1970-01-01")),
n = 1,
col1 = NULL,
col2 = NULL
),
sum
)
gives
ID Date event n
1 1 2020-12-01 loy-a 2
2 2 2020-12-01 loy-b 2
3 3 2020-12-01 loy-b 1
A data.table option
library(data.table)
setDT(df)
df[, Date := as.Date(as.POSIXct(Date, origin = "1970-01-01"))][, .(event = tolower(paste(col1, col2, sep = "-")), n = .N), by = names(df)][, c("col1", "col2") := NULL][]
gives
ID Date event n
1: 1 2020-12-01 loy-a 2
2: 2 2020-12-01 loy-b 2
3: 3 2020-12-01 loy-b 1
Data
> dput(df)
structure(list(ID = c(1L, 2L, 2L, 3L, 1L), Date = c(1606807860L,
1606807860L, 1606807860L, 1606807860L, 1606807860L), col1 = c("LOY",
"LOY", "LOY", "LOY", "LOY"), col2 = c("A", "B", "B", "B", "A"
)), class = "data.frame", row.names = c(NA, -5L))
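How do I change the Date type so that it looks like year-month-day?
@french_fries What is the expected date output?
@french_fries Updated, hope it helps!
@french_fries What is the expected date for this numeric column?
@french_fries You may need as.Date(as.POSIXct(df1$Date, origin='1970-01-01'))
@french_fries Please check my update. Thanks.
@french_fries I had already updated the post, if you check the code.
@akrun Thanks for the heads-up! I adjusted my solution accordingly.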
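For readers who want to see the date conversion from the comments in isolation, here is a minimal sketch using base R only. It treats the value from the question's Date column as seconds since 1970-01-01. The explicit tz = "UTC" is an assumption added here so the result does not depend on the local time zone; the answers above do not set it.

```r
# Unix timestamp (seconds since 1970-01-01) taken from the question's Date column
ts <- 1606807860

# Interpret the number as a date-time; tz = "UTC" is an assumption made for this sketch
dt <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

# Keep only the calendar date (year-month-day)
d <- as.Date(dt, tz = "UTC")

format(dt, "%Y-%m-%d %H:%M:%S")  # "2020-12-01 07:31:00"
format(d, "%Y-%m-%d")            # "2020-12-01"
```

If the timestamps were recorded in another time zone, passing that zone to both calls may shift the resulting date for values close to midnight.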