R: count all XXXX entries dated after the ABC inquiry, split by purpose, using group_by() while keeping the detailed data
For each ID, I want to list all the purposes inquired after ABC (where XXXX stands for other companies). The sample table looks like this:
ID Company INQUIRY-DATE Purpose
A15217177635833 XXXX 25-08-2018 X
A15217177635833 ABC 28-06-2018 Y
A15217177635833 XXXX 05-05-2018 Z
A15217177635833 XXXX 28-05-2019 A
F15039820795577 ABC 22-08-2017 X
F15039820795577 XXXX 15-06-2017 Y
F15039820795577 XXXX 15-08-2018 Z
F15039820795577 XXXX 25-08-2018 Z
F15039820795577 XXXX 15-08-2018 A
Expected output:
ID Count_Z Count_A
A15217177635833 1 1
F15039820795577 2 1
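This means that all XXXX entries dated after the ABC inquiry must be split by their purpose.
I tried group_by with mutate(count_z), but it did not work.
I don't know how to get the detailed data after group_by, because as far as I know group_by is used together with summarise.

We first convert INQUIRY-DATE to a Date object and arrange the data by ID and INQUIRY-DATE. For each group we keep only the rows after the first occurrence of "ABC", count each Purpose, and then spread the data into wide format.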
library(dplyr)
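df %>%
  # Parse the day-month-year strings into Date objects
  mutate(`INQUIRY-DATE` = as.Date(`INQUIRY-DATE`, "%d-%m-%Y")) %>%
  # Sort chronologically within each ID
  arrange(ID, `INQUIRY-DATE`) %>%
  group_by(ID) %>%
  # Keep non-ABC rows that come after the first ABC row of the group
  filter(Company != "ABC" & row_number() > match("ABC", Company)) %>%
  count(ID, Purpose) %>%
  # One column per Purpose; absent combinations become 0
  tidyr::pivot_wider(names_from = Purpose, values_from = n,
                     values_fill = list(n = 0))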
# ID A X Z
# <fct> <int> <int> <int>
#1 A15217177635833 1 1 0
#2 F15039820795577 1 0 2
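
The "rows after the first ABC" step can also be checked in isolation with base R. The following is only a sketch under stated assumptions: the data frame `inq` and the column name `date` are made up for this illustration (they hold the same rows as the sample table), and a running count built with `cumsum()` replaces the `match()` comparison used above.

```r
# Hypothetical data frame with the same rows as the sample table.
inq <- data.frame(
  ID = rep(c("A15217177635833", "F15039820795577"), times = c(4, 5)),
  Company = c("XXXX", "ABC", "XXXX", "XXXX",
              "ABC", "XXXX", "XXXX", "XXXX", "XXXX"),
  date = as.Date(c("25-08-2018", "28-06-2018", "05-05-2018", "28-05-2019",
                   "22-08-2017", "15-06-2017", "15-08-2018", "25-08-2018",
                   "15-08-2018"), format = "%d-%m-%Y"),
  Purpose = c("X", "Y", "Z", "A", "X", "Y", "Z", "Z", "A"),
  stringsAsFactors = FALSE
)

# Sort chronologically within each ID.
inq <- inq[order(inq$ID, inq$date), ]

# Running number of ABC rows seen so far within each ID;
# it is at least 1 from the first ABC row onward.
seen_abc <- ave(as.integer(inq$Company == "ABC"), inq$ID, FUN = cumsum)

# Keep non-ABC rows that come after the first ABC row of their ID.
after_abc <- inq[seen_abc > 0 & inq$Company != "ABC", ]

# Cross-tabulate ID against Purpose to get the wide layout.
counts <- table(after_abc$ID, after_abc$Purpose)
print(counts)
```

For the sample data this gives A = 1, X = 1, Z = 0 for the first ID and A = 1, X = 0, Z = 2 for the second, the same counts as the dplyr output above. A group without any ABC row simply contributes no rows, because its running count stays at 0.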
Data
df <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), .Label = c("A15217177635833", "F15039820795577"), class = "factor"),
Company = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("ABC",
"XXXX"), class = "factor"), `INQUIRY-DATE` = structure(c(5L,
7L, 1L, 6L, 4L, 2L, 3L, 5L, 3L), .Label = c("05-05-2018",
"15-06-2017", "15-08-2018", "22-08-2017", "25-08-2018", "28-05-2019",
"28-06-2018"), class = "factor"), Purpose = structure(c(2L,
3L, 4L, 1L, 2L, 3L, 4L, 4L, 1L), .Label = c("A", "X", "Y",
"Z"), class = "factor")), class = "data.frame", row.names = c(NA, -9L))
Here is another approach. It assumes that the rows are already in chronological order.
library(tidyr)
xy <- read.table(text = " ID Company INQUIRY-DATE Purpose
A15217177635833 XXXX 25-08-2018 X
A15217177635833 ABC 28-06-2018 Y
A15217177635833 XXXX 05-05-2018 Z
A15217177635833 XXXX 28-05-2019 A
F15039820795577 ABC 22-08-2017 X
F15039820795577 XXXX 15-06-2017 Y
F15039820795577 XXXX 15-08-2018 Z
F15039820795577 XXXX 25-08-2018 Z
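F15039820795577 XXXX 15-08-2018 A", header = TRUE, stringsAsFactors = TRUE)
# stringsAsFactors = TRUE keeps Purpose a factor (the default changed in R 4.0.0),
# so table() below reports zero counts for purposes absent from a group.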
xys <- split(xy, f = xy$ID)
xya <- sapply(xys, FUN = function(x) {
# This assumes there can be more than one ABC, so start from the first one.
start <- min(which(x$Company == "ABC"))
post.abc <- x[(start + 1):nrow(x), ]
data.frame(ID = unique(x$ID), counts = table(post.abc$Purpose))
}, simplify = FALSE)
out <- do.call(rbind, xya)
rownames(out) <- NULL
spread(out, key = counts.Var1, value = counts.Freq)
ID A X Y Z
1 A15217177635833 1 0 0 1
2 F15039820795577 1 0 1 2
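
One caveat with slicing by `(start + 1):nrow(x)`: if ABC happens to be the last row of a group, the sequence counts downward (for example `3:2`) and picks up unwanted rows, and a group with no ABC at all makes `min(which(...))` return `Inf` with a warning. A guarded variant might look like the sketch below; `rows_after_first` and the `demo_*` data frames are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: rows strictly after the first `target` company.
rows_after_first <- function(x, target = "ABC") {
  hits <- which(x$Company == target)
  # No target in this group, or the target is the last row:
  # return an empty data frame with the same columns.
  if (length(hits) == 0 || hits[1] == nrow(x)) {
    return(x[0, , drop = FALSE])
  }
  x[(hits[1] + 1):nrow(x), , drop = FALSE]
}

# Small made-up groups covering the three cases.
demo_last <- data.frame(Company = c("XXXX", "ABC"), Purpose = c("X", "Y"))
demo_none <- data.frame(Company = c("XXXX", "XXXX"), Purpose = c("X", "Y"))
demo_mid  <- data.frame(Company = c("XXXX", "ABC", "XXXX", "XXXX"),
                        Purpose = c("X", "Y", "Z", "A"))

nrow(rows_after_first(demo_last))    # ABC is last: no rows
nrow(rows_after_first(demo_none))    # no ABC: no rows
rows_after_first(demo_mid)$Purpose   # rows following the first ABC
```

Inside the `sapply()` call, the two lines that compute `start` and `post.abc` could then be replaced by a single call to this helper.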