Count "Yes" in multiple columns of a data frame using dplyr
Suppose I have the following data. [As requested, I am adding the data]
col1 <- c("Team A", "Team A", "Team A", "Team B", "Team B", "Team B", "Team C", "Team C", "Team C", "Team D", "Team D", "Team D")
col2 <- c("High", "Medium", "Medium", "Low", "Low", "Low", "High", "Medium", "Low", "Medium", "Medium", "Medium")
col3 <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")
col4 <- c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes")
df <- data.frame(col1, col2, col3, col4)
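I would like to use dplyr functions to obtain the following result. Status_1 needs to be the number of "Yes" values in col3 for each team, and Status_2 needs to be the number of "Yes" values in col4 for each team.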
        High  Medium  Low  Status_1  Status_2
Team A     1       2    0         2         1
Team B     0       0    3         1         2
Team C     1       1    1         1         1
Team D     0       3    0         3         2
I can generate the regular summary with the statement below, but not the last two columns, Status_1 and Status_2. Can anyone help?
library(dplyr)
library(tidyr)
df %>%
  group_by(col1, col2) %>%   # column names are lowercase in df
  summarise(Count = n()) %>%
  spread(col1, Count, fill = 0)
I would use grepl and sum to simply count the matches:
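df %>%
  mutate_if(is.factor, as.character) %>% # the example data is stored as factors
  group_by(col1) %>%
  summarise(High = sum(grepl("High", col2)),
            Medium = sum(grepl("Medium", col2)),
            Low = sum(grepl("Low", col2)),
            Status_1 = sum(grepl("Yes", col3)),
            Status_2 = sum(grepl("Yes", col4)))
#> # A tibble: 4 x 6
#>   col1    High Medium   Low Status_1 Status_2
#>   <chr>  <int>  <int> <int>    <int>    <int>
#> 1 Team A     1      2     0        2        1
#> 2 Team B     0      0     3        1        2
#> 3 Team C     1      1     1        1        1
#> 4 Team D     0      3     0        3        2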
Created on 2019-11-30 by the reprex package (v0.3.0)
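For comparison, the same per-team counts can also be computed with stringr in place of grepl. The following is only a minimal sketch, assuming stringr is installed and the columns are character vectors; the object name counts is chosen here for illustration. Wrapping the pattern in fixed() makes it a literal string rather than a regular expression.

```r
library(dplyr)
library(stringr)

# Same sample data as in the question, kept as character vectors
df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes",
           "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No",
           "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  group_by(col1) %>%
  summarise(
    # str_detect() returns TRUE/FALSE per row; sum() counts the TRUEs
    High     = sum(str_detect(col2, fixed("High"))),
    Medium   = sum(str_detect(col2, fixed("Medium"))),
    Low      = sum(str_detect(col2, fixed("Low"))),
    # str_count() returns the number of matches per row (0 or 1 here)
    Status_1 = sum(str_count(col3, fixed("Yes"))),
    Status_2 = sum(str_count(col4, fixed("Yes")))
  )

print(counts)
```

Both grepl and str_detect match substrings, so a value such as "Yes, partly" would also be counted; if only exact values should count, comparing with == "Yes" is the stricter choice.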
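Besides grepl, you could also use str_count or str_detect from stringr. In this case they all do the same thing. The important part is to use sum, so that the counts are aggregated into a single value.

First, group the data by col1 to count the number of Yes values in col3 and col4. Then group again by all columns and use n() to count the observations in each group. Finally, use tidyr::pivot_wider to reshape the data from long to wide.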
df %>%
  group_by(col1) %>%
  mutate_at(vars(col3:col4), ~ sum(. == "Yes")) %>%
  rename(status_1 = col3, status_2 = col4) %>%
  group_by_all %>%
  summarise(n = n()) %>%
  tidyr::pivot_wider(names_from = col2, values_from = n, values_fill = list(n = 0))
# # A tibble: 4 x 6
# col1 status_1 status_2 High Medium Low
# <fct> <int> <int> <int> <int> <int>
# 1 Team A 2 1 1 2 0
# 2 Team B 1 2 0 0 3
# 3 Team C 1 1 1 1 1
# 4 Team D 3 2 0 3 0
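In dplyr 1.0.0 and later, mutate_at and group_by_all are superseded by across(), so the same three steps can be written with current verbs. This is a sketch rather than a drop-in replacement: it assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later, uses count() in place of the group_by_all/summarise pair, and the name result is chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, kept as character vectors
df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes",
           "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No",
           "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(col1) %>%
  # Step 1: replace col3/col4 with the per-team number of "Yes" values
  mutate(across(c(col3, col4), ~ sum(.x == "Yes"))) %>%
  ungroup() %>%
  rename(status_1 = col3, status_2 = col4) %>%
  # Step 2: one row per team/status/level combination, frequency in n
  count(col1, status_1, status_2, col2) %>%
  # Step 3: spread the levels of col2 into columns; absent levels become 0
  pivot_wider(names_from = col2, values_from = n,
              values_fill = list(n = 0L))

print(result)
```

Because the status columns are constant within each team after step 1, including them in count() keeps them in the output without creating extra rows.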
It would be easier to help if you provided some sample data when asking, for example using dput().
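As an illustration of the dput() suggestion, the sketch below prints a small data frame as R code that can be pasted into a question, then checks that evaluating that code rebuilds the same object. The two-row data frame sample and the names txt and rebuilt are made up here for the example.

```r
# A small, made-up sample in the same shape as the question's data
sample <- data.frame(
  col1 = c("Team A", "Team B"),
  col3 = c("Yes", "No"),
  stringsAsFactors = FALSE
)

# dput() writes an R expression that recreates the object;
# this printed text is what would be pasted into the question
dput(sample)

# Round trip: capture the printed expression and evaluate it again
txt <- paste(capture.output(dput(sample)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# TRUE when the rebuilt data frame matches the original
print(identical(sample, rebuilt))
```

For larger data, dput(head(df, 10)) keeps the pasted output short while still preserving column types.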