Counting "Yes" across multiple columns of a data frame using dplyr


Suppose I have the following data. [As requested, I am adding the data.]

col1 <- c("Team A", "Team A", "Team A", "Team B", "Team B", "Team B", "Team C", "Team C", "Team C", "Team D", "Team D", "Team D")
col2 <- c("High",   "Medium", "Medium", "Low", "Low", "Low", "High", "Medium", "Low", "Medium", "Medium", "Medium")
col3 <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")
col4 <- c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes")
df <- data.frame(col1, col2, col3, col4)
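I would like to use dplyr functions to get the following result. Status_1 should be the number of "Yes" values in col3 for each team, and Status_2 should be the number of "Yes" values in col4 for each team.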

       High Medium  Low Status_1    Status_2
Team A    1      2    0        2           1
Team B    0      0    3        1           2
Team C    1      1    1        1           1
Team D    0      3    0        3           2
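With the statement below I can produce the normal summary (the High/Medium/Low counts), but not the last two columns, "Status_1" and "Status_2". Can anyone help?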

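library(dplyr)
library(tidyr)

# Column names are lower case in df; spread on col2 so each level becomes a column
df %>%
  group_by(col1, col2) %>%
  summarise(Count = n()) %>%
  spread(col2, Count, fill = 0)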
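Building on this attempt, one way to get all five columns is to compute the level counts and the "Yes" counts in two separate summaries and join them on col1. This is a sketch that assumes dplyr and tidyr are installed and that the columns are character (stringsAsFactors = FALSE); the intermediate names levels_wide, yes_counts and result are illustrative only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# High/Medium/Low counts per team, one column per level of col2 (missing combinations become 0)
levels_wide <- df %>%
  count(col1, col2) %>%
  spread(col2, n, fill = 0)

# Number of "Yes" values per team in col3 and col4
yes_counts <- df %>%
  group_by(col1) %>%
  summarise(Status_1 = sum(col3 == "Yes"),
            Status_2 = sum(col4 == "Yes"))

# Join the two summaries and put the columns in the requested order
result <- left_join(levels_wide, yes_counts, by = "col1") %>%
  select(col1, High, Medium, Low, Status_1, Status_2)
result
```

Splitting the work this way keeps each summary simple: count() plus spread() handles the categorical column, while summarise() with sum() handles the "Yes" flags.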

I would simply count the matches with grepl and sum:

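library(dplyr)

df %>%
  mutate_if(is.factor, as.character) %>% # your example data was read in as factors
  group_by(col1) %>%
  summarise(High = sum(grepl("High", col2)),
            Medium = sum(grepl("Medium", col2)),
            Low = sum(grepl("Low", col2)),
            Status_1 = sum(grepl("Yes", col3)),
            Status_2 = sum(grepl("Yes", col4)))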
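#> # A tibble: 4 x 6
#>   col1    High Medium   Low Status_1 Status_2
#>   <chr>  <int>  <int> <int>    <int>    <int>
#> 1 Team A     1      2     0        2        1
#> 2 Team B     0      0     3        1        2
#> 3 Team C     1      1     1        1        1
#> 4 Team D     0      3     0        3        2
Created on 2019-11-30 by the reprex package (v0.3.0)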
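Because grepl() matches substrings, a pattern such as "Low" would also count a value like "Lowest" if one ever appeared. A variant of the same summarise() that compares whole values with == avoids that. This is a sketch on the question's data (assumed here to be character columns); summary_tbl is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# == gives TRUE/FALSE per row; sum() counts the TRUEs within each team
summary_tbl <- df %>%
  group_by(col1) %>%
  summarise(High     = sum(col2 == "High"),
            Medium   = sum(col2 == "Medium"),
            Low      = sum(col2 == "Low"),
            Status_1 = sum(col3 == "Yes"),
            Status_2 = sum(col4 == "Yes"))
summary_tbl
```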


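Instead of grepl, you could also use str_count or str_detect from stringr. In this case they all do the same thing. The important part is to wrap them in sum so that the per-row matches are aggregated into a single count per team.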
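A minimal sketch of the stringr versions, assuming stringr and dplyr are installed; the column names Status_1_detect and Status_1_count are illustrative only. str_detect() returns TRUE/FALSE per row, while str_count() returns the number of matches per row (0 or 1 here), so both give the same totals once summed.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

stringr_tbl <- df %>%
  group_by(col1) %>%
  summarise(Status_1_detect = sum(str_detect(col3, "Yes")),  # logical per row, summed
            Status_1_count  = sum(str_count(col3, "Yes")),   # match count per row, summed
            Status_2        = sum(str_detect(col4, "Yes")))
stringr_tbl
```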

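First, group the data by col1 to count the number of "Yes" values in col3 and col4. Then group by all columns again and use n() to count the number of observations in each group. Finally, use tidyr::pivot_wider to reshape the data from long to wide.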

df %>%
  group_by(col1) %>%
  mutate_at(vars(col3:col4), ~ sum(. == "Yes")) %>%
  rename(status_1 = col3, status_2 = col4) %>% 
  group_by_all %>%
  summarise(n = n()) %>%
  tidyr::pivot_wider(names_from = col2, values_from = n, values_fill = list(n = 0))

# # A tibble: 4 x 6
#   col1   status_1 status_2  High Medium   Low
#   <fct>     <int>    <int> <int>  <int> <int>
# 1 Team A        2        1     1      2     0
# 2 Team B        1        2     0      0     3
# 3 Team C        1        1     1      1     1
# 4 Team D        3        2     0      3     0
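mutate_at() and group_by_all() are superseded as of dplyr 1.0.0. Assuming dplyr 1.0.0 or later (for across()) and tidyr 1.1.0 or later (where values_fill accepts a single value), the same steps could be sketched as below; count() here stands in for the group_by_all() plus summarise(n = n()) pair, and wide is an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  col1 = rep(c("Team A", "Team B", "Team C", "Team D"), each = 3),
  col2 = c("High", "Medium", "Medium", "Low", "Low", "Low",
           "High", "Medium", "Low", "Medium", "Medium", "Medium"),
  col3 = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  col4 = c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(col1) %>%
  # Replace col3/col4 with the per-team number of "Yes" values
  mutate(across(c(col3, col4), ~ sum(.x == "Yes"))) %>%
  ungroup() %>%
  rename(status_1 = col3, status_2 = col4) %>%
  # One row per combination of team, level and status counts, with its frequency in n
  count(col1, col2, status_1, status_2) %>%
  # One column per level of col2; combinations that never occur become 0
  pivot_wider(names_from = col2, values_from = n, values_fill = 0)
wide
```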

It would be easier to help if you provided some sample data with your question, for example by using dput().
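dput() prints R code that recreates an object, and that printed code is what you would paste into a question. The round trip below uses a small hypothetical data frame, example_df, to show that evaluating the printed code rebuilds an identical object.

```r
example_df <- data.frame(col1 = c("Team A", "Team B"),
                         col3 = c("Yes", "No"),
                         stringsAsFactors = FALSE)

# Print the code that recreates example_df; this text is what goes into the question
dput(example_df)

# Capture that printed code and evaluate it to confirm it rebuilds the same data frame
code <- capture.output(dput(example_df))
rebuilt <- eval(parse(text = code))
identical(rebuilt, example_df)
```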