R: sum multiple variables by group type in one df, without subsetting
I am looking for a faster way to summarize by group type, for many different groups in one df, without subsetting. Below is a sample data frame and the code I currently use. It feels verbose, and I suspect there is a quicker way to do this. In this example, my code sums the health revenue grouped by name and then merges it back into the master data. I want to sum revenue for both the health and vision variables, grouped by name. The key point is that I only want the health and vision revenue from rows where that variable is 1. Thanks for your help.
#df
name = c("jerry","jerry","jerry","dave","dave","dave","mary","mary","mary")
health = c(1,0,1,1,0,1,0,1,1)
vision = c(0,1,0,0,1,0,1,0,0)
rev =c(100,200,500,1000,800,300,400,600,300)
df = data.frame(name,health,vision,rev)
#Subset health
health = subset(df, health == 1)
#Sum by group type
library(dplyr)
health <- health %>%
  group_by(name) %>%
  mutate(health_rev = sum(rev, na.rm = TRUE))
#Select variables
health <- health[c("name","health_rev")]
#Remove duplicates
health <- health[!duplicated(health$name), ]
#Merge back to master
master <- merge(x = df, y = health, by = "name", all.x = TRUE)
Something like this:
df %>%
  group_by(name) %>%
  mutate(health_rev = sum(rev[as.logical(health)]),
         vision_rev = sum(rev[as.logical(vision)])) %>%
  ungroup()
Result:
# A tibble: 9 × 6
name health_rev vision_rev health vision rev
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 dave 1300 800 1 0 1000
2 dave 1300 800 0 1 800
3 dave 1300 800 1 0 300
4 jerry 600 200 1 0 100
5 jerry 600 200 0 1 200
6 jerry 600 200 1 0 500
7 mary 900 400 0 1 400
8 mary 900 400 1 0 600
9 mary 900 400 1 0 300
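If there are many indicator columns rather than just two, the same grouped sum can probably be written once with `across()` instead of one line per column. This is only a sketch, assuming dplyr 1.0.0 or later and assuming every indicator column is coded 0/1; the `_rev` suffix simply follows the naming used above.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  name   = c("jerry", "jerry", "jerry", "dave", "dave", "dave", "mary", "mary", "mary"),
  health = c(1, 0, 1, 1, 0, 1, 0, 1, 1),
  vision = c(0, 1, 0, 0, 1, 0, 1, 0, 0),
  rev    = c(100, 200, 500, 1000, 800, 300, 400, 600, 300)
)

result <- df %>%
  group_by(name) %>%
  # For each indicator column, sum rev over the rows of the group where it equals 1.
  # .names keeps the original columns and adds health_rev, vision_rev, ...
  mutate(across(c(health, vision),
                ~ sum(rev[.x == 1], na.rm = TRUE),
                .names = "{.col}_rev")) %>%
  ungroup()
```

Adding another indicator column should then only require listing it inside `c(health, vision)`. A group with no 1s in a column gets 0 for that column, since `sum()` of an empty vector is 0.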
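Sometimes reshaping the data to long makes it easier to work with: library(tidyverse); df %>% gather(var, val, health, vision) %>% filter(as.logical(val)) %>% group_by(name, var) %>% summarise(rev = sum(rev)) %>% spread(var, rev)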
Much more elegant than my approach. Thank you, Chris.
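For reference, the long-reshaping idea from the comment above could also be sketched with `pivot_longer()` and `pivot_wider()`, which tidyr documents as the replacements for the superseded `gather()` and `spread()`. This assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later; the names `var`, `val`, `wide`, and `merged` are made up for illustration, and the final join back to the master data is an addition that mirrors the merge step in the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  name   = c("jerry", "jerry", "jerry", "dave", "dave", "dave", "mary", "mary", "mary"),
  health = c(1, 0, 1, 1, 0, 1, 0, 1, 1),
  vision = c(0, 1, 0, 0, 1, 0, 1, 0, 0),
  rev    = c(100, 200, 500, 1000, 800, 300, 400, 600, 300)
)

wide <- df %>%
  # One row per (original row, indicator column)
  pivot_longer(c(health, vision), names_to = "var", values_to = "val") %>%
  # Keep only the rows where the indicator is switched on
  filter(val == 1) %>%
  group_by(name, var) %>%
  summarise(rev = sum(rev), .groups = "drop") %>%
  # Rename so the wide columns do not clash with the original indicators
  mutate(var = paste0(var, "_rev")) %>%
  pivot_wider(names_from = var, values_from = rev)

# Merge the per-name totals back onto the master data
merged <- left_join(df, wide, by = "name")
```

Unlike the grouped `mutate()` version, a name with no 1s in a column would end up with `NA` rather than 0 in `merged`, so that case may need handling separately.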