How to sum a column grouped by two columns in R
I have a data frame (df) with 5 columns: Area.Name, Age, Total, Rural and Urban. I need the sum of Total grouped by Area.Name and then by Age, with Age split into two categories: 0-2 and 3-4.
df <-
structure(list(Area.Name = structure(c(6L, 6L, 6L, 6L, 6L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("District - Central (06)", "District - East (04)",
"District - New Delhi (05)", "District - North (02)", "District - North East (03)",
"District - North West (01)", "District - South (09)", "District - South West (08)",
"District - West (07)", "NCT OF DELHI (07)"), class = "factor"),
Age = c(0L, 1L, 2L, 3L, 4L, 0L, 1L, 2L, 3L, 4L, 5L), Total = c(56131L,
58644L, 63835L, 63859L, 64945L, 24556L, 27076L, 27234L, 27604L,
27725L, 30780L), Rural = c(3589L, 3757L, 4200L, 4102L, 4223L,
52L, 56L, 61L, 47L, 67L, 53L), Urban = c(52542L, 54887L,
59635L, 59757L, 60722L, 24504L, 27020L, 27173L, 27557L, 27658L,
30727L)), .Names = c("Area.Name", "Age", "Total", "Rural",
"Urban"), row.names = c(102L, 103L, 104L, 105L, 106L, 405L, 406L,
407L, 408L, 409L, 410L), class = "data.frame")
I tried using the dplyr package, but I am not very familiar with it, so I am a bit stuck here:
df %>% group_by(Area.Name) %>% summarize(Age = Age[0],Tot = sum(Total))
The problem is that I cannot specify a range for Age here.

Here is a base R method using cut and aggregate:
df$ageCat <- cut(df$Age, breaks=c(0, 2, max(df$Age)), include.lowest = T)
aggregate(Total~Area.Name+ageCat, data=df, sum)
                   Area.Name ageCat  Total
1       District - East (04)  [0,2]  78866
2 District - North West (01)  [0,2] 178610
3       District - East (04)  (2,5]  86109
4 District - North West (01)  (2,5] 128804
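The base R output above places Age 5 in the (2,5] bin, while the question asks only for 0-2 and 3-4. Below is a minimal sketch of one way to restrict the bins, assuming that ages above 4 should simply be dropped. The compact data.frame is a reconstruction of the question's df with only the columns used here, and res is a name introduced for illustration.

```r
# Compact reconstruction of the question's data (only the columns used here).
df <- data.frame(
  Area.Name = factor(rep(c("District - North West (01)", "District - East (04)"),
                         times = c(5, 6))),
  Age   = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

# Breaks stop at 4, so Age 5 falls outside every interval and becomes NA.
df$ageCat <- cut(df$Age, breaks = c(0, 2, 4),
                 labels = c("0-2", "3-4"), include.lowest = TRUE)

# The formula interface of aggregate() omits rows containing NA by default
# (na.action = na.omit), so the Age 5 row does not contribute to any sum.
res <- aggregate(Total ~ Area.Name + ageCat, data = df, FUN = sum)
print(res)
```

Relying on NA rows being dropped is convenient but implicit; filtering with df[df$Age <= 4, ] before aggregating would make the same intent explicit.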
Here is an approach where I cut() Age inline inside the group_by call:
library(dplyr)
df %>%
  group_by(Area.Name, Age = cut(Age, breaks = c(0, 2, 4, +Inf),
                                labels = c("0-2", "3-4", "4+"), include.lowest = TRUE)) %>%
  summarise(Total = sum(Total))
#                    Area.Name    Age  Total
#                       <fctr> <fctr>  <int>
# 1       District - East (04)    0-2  78866
# 2       District - East (04)    3-4  55329
# 3       District - East (04)     4+  30780
# 4 District - North West (01)    0-2 178610
# 5 District - North West (01)    3-4 128804
To keep only the groups you need, you can add %>% filter(Age %in% c("0-2", "3-4")) to the end of the pipeline.
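Put together, the grouped sum with the filter step might look like the sketch below. It assumes dplyr 1.0.0 or later for the .groups argument. The compact data.frame is a reconstruction of the question's df with only the columns used, and res is a name introduced for illustration.

```r
library(dplyr)

# Compact reconstruction of the question's data (only the columns used here).
df <- data.frame(
  Area.Name = factor(rep(c("District - North West (01)", "District - East (04)"),
                         times = c(5, 6))),
  Age   = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

res <- df %>%
  # Bin Age inside group_by(): [0,2] -> "0-2", (2,4] -> "3-4", (4,Inf] -> "4+".
  group_by(Area.Name,
           Age = cut(Age, breaks = c(0, 2, 4, Inf),
                     labels = c("0-2", "3-4", "4+"), include.lowest = TRUE)) %>%
  # .groups = "drop" returns an ungrouped result (assumes dplyr >= 1.0.0).
  summarise(Total = sum(Total), .groups = "drop") %>%
  # Keep only the two age bands the question asks about.
  filter(Age %in% c("0-2", "3-4"))

print(res)
```

The unused "4+" level stays in the Age factor after filtering; droplevels(res) would remove it if that matters downstream.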
To build the groups, you can also use .bincode, like df$group
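The comment above is cut off, so the commenter's exact code is unknown. As a rough sketch of how .bincode could be used for the grouping, assuming breaks of 0, 2 and 4 and that ages above 4 should be left out (the breaks, the compact data.frame and the name res are all assumptions made for illustration):

```r
# Compact reconstruction of the question's data (only the columns used here).
df <- data.frame(
  Area.Name = factor(rep(c("District - North West (01)", "District - East (04)"),
                         times = c(5, 6))),
  Age   = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

# .bincode() returns integer bin codes instead of a factor:
# 1 for [0,2], 2 for (2,4], and NA for values outside the breaks (Age 5).
df$group <- .bincode(as.numeric(df$Age), breaks = c(0, 2, 4),
                     right = TRUE, include.lowest = TRUE)

# Rows with NA in group are omitted by aggregate()'s formula interface.
res <- aggregate(Total ~ Area.Name + group, data = df, FUN = sum)
print(res)
```

Compared with cut(), .bincode skips building factor labels, so the result carries plain integer codes that need to be mapped back to "0-2" and "3-4" if readable labels are wanted.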