How to aggregate multiple columns of a data frame using dplyr
I have a data frame with the columns id, category, cost and colour. This is the data frame df:
library(dplyr)
id <- c(1, 1, 1, 2, 2, 3, 1)
category <- (c("V", "V", "V", "W", "W", "W", "W"))
cost <- c(10, 15, 5, 2, 14, 20, 3)
colour <- c("red", "green", "red", "green", "blue","blue","blue")
df <- data.frame(id, category, cost, colour)
df$category <- as.character(df$category)
df
id category cost colour
1 V 10 red
1 V 15 green
1 V 5 red
2 W 2 green
2 W 14 blue
3 W 20 blue
1 W 3 blue
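I would like a new data frame df_new that contains, for each id: the frequency (freq), the number of entries whose category is W (category_W), the number of entries whose category is V (category_V), the total cost of the entries with category W (cost_W), the total cost of the entries with category V (cost_V), and the number of entries of each colour (col_red, col_green, col_blue).
The output should look like this: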
id freq category_W category_V cost_W cost_V col_red col_green col_blue
 1    4          1          3      3     30       2         1        1
 2    2          2                16                        1        1
 3    1          1                20                                 1
I tried the following, but it does not work:
df_new <- group_by(df, id) %>%
  summarize(
    freq = count(id),
    category_W = count(category == "W", na.rm=TRUE),
    category_V = count(category == "V", na.rm=TRUE),
    col_red = count(colour == "red", na.rm=TRUE),
    col_green = count(colour == "green", na.rm=TRUE),
    col_blue = count(colour == "blue", na.rm=TRUE)
  )
I don't know how to add the conditions for cost_W and cost_V.
I get the error: length(rows) == 1 is not TRUE
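Many thanks in advance.

Well, you were almost there.
You can take advantage of the fact that logical values are converted to 0 and 1 in arithmetic operations. So when you sum them, you get the count of the specific value that the logical expression tests for.
You can use the same property to calculate the costs. Just multiply the logical expression by the cost variable: if the category matches the one you are interested in, the cost is added to the sum; otherwise it is reduced to 0.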
df_new <-
  group_by(df, id) %>% summarize(
    freq = n(),
    category_W = sum(category == "W", na.rm = TRUE),
    category_V = sum(category == "V", na.rm = TRUE),
    cost_W = sum((category == "W") * cost, na.rm = TRUE),
    cost_V = sum((category == "V") * cost, na.rm = TRUE),
    col_red = sum(colour == "red", na.rm = TRUE),
    col_green = sum(colour == "green", na.rm = TRUE),
    col_blue = sum(colour == "blue", na.rm = TRUE)
  )
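To see the coercion trick in isolation, here is a minimal base R sketch on the category and cost vectors from the question, without any grouping. The names is_w, n_w, cost_w and cost_w_subset are only illustrative and do not appear in the original post; the subsetting variant is an alternative I am adding for comparison, not part of the answer above.

```r
# Same vectors as in the question, used here without grouping by id.
category <- c("V", "V", "V", "W", "W", "W", "W")
cost <- c(10, 15, 5, 2, 14, 20, 3)

# A comparison yields a logical vector.
is_w <- category == "W"

# In sum(), TRUE is treated as 1 and FALSE as 0, so this counts the W entries.
n_w <- sum(is_w, na.rm = TRUE)

# Multiplying by the logical vector zeroes out the cost of non-W rows.
cost_w <- sum(is_w * cost, na.rm = TRUE)

# Alternative (assumption: no NA in category): subset first, then sum.
cost_w_subset <- sum(cost[is_w])

print(c(n_w = n_w, cost_w = cost_w, cost_w_subset = cost_w_subset))
```

Inside summarize() the same expressions are evaluated once per group, which is why the answer's code yields per-id counts and totals.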
What is the frequency in your data frame? Do you mean cost?
Sorry, you are of course right, I changed it to cost.
I think you just need to add this to your code: total_cost_W = sum(cost_W), if I have read your post correctly...
齐林斯基, thank you very much for your help and your explanation!