How can I create multiple columns at once in R (preferably with dplyr or data.table)?
I want to create several new variables based on the values in an existing column of a data frame. Here is a simplified version of my data:
df <- structure(list(City = structure(c(5L, 4L, 4L, 3L, 1L, 2L), .Label = c("Chico",
"Lawndale", "Los Angeles", "San Francisco", "San Jose"), class = "factor"),
yq = c("20071", "20111", "20074", "20124", "20111", "20124"
), cyq_total = c(15582L, 33668L, 40848L, 89028L, 1069L, 178L
)), row.names = c(NA, -6L), class = "data.frame")
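The variable cyq_total is the number of job vacancies in a city in a given quarter (yq). I want to create new variables named "Vac20071", "Vac20111", and so on, where each value is the cyq_total for the given city in the given year and quarter.
My example is simplified, but essentially I want the Vac20071 column to show the number of vacancies in each city for the first quarter of 2007, and likewise for the other year-quarters.
Desired output: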
  City          yq    cyq_total Vac20071 Vac20111 Vac20074 Vac20124
  <fct>         <chr>     <int>    <dbl>    <dbl>    <dbl>    <dbl>
1 San Jose      20071     15582    15582        0        0        0
2 San Francisco 20111     33668        0    33668    40848        0
3 San Francisco 20074     40848        0    33668    40848        0
4 Los Angeles   20124     89028        0        0        0    89028
5 Chico         20111      1069        0     1069        0        0
6 Lawndale      20124       178        0        0        0      178
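The code I have for this works, but it is not efficient. I am looking for a better way to produce the same result than copying and pasting the same line with small changes: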
library(dplyr)

df <- df %>% group_by(City) %>% mutate(Vac20071 = max(ifelse(yq == '20071', cyq_total, 0)))
df <- df %>% group_by(City) %>% mutate(Vac20111 = max(ifelse(yq == '20111', cyq_total, 0)))
df <- df %>% group_by(City) %>% mutate(Vac20074 = max(ifelse(yq == '20074', cyq_total, 0)))
df <- df %>% group_by(City) %>% mutate(Vac20124 = max(ifelse(yq == '20124', cyq_total, 0)))
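For reference, the repeated mutate() calls in the question can be collapsed into a loop over the distinct values of yq. This is only a minimal sketch, assuming dplyr is installed; the loop variable q is a name introduced here for illustration, and the data frame is rebuilt so the block runs on its own.

```r
library(dplyr)

# Same example data as in the question, rebuilt so the sketch is self-contained
df <- data.frame(
  City = factor(c("San Jose", "San Francisco", "San Francisco",
                  "Los Angeles", "Chico", "Lawndale")),
  yq = c("20071", "20111", "20074", "20124", "20111", "20124"),
  cyq_total = c(15582L, 33668L, 40848L, 89028L, 1069L, 178L),
  stringsAsFactors = FALSE
)

# One pass per distinct quarter; `q` is a hypothetical loop variable
for (q in unique(df$yq)) {
  df <- df %>%
    group_by(City) %>%
    # `!!paste0(...) :=` builds the new column name from the quarter code
    mutate(!!paste0("Vac", q) := max(ifelse(yq == q, cyq_total, 0))) %>%
    ungroup()
}

df
```

This keeps the logic of the original lines unchanged and only removes the copy/paste; it still does one grouped mutate per quarter, so it is no faster than the original.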
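An option using data.table with numeric matrix indexing:
library(data.table)

# One new column per distinct quarter, initialised to 0
cols <- paste0("Vac", unique(df$yq))
setDT(df)[, (cols) := 0L]

# Within each City, write cyq_total into the columns for that City's quarters
df[, (cols) := {
    m <- as.matrix(.SD)
    ix <- match(paste0("Vac", yq), cols)   # column positions for this City's quarters
    m[cbind(rep(1L:.N, each=length(ix)), rep(ix, .N))] <- cyq_total
    as.data.table(m)
}, City, .SDcols=cols]
df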
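To see what the matrix indexing in the data.table option is doing, here is a small base R sketch for a single City group (San Francisco, which has two rows). The names m, ix and n are chosen here for illustration; the column positions 2 and 3 correspond to Vac20111 and Vac20074 in cols.

```r
# Stand-in for the all-zero "Vac" columns of one City group with two rows
n <- 2L
m <- matrix(0L, nrow = n, ncol = 4,
            dimnames = list(NULL, c("Vac20071", "Vac20111", "Vac20074", "Vac20124")))

# Column positions of this group's quarters (Vac20111 and Vac20074)
ix <- c(2L, 3L)

# Two-column index matrix: each row is one (row, column) cell to overwrite.
# Every row of the group is paired with every matching column.
idx <- cbind(rep(1L:n, each = length(ix)), rep(ix, n))

# The group's two cyq_total values are recycled across the four cells
m[idx] <- c(33668L, 40848L)
m
```

Indexing a matrix with a two-column matrix addresses individual cells rather than whole rows or columns, which is why every row of the group ends up with the same per-quarter values.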
You can reshape the data to wide format and then join it back:
library(dplyr)
library(tidyr)

df %>%
  pivot_wider(names_from = yq, values_from = cyq_total, names_prefix = 'Vac') %>%
  left_join(df, by = 'City')

# A tibble: 6 x 7
#  City          Vac20071 Vac20111 Vac20074 Vac20124 yq    cyq_total
#  <fct>            <int>    <int>    <int>    <int> <chr>     <int>
#1 San Jose         15582       NA       NA       NA 20071     15582
#2 San Francisco       NA    33668    40848       NA 20111     33668
#3 San Francisco       NA    33668    40848       NA 20074     40848
#4 Los Angeles         NA       NA       NA    89028 20124     89028
#5 Chico               NA     1069       NA       NA 20111      1069
#6 Lawndale            NA       NA       NA      178 20124       178
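The pivot_wider() output has NA where the desired output in the question has 0, and the new columns come before yq and cyq_total. A possible variation, sketched here under the assumption of a tidyr version that accepts a named list for values_fill, fills the gaps with 0 and joins in the other direction so the original columns stay first. The names wide and result are introduced for illustration only.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, rebuilt so the sketch is self-contained
df <- data.frame(
  City = factor(c("San Jose", "San Francisco", "San Francisco",
                  "Los Angeles", "Chico", "Lawndale")),
  yq = c("20071", "20111", "20074", "20124", "20111", "20124"),
  cyq_total = c(15582L, 33668L, 40848L, 89028L, 1069L, 178L),
  stringsAsFactors = FALSE
)

# One row per City, one column per quarter; missing combinations become 0
wide <- df %>%
  pivot_wider(names_from = yq, values_from = cyq_total,
              names_prefix = "Vac",
              values_fill = list(cyq_total = 0L))

# Joining onto df keeps City, yq, cyq_total first and preserves the row order
result <- left_join(df, wide, by = "City")
result
```

Note that the new columns are integer here rather than double as in the desired output; wrap them in as.numeric() if the type matters.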