Efficiently reshaping a data frame in R with dplyr
I have a data frame like this:
library(tibble)
id = letters[1:5]
items = c('A,B,C,D,E',
          'C,D,E,A,B',
          'E,D,C',
          'B,A',
          'A')
dat = tibble(id = id, items = items)
> dat
# A tibble: 5 x 2
id items
<chr> <chr>
1 a A,B,C,D,E
2 b C,D,E,A,B
3 c E,D,C
4 d B,A
5 e A
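Here is my code, but I think it is redundant.
There is also a bug in my code: when I replace map(as_tibble) with map(as.data.frame), all the values become NA.
Is there a more efficient way?
Any help would be appreciated.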
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)
# get id
id = dat[,1]
# reshape items
items <- dat[,2]
# helper: use the first row as column names, append a row of 1s, then drop the first row
make.title <- function(data){
  row.1 <- unlist(slice(data, 1))
  colnames(data) <- row.1
  data <- rbind(data, rep(1, ncol(data)))
  data <- slice(data, -1)
  data
}
# final.dat.2 is what I wanted
final.dat.2 <-
  split(items, seq(nrow(items))) %>%
  map(unlist) %>%
  map(~str_split(., pattern = ',')) %>%
  map(unlist) %>%
  map(rbind) %>%
  map(as_tibble) %>%
  map(make.title) %>%
  bind_rows() %>%
  transmute(across(.cols = everything(), ~replace_na(., 0))) %>%
  bind_cols(id)
# the bug occurs here: every value comes out as NA
final.dat.3 <-
  split(items, seq(nrow(items))) %>%
  map(unlist) %>%
  map(~str_split(., pattern = ',')) %>%
  map(unlist) %>%
  map(rbind) %>%
  map(as.data.frame) %>%  # the only change: as.data.frame instead of as_tibble
  map(make.title) %>%
  bind_rows() %>%
  transmute(across(.cols = everything(), ~replace_na(., 0))) %>%
  bind_cols(id)
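A likely explanation for the NA values, offered as an assumption rather than a confirmed diagnosis: before R 4.0, as.data.frame() turned character columns into factors by default (stringsAsFactors = TRUE). make.title() then rbind()s a numeric 1 onto those factor columns, and because 1 is not one of the levels, it becomes NA. The minimal base-R sketch below emulates that old default explicitly; the object names are illustrative only.

```r
# Minimal reproduction of the suspected cause (assumption: the pre-R-4.0
# default stringsAsFactors = TRUE, emulated explicitly here).
m <- rbind(c("A", "B"))  # 1 x 2 character matrix, like map(rbind) produces

# Factor columns: 1 is not a known level, so the appended row becomes NA
with_factors <- as.data.frame(m, stringsAsFactors = TRUE)
colnames(with_factors) <- c("A", "B")
bad <- suppressWarnings(rbind(with_factors, rep(1, 2)))  # warns "invalid factor level"
print(bad)

# Character columns: the 1 is simply coerced to "1"
with_chars <- as.data.frame(m, stringsAsFactors = FALSE)
colnames(with_chars) <- c("A", "B")
good <- rbind(with_chars, rep(1, 2))
print(good)
```

If this is the cause, passing stringsAsFactors = FALSE to as.data.frame() (the default since R 4.0) should avoid the NAs.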
Try this. You can use separate_rows() and pivot_wider() from the tidyverse (tidyr) to get the expected output:
library(dplyr)
library(tidyr)
#Code
newdf <- dat %>%
  separate_rows(items, sep = ",") %>%
  mutate(Val = 1) %>%
  pivot_wider(names_from = items, values_from = Val, values_fill = 0)
Output:
# A tibble: 5 x 6
id A B C D E
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 1 1 1 1 1
2 b 1 1 1 1 1
3 c 0 0 1 1 1
4 d 1 1 0 0 0
5 e 1 0 0 0 0
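For comparison, the same split-then-widen idea can be sketched in base R without the tidyverse: strsplit() plays the role of separate_rows(), and xtabs() cross-tabulates id against item, filling absent pairs with 0 much like values_fill = 0. This sketch assumes each item appears at most once per row; otherwise the cells hold counts rather than 1s.

```r
# Base-R sketch of the split-then-widen approach
dat <- data.frame(id = letters[1:5],
                  items = c("A,B,C,D,E", "C,D,E,A,B", "E,D,C", "B,A", "A"),
                  stringsAsFactors = FALSE)

# 1. Long format: one row per (id, item), like separate_rows()
parts <- strsplit(dat$items, ",", fixed = TRUE)
long <- data.frame(id = rep(dat$id, lengths(parts)), item = unlist(parts),
                   stringsAsFactors = FALSE)

# 2. Wide format: cross-tabulate id x item; absent combinations count as 0
tab <- xtabs(~ id + item, data = long)
wide <- cbind(id = rownames(tab), as.data.frame.matrix(tab))
rownames(wide) <- NULL
print(wide)
```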
We could also do it with strsplit() and unnest():
library(dplyr)
library(tidyr)
dat %>%
mutate(items = strsplit(items, ",")) %>%
unnest(c(items)) %>%
mutate(Val = 1) %>%
pivot_wider(names_from = items, values_from = Val, values_fill = 0)
Or an option with mtabulate
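The mtabulate code for this option is not included here. My understanding (an assumption) is that mtabulate() from the qdapTools package turns a list of vectors into a data frame with one row of counts per vector. The base-R sketch below imitates that idea; count_items() is a hypothetical helper, not a package function.

```r
# Base-R sketch of per-row tabulation in the spirit of mtabulate
# (count_items is a hypothetical helper, not a package function)
count_items <- function(lst) {
  lv <- sort(unique(unlist(lst)))  # every distinct item, sorted
  # table() over a factor with fixed levels gives a count for every item,
  # including 0 for items absent from that row
  counts <- t(vapply(lst,
                     function(x) as.vector(table(factor(x, levels = lv))),
                     integer(length(lv))))
  colnames(counts) <- lv
  as.data.frame(counts)
}

dat <- data.frame(id = letters[1:5],
                  items = c("A,B,C,D,E", "C,D,E,A,B", "E,D,C", "B,A", "A"),
                  stringsAsFactors = FALSE)
result <- cbind(id = dat$id, count_items(strsplit(dat$items, ",", fixed = TRUE)))
print(result)
```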
As @markus mentioned, the splitstackshape package, specifically cSplit_e, helps a lot with tasks like this. Something like dat %>% cSplit_e("items", ",", type = "character", fill = 0, drop = TRUE) %>% as_tibble() should work.
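cSplit_e() with type = "character" and fill = 0 should produce one 0/1 indicator column per distinct value. A rough base-R sketch of the same indicator idea follows, using %in%; this is not how splitstackshape implements it, and its column naming may differ from cSplit_e's.

```r
# Base-R sketch: 0/1 presence indicators, one column per distinct item
dat <- data.frame(id = letters[1:5],
                  items = c("A,B,C,D,E", "C,D,E,A,B", "E,D,C", "B,A", "A"),
                  stringsAsFactors = FALSE)
parts <- strsplit(dat$items, ",", fixed = TRUE)
lv <- sort(unique(unlist(parts)))

# For each item, flag each row whose list contains it (1) or not (0);
# vapply over the character vector lv names the columns A, B, ...
ind <- vapply(lv,
              function(l) vapply(parts, function(p) as.integer(l %in% p), integer(1)),
              integer(length(parts)))
result <- data.frame(id = dat$id, ind, stringsAsFactors = FALSE)
print(result)
```

Unlike the counting approaches, an item repeated within one row still yields 1 here.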