Applying tidyr::separate over multiple columns
I want to iterate over the columns of a data frame and split each one into several columns based on a separator. I am using tidyr::separate, and it works when I do one column at a time.
For example:
df <- data.frame(a = c("5312,2020,1212"), b = c("345,982,284"))
df <- separate(data = df, col = "a",
               into = paste("a", c("col1", "col2", "col3"), sep = "_"),
               sep = ",")
When I try to do the same for every column of df, R returns an error.
For example, I use this for loop:
for(col in names(df)){
  df <- separate(data = df, col = col,
                 into = paste(col, c("col1", "col2", "col3"), sep = "_"),
                 sep = ",")
}
However, R returns this error:
Error in if (!after) c(values, x) else if (after >= lengx) c(x, values) else c(x[1L:after], :
argument is of length zero
Is there another way to apply tidyr::separate to multiple columns of a data frame?

You can feed a customized separate call into Reduce():
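library(tidyr)

# Split one column; Reduce() calls this as sep(data, column_name)
sep <- function(...) {
  dots <- list(...)
  # count the digit runs in the column, i.e. how many pieces to create
  n <- max(stringr::str_count(dots[[1]][[dots[[2]]]], "\\d+"))
  # separate_() is deprecated; separate() accepts the column name as a string
  separate(..., into = sprintf("%s_col%d", dots[[2]], seq_len(n)))
}
df %>% Reduce(f = sep, x = c("a", "b"))
#   a_col1 a_col2 a_col3 b_col1 b_col2 b_col3
# 1   5312   2020   1212    345    982    284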
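The same fold can also be written with explicit arguments instead of `...`, which may be easier to read. This is only a sketch: `split_col` is a hypothetical helper name, and it assumes the number of pieces can be taken from the row with the most separators.

```r
library(tidyr)

df <- data.frame(a = "5312,2020,1212", b = "345,982,284")

# Hypothetical helper (not part of tidyr): split one column, given its name
# as a string, into <name>_col1, <name>_col2, ...
split_col <- function(data, nm, sep = ",") {
  # number of pieces, taken from the row that splits into the most parts
  n <- max(lengths(strsplit(as.character(data[[nm]]), sep, fixed = TRUE)))
  # `!!nm` injects the string, so a variable called `nm` is never mistaken
  # for a column of that name
  separate(data, col = !!nm,
           into = sprintf("%s_col%d", nm, seq_len(n)),
           sep = sep)
}

# Fold over the column names, starting from the original data frame
out <- Reduce(split_col, names(df), df)
```

Because `names(df)` is evaluated once before the fold starts, the columns created along the way are not split again.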
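I had the same problem (while learning the tidyverse), so this is what I did. Note that I wanted a solution that does not break, so it does not rely on knowing the colnames in advance.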
library(tidyverse)
Create your input:
dft <- as_tibble(data.frame(a = c("5312,2020,1212"), b = c("345,982,284")))
df <- as.data.frame(dft)
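For-loop version:
# dft0 (an empty starting frame) and leng (pieces per column) are set up in the full listing further down
for(x in 1:dim(df)[2]){
  dataCol <- dft[,x]
  newCols <- paste(colnames(dataCol)[1], paste("col", 1:leng, sep=""), sep="_")
  dft0 <- cbind(dft0,
                separate(data = dataCol,
                         col = colnames(dataCol)[1],
                         into = newCols,
                         sep = ","))
}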
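You could do df %>% gather() %>% separate_rows(value) %>% mutate(key = paste0(key, '_col', 1:3)) %>% spread(key, value), but that is no simpler than calling separate twice... Or you can fix your original code with the SE version of separate (separate_), e.g. for(name in names(df)) df…
splitstackshape::cSplit is useful here.
However, the sep function is a bit messy: it works for the given df, but it fails when extended to larger and different data frames, as you probably already know. As of today, is there any tidyverse function equivalent to splitstackshape::cSplit? For example, a tidyr::cSplit?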
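On the question of a tidyverse counterpart to cSplit: recent tidyr releases (1.3.0 and later, as far as I know) include separate_wider_delim(), which takes several columns in one call. A minimal sketch, assuming that version is installed and that every cell holds exactly three comma-separated pieces:

```r
library(tidyr)

df <- data.frame(a = "5312,2020,1212", b = "345,982,284")

# Split every column on "," in a single call.
# `names` gives the suffix for each piece; `names_sep` joins the source
# column name and the suffix, and is required when several columns are selected.
wide <- separate_wider_delim(
  df,
  cols = everything(),
  delim = ",",
  names = c("col1", "col2", "col3"),
  names_sep = "_"
)

names(wide)
# "a_col1" "a_col2" "a_col3" "b_col1" "b_col2" "b_col3"
```

If the rows do not all have the same number of pieces, the `too_few` and `too_many` arguments control what happens instead of raising an error.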
library(tidyverse)

# Input: one row, two comma-separated columns
dft <- as_tibble(data.frame(a = c("5312,2020,1212"), b = c("345,982,284")))
df <- as.data.frame(dft)

# Empty one-row, zero-column frames to bind the results onto
dft0 <- read_csv("a\na")
dft0 <- dft0[, -1]
dft00 <- dft0
leng <- 3  # number of pieces per column

# For-loop version
for (x in 1:dim(df)[2]) {
  dataCol <- dft[, x]
  newCols <- paste(colnames(dataCol)[1], paste("col", 1:leng, sep = ""), sep = "_")
  dft0 <- cbind(dft0,
                separate(data = dataCol,
                         col = colnames(dataCol)[1],
                         into = newCols,
                         sep = ","))
}

# sapply version
sapp <- sapply(colnames(df), function(ff) {
  separate(as_tibble(df[, ff]),
           "value",
           letters[1:leng],
           sep = ",")
})
dft00 <- as_tibble(do.call(cbind, sapp))
colnames(dft00) <- as.vector(sapply(colnames(sapp),
                                    function(sa) {
                                      paste(sa, rownames(sapp), sep = "_")
                                    }))