
R: Split a variable and insert NA's in between


I have a variable that looks like this:

Var
[1] 3, 4, 5     2, 4, 5     2, 4     1, 4, 5

I need to split it into a data frame that looks like this:

V1   V2   V3   V4   V5
NA   NA   3    4    5
NA   2    NA   4    5
NA   2    NA   4    NA
1    NA   NA   4    5
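Unfortunately, I couldn't find a post that solves my problem. Does anyone know how I can do this? Many thanks in advance.

Edit: Based on your answers I found a solution and posted it below.

Edit 2: I made my code more efficient using Ananda's solution.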

Using matrix indexing:

Var <- list(c(3,4,5),c(2,4,5),c(2,4),c(1,4,5))
unVar <- unlist(Var)
out <- matrix(NA, nrow=length(Var), ncol=max(unVar))

out[cbind(rep(seq_along(Var),sapply(Var,length)),unVar)] <- unVar
# and with R >= 3.2.0, lengths() lets you simplify a little:
out[cbind(rep(seq_along(Var),lengths(Var)),unVar)] <- unVar

#     [,1] [,2] [,3] [,4] [,5]
#[1,]   NA   NA    3    4    5
#[2,]   NA    2   NA    4    5
#[3,]   NA    2   NA    4   NA
#[4,]    1   NA   NA    4    5
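
Since the question asks for a data frame rather than a matrix, one more step may be needed. The following is a minimal sketch that rebuilds `out` so it runs on its own; it relies on `as.data.frame()` naming the columns of an unnamed matrix V1, V2, ... by default, and on `lengths()`, which needs R >= 3.2.0.

```r
# Same input and matrix-indexing step as above.
Var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))
unVar <- unlist(Var)
out <- matrix(NA, nrow = length(Var), ncol = max(unVar))
out[cbind(rep(seq_along(Var), lengths(Var)), unVar)] <- unVar

# Convert the matrix to a data frame; columns are named V1..V5 automatically.
df <- as.data.frame(out)
names(df)
```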

If Var is just a vector, I would do the following:

Var = c(3,4,5,2,4,5,2,4,1,4,5)
RowIdx = c(rep(1,3),rep(2,3),rep(3,2),rep(4,3))
DF = matrix(NA,nrow=4,ncol=5)

for (idx in 1:length(Var)){
  DF[RowIdx[idx],Var[idx]] = Var[idx]
}

Of course, if you have more data, you will probably want a more automated way to generate the row indices.
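
One possible way to generate the row indices automatically is sketched below. It assumes the groups are available as a list (the name `groups` is only illustrative and does not come from the question) and repeats each group number once per element of that group. `lengths()` needs R >= 3.2.0; on older versions `sapply(groups, length)` does the same job.

```r
# Hypothetical input: one numeric vector per output row.
groups <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))

vals <- unlist(groups)
# Row index for every value: group i is repeated length(groups[[i]]) times.
RowIdx <- rep(seq_along(groups), times = lengths(groups))

DF <- matrix(NA, nrow = length(groups), ncol = max(vals))
for (idx in seq_along(vals)) {
  # Each value is written into the column that matches the value itself.
  DF[RowIdx[idx], vals[idx]] <- vals[idx]
}
DF
```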

Var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))
M <- matrix(NA, nrow = length(Var), ncol = max(sapply(Var, max)))
for (L in seq(Var)) {
  M[cbind(rep(L, length(Var[[L]])), Var[[L]])] <- Var[[L]]
}
M
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   NA   NA    3    4    5
# [2,]   NA    2   NA    4    5
# [3,]   NA    2   NA    4   NA
# [4,]    1   NA   NA    4    5

I found a solution based on your answers! My final solution looks like this:

# I had the additional problem that my variable was a factor, so I had to convert it first.
df <- data.frame(Var)
Var <- lapply(strsplit(as.character(df$Var), ", "), "[")
for(i in 1:length(Var)){
  Var[[i]] <- as.numeric(Var[[i]])
}

# Then I created a matrix based on thelatemail's and BondedDust's approach.
M <- matrix(NA, nrow=length(Var), ncol=max(sapply(Var,max)))

# Additionally, some lines contained a single -99, which indicates a missing value for the whole line. This negative value caused problems, so I assigned NA first.
for(i in 1:length(Var)){
  Var[[i]][Var[[i]] == -99] <- NA
}

# Final assignment as suggested by BondedDust.
for( L in seq(Var) ) { M [ cbind( rep( L, length(Var[[L]])), Var[[L]]) ] <- Var[[L]]}
M
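
For reference, the same steps could also be written without explicit loops. This is only a sketch under assumptions: the input is a factor (or character vector) of comma-separated positive integers, a lone -99 marks a completely missing row, at least one row has real values, and the names `raw`, `parts`, `vals`, `rows` and `keep` are illustrative rather than taken from the original data.

```r
# Hypothetical input; the third row is the "-99" missing-row marker.
raw <- factor(c("3, 4, 5", "2, 4, 5", "-99", "1, 4, 5"))

# Split each string on commas (with optional spaces) and convert to numbers.
parts <- lapply(strsplit(as.character(raw), ",\\s*"), as.numeric)

vals <- unlist(parts)
rows <- rep(seq_along(parts), times = lengths(parts))

# Drop the -99 markers so they are never used as column indices.
keep <- vals != -99

M <- matrix(NA_real_, nrow = length(parts), ncol = max(vals[keep]))
# Matrix indexing: each (row, value) pair addresses the cell to fill.
M[cbind(rows[keep], vals[keep])] <- vals[keep]
M
```

Filtering out the -99 entries before indexing leaves the corresponding row entirely NA, which matches the intent described in the comments above.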

Judging by the OP's answer, "var" is a character string like:

var <- c("3, 4, 5", "2, 4, 5", "2, 4", "1, 4, 5")

If it is a list, as the other answers assume, you can use the (non-exported) numMat function from "splitstackshape", which powers cSplit_e:

var <- list(c(3,4,5), c(2,4,5), c(2,4), c(1,4,5))
splitstackshape:::numMat(var, mode = "value")
#       1  2  3 4  5
# [1,] NA NA  3 4  5
# [2,] NA  2 NA 4  5
# [3,] NA  2 NA 4 NA
# [4,]  1 NA NA 4  5


Is Var a list or a vector or what? Your example is not reproducible. Is it c(3,4,5,2,4,5,2,4,5,2,4,1,4,5) or list(c(3,4,5),c(2,4,5),c(2,4),c(1,4,5)) or c("3,4,52,52,41,4,5")?

Thanks! Your first solution works great and made my code much shorter!