R: Split each row of a data frame into two rows


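I want to split each row of a (numeric) data frame into two rows. For example, part of the original data frame looks like this (nrow(original data frame) > 2800000):

ID  X  Y  Z  value_1  value_2
1   3  2  6       22       54
6  11  5  9       52       71
3   7  2  5        2       34
5  10  7  1       23       47

After splitting each row, we get: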

ID  X  Y  Z  
1   3  2  6  
22 54 NA NA  
6  11  5  9  
52 71 NA NA  
3   7  2  5  
2  34 NA NA  
5  10  7  1  
23 47 NA NA

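The "value_1" and "value_2" columns are split off, and their elements are placed in a new row under ID and X. For example, value_1 = 22 and value_2 = 54 become a new row.

Here is a (very slow) pure R solution that uses no extra packages: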

# Replicate your matrix
input_df <- data.frame(ID = rnorm(10000),
                           X = rnorm(10000),
                           Y = rnorm(10000),
                           Z = rnorm(10000),
                           value_1 = rnorm(10000),
                           value_2 = rnorm(10000))

# Preallocate memory to a data frame
output_df <- data.frame(
    matrix(
      nrow = nrow(input_df)*2,
      ncol = ncol(input_df)-2))

# Loop through each input row in turn.
# Input row j fills output row 2j - 1 with its
# first four elements, and output row 2j with
# value_1 and value_2 followed by two NAs.
for (j in seq_len(nrow(input_df))) {
  output_df[2 * j - 1, ] <- input_df[j, 1:4]
  output_df[2 * j, ] <- c(unlist(input_df[j, 5:6]), NA, NA)
}

colnames(output_df) <- c("ID", "X", "Y", "Z")
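
The loop above can be avoided entirely. The following is a minimal base R sketch, assuming the six-column layout from the question (ID, X, Y, Z, value_1, value_2). The helper name split_rows is hypothetical and not part of any answer in this thread. It stacks the two halves and then reorders the rows with an interleaving index.

```r
# Hypothetical helper: split each row of a six-column data frame into two rows.
split_rows <- function(df) {
  n <- nrow(df)
  top <- df[, 1:4]                     # ID, X, Y, Z
  bottom <- df[, 5:6]                  # value_1, value_2
  names(bottom) <- names(top)[1:2]     # line them up under ID and X
  bottom[names(top)[3:4]] <- NA        # pad Y and Z with NA
  stacked <- rbind(top, bottom)        # rows 1..n are originals, n+1..2n are new
  # Interleave the two halves: 1, n + 1, 2, n + 2, ...
  idx <- as.vector(rbind(seq_len(n), seq_len(n) + n))
  out <- stacked[idx, ]
  rownames(out) <- NULL
  out
}

# Example data from the question
df1 <- data.frame(ID = c(1, 6, 3, 5), X = c(3, 11, 7, 10),
                  Y = c(2, 5, 2, 7), Z = c(6, 9, 5, 1),
                  value_1 = c(22, 52, 2, 23), value_2 = c(54, 71, 34, 47))
result <- split_rows(df1)
print(result)
```

Because the work is done by whole-column operations rather than row-by-row assignment, this should scale much better than the loop, although it has not been timed on the 2.8 million row data set mentioned in the question.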

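Here is an option with data.table. We convert the 'data.frame' to a 'data.table' while creating a column of row names (setDT(df1, keep.rownames = TRUE)). We then subset columns 1:5 and columns 1, 6, 7 into a list, rbind the list elements with rbindlist using the fill = TRUE option (which returns NA for columns not found in one of the datasets), order by the row number ('rn'), and assign (:=) the row number column to NULL.
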
library(data.table)
setDT(df1, keep.rownames = TRUE)[]
rbindlist(list(df1[, 1:5, with = FALSE], setnames(df1[, c(1, 6:7),
   with = FALSE], 2:3, c("ID", "X"))), fill = TRUE)[order(rn)][, rn:= NULL][]
#    ID  X  Y  Z
#1:  1  3  2  6
#2: 22 54 NA NA
#3:  6 11  5  9
#4: 52 71 NA NA
#5:  3  7  2  5
#6:  2 34 NA NA
#7:  5 10  7  1
#8: 23 47 NA NA


The hadleyverse (dplyr) equivalent of the logic above:

library(dplyr)
tibble::rownames_to_column(df1[1:4]) %>% 
         bind_rows(., setNames(tibble::rownames_to_column(df1[5:6]), 
                         c("rowname", "ID", "X"))) %>% 
         arrange(rowname) %>% 
         select(-rowname)
#   ID  X  Y  Z
#1  1  3  2  6
#2 22 54 NA NA
#3  6 11  5  9
#4 52 71 NA NA
#5  3  7  2  5
#6  2 34 NA NA
#7  5 10  7  1
#8 23 47 NA NA

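Data: df1 is the example data frame from the question; its definition appears near the end of this page.

This should work: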
data <- read.table(text= "ID X Y Z value_1 value_2
           1  3 2 6     22       54
           6 11 5 9     52       71
           3  7 2 5      2       34
           5 10 7 1     23       47", header=T)

data1 <- data[,1:4]
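# Select the value columns by position; setdiff() on data frames is fragile
# (it can drop duplicated rows, and dplyr masks it with a row-wise version)
data2 <- data[, 5:6]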
names(data2) <- names(data1)[1:ncol(data2)]

combined <- plyr::rbind.fill(data1,data2)
n <- nrow(data1)
combined[kronecker(1:n, c(0, n), "+"),]
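
The index in the last line is what interleaves the rows. As a small sketch of what it evaluates to for the four-row example (the variable names idx and via_outer are only illustrative), kronecker() with "+" adds each element of c(0, n) to each element of 1:n, so every original row number i is followed by i + n, the position of its new row in the stacked data frame.

```r
n <- 4

# For each i in 1:n, produce i + 0 and i + n, in that order
idx <- as.vector(kronecker(1:n, c(0, n), "+"))
print(idx)
# 1 5 2 6 3 7 4 8

# The same index built with outer(), to show where the pattern comes from
via_outer <- as.vector(t(outer(1:n, c(0, n), "+")))
print(via_outer)
```

Rows 1 to 4 of the stacked data frame are the original rows and rows 5 to 8 are the new ones, so indexing with 1, 5, 2, 6, ... puts each new row directly after its source row.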

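Comments:

Thanks for your replies. Which solution is faster or more efficient?

Hi @akrun, I used the second solution (with dplyr) and found that the order of the ID column changed. How can I keep the original order of the ID column? Thanks, everyone!

With this example it gives the same result.

Yes, the example works fine, but when the IDs of the original data frame run from 1 to 11 and the second solution is applied, the order changes to 1, (new row), 10, (new row), 11, (new row), ...

OK, the reason is that rn or rowname is of class character. Before the arrange step, add mutate(rowname = as.numeric(rowname)) %>% so that arrange(rowname) sorts numerically: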
library(dplyr)
tibble::rownames_to_column(df1[1:4]) %>% 
         bind_rows(., setNames(tibble::rownames_to_column(df1[5:6]), 
                         c("rowname", "ID", "X"))) %>% 
         mutate(rowname = as.numeric(rowname)) %>%  # sort row names as numbers, not strings
         arrange(rowname) %>% 
         select(-rowname)
#   ID  X  Y  Z
#1  1  3  2  6
#2 22 54 NA NA
#3  6 11  5  9
#4 52 71 NA NA
#5  3  7  2  5
#6  2 34 NA NA
#7  5 10  7  1
#8 23 47 NA NA
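
The ordering problem raised in the comments can be reproduced without any packages. The sketch below assumes 11 rows, as in the comment, and uses an illustrative vector rn standing in for the row-name column: character row names sort lexicographically, so "10" and "11" land before "2".

```r
# Row names are stored as character strings
rn <- as.character(1:11)

# Character sort: compares strings, not numbers
print(sort(rn))
# "1" "10" "11" "2" "3" "4" "5" "6" "7" "8" "9"

# Numeric sort after conversion restores the intended order
print(order(as.numeric(rn)))
# 1 2 3 4 5 6 7 8 9 10 11
```

The same reasoning applies to the 'rn' column created by setDT(df1, keep.rownames = TRUE), which is also character, so converting it to numeric before ordering is likely needed there as well.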

# Example data (df1) used by the data.table and dplyr answers
df1 <- structure(list(ID = c(1L, 6L, 3L, 5L), X = c(3L, 11L, 7L, 10L
), Y = c(2L, 5L, 2L, 7L), Z = c(6L, 9L, 5L, 1L), value_1 = c(22L, 
52L, 2L, 23L), value_2 = c(54L, 71L, 34L, 47L)), .Names = c("ID", 
"X", "Y", "Z", "value_1", "value_2"), class = "data.frame",
 row.names = c(NA, -4L))
