R: Split a data frame into four parts and unsplit it


I want to split a data frame into 4 equal parts, because I want to use the 4 cores of my computer.

This is what I do:

df2 <- split(df, 1:4)
unsplit(df2, f=1:4)

This produces a warning. Do you know why?

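How many rows are in df? You will get that warning if the number of rows in the table is not divisible by 4. I think the split factor f you are using is incorrect, unless what you want to do is put each successive row into a different split data.frame.

If you really do want to split the data into 4 data frames, one row after another, then use rep_len to make the split factor the same length as the number of rows in the data frame, like this: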

## Split like this:
split(df , f = rep_len(1:4, nrow(df) ) )
## Unsplit like this:
unsplit( split(df , f = rep_len(1:4, nrow(df) ) ) , f = rep_len(1:4,nrow(df) ) )
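
To make the difference concrete, here is a minimal sketch on a made-up 10-row data frame (the name df_demo and its columns are assumptions for illustration, not from the question). It shows the warning from a too-short factor, the round trip with a full-length factor, and a contiguous-block variant that may be more natural when handing chunks to separate cores.

```r
# Hypothetical example data: 10 rows, which is not divisible by 4
df_demo <- data.frame(id = 1:10, x = (1:10)^2)

# A factor of length 4 is recycled over 10 rows, so split() warns that the
# data length is not a multiple of the split variable
warned <- tryCatch({ split(df_demo, 1:4); FALSE },
                   warning = function(w) TRUE)

# Round-robin factor with one entry per row: 1 2 3 4 1 2 3 4 1 2
f_rr <- rep_len(1:4, nrow(df_demo))
parts_rr <- split(df_demo, f_rr)
sapply(parts_rr, nrow)   # chunk sizes: 3 3 2 2

# unsplit() needs the same full-length factor to put every row back in place;
# with f = 1:4 it would assume the original had only 4 rows
restored <- unsplit(parts_rr, f_rr)
all(restored$id == df_demo$id)

# Variant: contiguous blocks of rows instead of round-robin assignment
f_blk <- sort(rep_len(1:4, nrow(df_demo)))
parts_blk <- split(df_demo, f_blk)
```

Either factor works with unsplit as long as the same vector is passed to both calls.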

Hopefully this example illustrates why the error occurs and how to avoid it (i.e. use a proper split factor!)

## Want to split a data.frame into two halves, but the number of rows is not divisible by 2
df

In the R "split" example:
aq <- airquality
g <- aq$Month
l <- split(aq,g)

As you can see here:
> str(l)
List of 5
 $ 5:'data.frame':      31 obs. of  6 variables:
  ..$ Ozone  : num [1:31, 1] 0.782 0.557 -0.523 -0.253 NA ...
  .. ..- attr(*, "scaled:center")= num 23.6
  .. ..- attr(*, "scaled:scale")= num 22.2
  ..$ Solar.R: int [1:31] 190 118 149 313 NA NA 299 99 19 194 ...
  ..$ Wind   : num [1:31] 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
  ..$ Temp   : int [1:31] 67 72 74 62 56 66 65 59 61 69 ...
  ..$ Month  : int [1:31] 5 5 5 5 5 5 5 5 5 5 ...
  ..$ Day    : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
 $ 6:'data.frame':      30 obs. of  6 variables:
  ..$ Ozone  : num [1:30, 1] NA NA NA NA NA ...
  .. ..- attr(*, "scaled:center")= num 29.4
  .. ..- attr(*, "scaled:scale")= num 18.2
  ..$ Solar.R: int [1:30] 286 287 242 186 220 264 127 273 291 323 ...
  ..$ Wind   : num [1:30] 8.6 9.7 16.1 9.2 8.6 14.3 9.7 6.9 13.8 11.5 ...
  ..$ Temp   : int [1:30] 78 74 67 84 85 79 82 87 90 87 ...
  ..$ Month  : int [1:30] 6 6 6 6 6 6 6 6 6 6 ...
  ..$ Day    : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...
 $ 7:'data.frame':      31 obs. of  6 variables:
  ..$ Ozone  : num [1:31, 1] 2.399 -0.32 -0.857 NA 0.154 ...
  .. ..- attr(*, "scaled:center")= num 59.1
  .. ..- attr(*, "scaled:scale")= num 31.6
  ..$ Solar.R: int [1:31] 269 248 236 101 175 314 276 267 272 175 ...
  ..$ Wind   : num [1:31] 4.1 9.2 9.2 10.9 4.6 10.9 5.1 6.3 5.7 7.4 ...
  ..$ Temp   : int [1:31] 84 85 81 84 83 83 88 92 92 89 ...
  ..$ Month  : int [1:31] 7 7 7 7 7 7 7 7 7 7 ...
  ..$ Day    : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
 $ 8:'data.frame':      31 obs. of  6 variables:
  ..$ Ozone  : num [1:31, 1] -0.528 -1.284 -1.108 0.455 -0.629 ...
  .. ..- attr(*, "scaled:center")= num 60
  .. ..- attr(*, "scaled:scale")= num 39.7
  ..$ Solar.R: int [1:31] 83 24 77 NA NA NA 255 229 207 222 ...
  ..$ Wind   : num [1:31] 6.9 13.8 7.4 6.9 7.4 4.6 4 10.3 8 8.6 ...
  ..$ Temp   : int [1:31] 81 81 82 86 85 87 89 90 90 92 ...
  ..$ Month  : int [1:31] 8 8 8 8 8 8 8 8 8 8 ...
  ..$ Day    : int [1:31] 1 2 3 4 5 6 7 8 9 10 ...
 $ 9:'data.frame':      30 obs. of  6 variables:
  ..$ Ozone  : num [1:30, 1] 2.674 1.928 1.721 2.467 0.644 ...
  .. ..- attr(*, "scaled:center")= num 31.4
  .. ..- attr(*, "scaled:scale")= num 24.1
  ..$ Solar.R: int [1:30] 167 197 183 189 95 92 252 220 230 259 ...
  ..$ Wind   : num [1:30] 6.9 5.1 2.8 4.6 7.4 15.5 10.9 10.3 10.9 9.7 ...
  ..$ Temp   : int [1:30] 91 92 93 93 87 84 80 78 75 73 ...
  ..$ Month  : int [1:30] 9 9 9 9 9 9 9 9 9 9 ...
  ..$ Day    : int [1:30] 1 2 3 4 5 6 7 8 9 10 ...

But now it has added these attributes:
  ..$ Ozone  : num ...
  .. ..- attr(*, "scaled:center")= num 29.4
  .. ..- attr(*, "scaled:scale")= num 18.2

And the very simple "unsplit" function is not programmed to handle these attributes:
> unsplit(l,g)
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long


The (direct and simple) solution is to remove these attributes:

attributes(l[[1]]$Ozone) <- NULL
attributes(l[[2]]$Ozone) <- NULL
attributes(l[[3]]$Ozone) <- NULL
attributes(l[[4]]$Ozone) <- NULL
attributes(l[[5]]$Ozone) <- NULL
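
The scaled:center and scaled:scale attributes are what scale() attaches to its result, which is a one-column matrix rather than a plain vector. The scaling step itself is not shown in this answer, so the sketch below assumes it was a per-month scale() of Ozone (an assumption, inferred from the str output). It reproduces the attributes and then strips them from every piece in one pass rather than with one assignment per list element.

```r
aq <- airquality
g  <- aq$Month

# Assumed step (not shown in the answer): scale Ozone within each month.
# scale() returns a one-column matrix with "scaled:center"/"scaled:scale".
l_scaled <- lapply(split(aq, g), function(d) {
  d$Ozone <- scale(d$Ozone)
  d
})
names(attributes(l_scaled[[1]]$Ozone))

# as.numeric() drops dim and the scaled:* attributes, leaving a plain vector
l <- lapply(l_scaled, function(d) {
  d$Ozone <- as.numeric(d$Ozone)
  d
})
is.null(attributes(l[[1]]$Ozone))
```

This has the same effect as the attributes(...) <- NULL assignments above, but does not depend on the list having exactly five elements.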

So, now it works:

> str( unsplit(l,g) )
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : num  0.782 0.557 -0.523 -0.253 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

安德烈·米库莱克

For this, I think you could very well use plyr. It supports multicore processing, for example with ddply.

You don't need to split the data frame in order to parallelize the operation. Just use something like lapply(seq(nrow(df)), function(i) {…}) together with R's built-in parallel package. Or do you urgently need to split the data by hand?

I think you really don't want to process the data row by row (4 rows at a time). Unless each row takes a long time (more than a few seconds), the overhead of parallelization will make the analysis slower.

Yes, I am doing this only to parallelize my operations. Sorry, Paul Hiemstra, I don't understand your last comment. Is even using ddply not good?

Stephane, @PaulHiemstra won't see the comment unless you mention them with the @ symbol, which makes sure they are notified of comments you post on the question. (Sorry, Paul!)
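
The suggestion in the comments to use R's built-in parallel package can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: the data frame df_demo and the per-chunk function add_sqrt are made up, and 2 worker processes are used only to keep the example light (the question mentions 4 cores).

```r
library(parallel)

# Hypothetical data and per-chunk work (both made up for illustration)
df_demo <- data.frame(id = 1:10, x = (1:10)^2)
add_sqrt <- function(chunk) {
  chunk$root <- sqrt(chunk$x)
  chunk
}

# One split-factor entry per row, as in the answer
f <- rep_len(1:4, nrow(df_demo))
chunks <- split(df_demo, f)

# mclapply() forks worker processes; on Windows mc.cores must be 1
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(chunks, add_sqrt, mc.cores = n_cores)

# Reassemble the processed chunks in the original row order
out <- unsplit(results, f)
```

As the comments point out, for work this cheap the overhead of starting workers will likely outweigh any gain, so a sketch like this only pays off when each chunk takes a substantial amount of time.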