How to group data in R when the data and the group headers are interspersed in the same column? (tags: r, data.table)


Basically, my data is grouped by day, with an inconsistent number of rows in between:

16-Oct-16
Name1
Name2
Name3
17-Oct-16
Name1
Name2
Name3
Name4
Name5
19-Oct-16

and so on.

I need to be able to take the group value and apply it to the child records. The expected result should look like this:

Name1   16-Oct-16
Name2   16-Oct-16
Name3   16-Oct-16
Name1   17-Oct-16
Name2   17-Oct-16
Name3   17-Oct-16
Name4   17-Oct-16
Name5   17-Oct-16

I am using data.table, but so far I can't think of any way to do this other than a loop.

The following script generates the kind of dataset I am looking at:

data.table(c('October 16, 2016', paste0('Name',1:4),
             'October 17, 2016', paste0('Name',1:12),
             'October 20, 2016', paste0('Name',1:2),
             'October 25, 2016', paste0('Name',1:6)))

I just want to copy the appropriate date to every name row, ending up with a tidy dataset in which each row has a name and a date.

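Here is a data.table solution that I have used in similar situations. (I have tested it with data.table version 1.9.7, but it should also work with CRAN version 1.9.6.)
Read the data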
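A minimal sketch of this step, assuming the sample from the question is held as a one-column data.table named dt whose column is called V1 (the name fread() gives an unnamed first column). The object name dt matches the code that follows; the construction itself is an assumption:

```r
library(data.table)

# Assumed input: the question's sample as a single character column V1.
# Date headers and names are mixed in the same column.
dt <- data.table(V1 = c("16-Oct-16", "Name1", "Name2", "Name3",
                        "17-Oct-16", "Name1", "Name2", "Name3", "Name4", "Name5",
                        "19-Oct-16"))
print(dt)
```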
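(I guess there is a faster version that uses a rolling join instead of na.locf.)
Remove the group header rows
To remove the group headers, we need to keep a temporary column: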
dt[, tmp := lubridate::dmy(V1)][, day := zoo::na.locf(tmp)]
print(dt)
           V1        tmp        day
 1: 16-Oct-16 2016-10-16 2016-10-16
 2:     Name1       <NA> 2016-10-16
 3:     Name2       <NA> 2016-10-16
 4:     Name3       <NA> 2016-10-16
 5: 17-Oct-16 2016-10-17 2016-10-17
 6:     Name1       <NA> 2016-10-17
 7:     Name2       <NA> 2016-10-17
 8:     Name3       <NA> 2016-10-17
 9:     Name4       <NA> 2016-10-17
10:     Name5       <NA> 2016-10-17
11: 19-Oct-16 2016-10-19 2016-10-19

dt <- dt[is.na(tmp)]
print(dt)
      V1  tmp        day
1: Name1 <NA> 2016-10-16
2: Name2 <NA> 2016-10-16
3: Name3 <NA> 2016-10-16
4: Name1 <NA> 2016-10-17
5: Name2 <NA> 2016-10-17
6: Name3 <NA> 2016-10-17
7: Name4 <NA> 2016-10-17
8: Name5 <NA> 2016-10-17

dt[, tmp := NULL]
print(dt)
      V1        day
1: Name1 2016-10-16
2: Name2 2016-10-16
3: Name3 2016-10-16
4: Name1 2016-10-17
5: Name2 2016-10-17
6: Name3 2016-10-17
7: Name4 2016-10-17
8: Name5 2016-10-17
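The rolling join mentioned above as a possible alternative to na.locf could look roughly like the sketch below. It is not taken from the answer: the helper column pos, the objects headers and members, and the header regular expression are all assumptions made for illustration, and the day is kept as a character string rather than parsed into a date.

```r
library(data.table)

dt <- data.table(V1 = c("16-Oct-16", "Name1", "Name2", "Name3",
                        "17-Oct-16", "Name1", "Name2", "Name3", "Name4", "Name5",
                        "19-Oct-16"))

# Remember the original position of every row; the join rolls along it.
dt[, pos := .I]

# Assumption: header rows are exactly those that look like dd-Mon-yy.
is_header <- grepl("^[0-9]{1,2}-[A-Za-z]{3}-[0-9]{2}$", dt$V1)

headers <- dt[is_header, .(pos, day = V1)]  # one row per date header
members <- dt[!is_header]                   # the name rows

# For each name row, roll = TRUE picks the last header at or before its position.
res <- headers[members, on = "pos", roll = TRUE][, .(V1, day)]
print(res)
```

Because the header rows are split off before the join, no temporary column has to be dropped afterwards. If a Date column is wanted, day can still be converted afterwards, for example with lubridate::dmy as in the code above.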

Another option is to use a regular expression pattern. For the first example dataset:
library(data.table)
library(zoo)
dt1[grep('([0-9]{1,2})-([A-Za-z]+)-(\\d{2})', V1), V2 := V1
    ][, V2 := na.locf(V2)][V1!=V2]

which gives:
      V1        V2
1: Name1 16-Oct-16
2: Name2 16-Oct-16
3: Name3 16-Oct-16
4: Name1 17-Oct-16
5: Name2 17-Oct-16
6: Name3 17-Oct-16
7: Name4 17-Oct-16
8: Name5 17-Oct-16

For the second dataset, you can use:
dt2[grep('([A-Za-z]+ )([0-9]{1,2}[,] )(\\d{4})', V1), V2 := V1
    ][, V2 := na.locf(V2)][V1!=V2]

which gives:
        V1               V2
 1:  Name1 October 16, 2016
 2:  Name2 October 16, 2016
 3:  Name3 October 16, 2016
 4:  Name4 October 16, 2016
 5:  Name1 October 17, 2016
 6:  Name2 October 17, 2016
 7:  Name3 October 17, 2016
 8:  Name4 October 17, 2016
 9:  Name5 October 17, 2016
10:  Name6 October 17, 2016
11:  Name7 October 17, 2016
12:  Name8 October 17, 2016
13:  Name9 October 17, 2016
14: Name10 October 17, 2016
15: Name11 October 17, 2016
16: Name12 October 17, 2016
17:  Name1 October 20, 2016
18:  Name2 October 20, 2016
19:  Name1 October 25, 2016
20:  Name2 October 25, 2016
21:  Name3 October 25, 2016
22:  Name4 October 25, 2016
23:  Name5 October 25, 2016
24:  Name6 October 25, 2016
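Both variants follow the same three steps: mark the header rows with a pattern, carry each header down to the rows beneath it, and drop the header rows. As a sketch only, the steps can be wrapped in a small helper. The function name spread_headers, its arguments, and the anchored patterns are hypothetical and not part of the answer, and the fill uses cumsum() grouping in place of zoo::na.locf. It assumes the column is named V1 and that the data starts with a header row.

```r
library(data.table)

# Hypothetical helper: copy each group header down to the rows below it.
# `pattern` must match header rows only; the first row is assumed to be a header.
spread_headers <- function(dt, pattern) {
  out <- copy(dt)                          # leave the caller's table untouched
  out[, is_header := grepl(pattern, V1)]   # step 1: mark header rows
  out[, grp := cumsum(is_header)]          # a new group starts at every header
  out[, V2 := V1[1L], by = grp]            # step 2: first row of a group is its header
  out <- out[is_header == FALSE]           # step 3: drop the header rows
  out[, c("is_header", "grp") := NULL]
  out[]
}

dt1 <- data.table(V1 = c("16-Oct-16", paste0("Name", 1:3),
                         "17-Oct-16", paste0("Name", 1:5),
                         "19-Oct-16"))
dt2 <- data.table(V1 = c("October 16, 2016", paste0("Name", 1:4),
                         "October 17, 2016", paste0("Name", 1:12),
                         "October 20, 2016", paste0("Name", 1:2),
                         "October 25, 2016", paste0("Name", 1:6)))

res1 <- spread_headers(dt1, "^[0-9]{1,2}-[A-Za-z]+-[0-9]{2}$")
res2 <- spread_headers(dt2, "^[A-Za-z]+ [0-9]{1,2}, [0-9]{4}$")
print(res1)
print(res2)
```

Dropping rows by the header flag, rather than by comparing V1 with V2, keeps the filter independent of the filled values. A trailing header with no rows under it, such as 19-Oct-16, simply disappears.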


Data used:
dt1 <- fread("16-Oct-16
             Name1
             Name2
             Name3
             17-Oct-16
             Name1
             Name2
             Name3
             Name4
             Name5
             19-Oct-16", header = FALSE)

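In the real world, how do you get this data in the first place? This sounds like a perfect job for a tool that tidies the dataset before loading it into R.
Please be more precise in your example data; what is your expected result? Please see the guidance on how to improve your question; an example would help.
The example data you show and the script that generates it should be consistent. Right now, they use two different date formats.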
dt2 <- data.table(c('October 16, 2016', paste0('Name',1:4),
                    'October 17, 2016', paste0('Name',1:12),
                    'October 20, 2016', paste0('Name',1:2),
                    'October 25, 2016', paste0('Name',1:6)))