Subsetting variables in panel data in R, conditional on a defined number of consecutive observations


I am new to R, and my problem is as follows:

I have panel data organized as weekly time series, like this (only part of it is shown):

Week_Starting  Team_A  Team_B  Team_C  Team_D
2010-01-02          1       2       3       4
2010-01-09          2      40       1       5
2010-01-16         15               4      11
2010-01-23         25               7      18
2010-01-30         38               9      29
2010-02-06                         12      34
2010-02-13                         16      40
2010-02-20                         20
2010-02-27                         15      28
2010-03-06                         20
2010-03-13                         24
2010-03-20                         24
2010-03-27                         21
2010-04-03                         27
2010-04-10                         24
2010-04-17                         25
2010-04-24                         35
2010-05-01                         40
2010-05-08                         32
2010-05-15
2010-05-22                         39
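For example, it makes no sense to use Team B, because it has too many missing observations. The ranking system does not provide data for ranks lower than 40 (that is, outside the top 40). I therefore want to clean the data by deleting the columns (variables) that do not have at least 8 weeks of consecutive observations (Teams A, B and D in this example). Team D fails the requirement because of the gap in the week starting 2010-02-20, which leaves its longest run at 7 weeks. Keep in mind that I have more than 1000 columns.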

I have tried this before, but it did not give me what I wanted, and unfortunately I am not skilled enough to modify the code to fit my needs.

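I can think of a few possible solutions:

  • Subset the parts of each variable that have 8 or more consecutive observations

  • Set observations to NA if they are not part of a run of at least 8 consecutive observations, then delete the columns that contain only NA, since columns that never meet the 8-week minimum would be left with only NA values (I hope you understand what I mean)

  • Just out of interest: would it be more difficult to do the same thing if the data were organized in long format?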

    # Using MrFlick's data frame
    library(reshape2)  # assumed source of melt(); the output format matches it
    
    melt(dd, id = "Week_Starting")
    
           Week_Starting variable value
        1     2010-01-02   Team_A     1
        2     2010-01-09   Team_A     2
        3     2010-01-16   Team_A    15
        4     2010-01-23   Team_A    25
        5     2010-01-30   Team_A    38
        6     2010-02-06   Team_A    NA
        7     2010-02-13   Team_A    NA
        8     2010-02-20   Team_A    NA
        9     2010-02-27   Team_A    NA
        10    2010-03-06   Team_A    NA
        11    2010-03-13   Team_A    NA
        12    2010-03-20   Team_A    NA
        13    2010-03-27   Team_A    NA
        14    2010-04-03   Team_A    NA
        15    2010-04-10   Team_A    NA
        16    2010-04-17   Team_A    NA
        17    2010-04-24   Team_A    NA
        18    2010-05-01   Team_A    NA
        19    2010-05-08   Team_A    NA
        20    2010-05-15   Team_A    NA
        21    2010-05-22   Team_A    NA
        22    2010-01-02   Team_B     2
        23    2010-01-09   Team_B    40
        24    2010-01-16   Team_B    NA
        25    2010-01-23   Team_B    NA
        26    2010-01-30   Team_B    NA
        27    2010-02-06   Team_B    NA
        28    2010-02-13   Team_B    NA
        29    2010-02-20   Team_B    NA
        30    2010-02-27   Team_B    NA
        31    2010-03-06   Team_B    NA
        32    2010-03-13   Team_B    NA
        33    2010-03-20   Team_B    NA
        34    2010-03-27   Team_B    NA
        35    2010-04-03   Team_B    NA
        36    2010-04-10   Team_B    NA
        37    2010-04-17   Team_B    NA
        38    2010-04-24   Team_B    NA
        39    2010-05-01   Team_B    NA
        40    2010-05-08   Team_B    NA
        41    2010-05-15   Team_B    NA
        42    2010-05-22   Team_B    NA
        43    2010-01-02   Team_C     3
        44    2010-01-09   Team_C     1
        45    2010-01-16   Team_C     4
        46    2010-01-23   Team_C     7
        47    2010-01-30   Team_C     9
        48    2010-02-06   Team_C    12
        49    2010-02-13   Team_C    16
        50    2010-02-20   Team_C    20
        51    2010-02-27   Team_C    15
        52    2010-03-06   Team_C    20
        53    2010-03-13   Team_C    24
        54    2010-03-20   Team_C    24
        55    2010-03-27   Team_C    21
        56    2010-04-03   Team_C    27
        57    2010-04-10   Team_C    24
        58    2010-04-17   Team_C    25
        59    2010-04-24   Team_C    35
        60    2010-05-01   Team_C    40
        61    2010-05-08   Team_C    32
        62    2010-05-15   Team_C    NA
        63    2010-05-22   Team_C    39
        64    2010-01-02   Team_D     4
        65    2010-01-09   Team_D     5
        66    2010-01-16   Team_D    11
        67    2010-01-23   Team_D    18
        68    2010-01-30   Team_D    29
        69    2010-02-06   Team_D    34
        70    2010-02-13   Team_D    40
        71    2010-02-20   Team_D    NA
        72    2010-02-27   Team_D    28
        73    2010-03-06   Team_D    NA
        74    2010-03-13   Team_D    NA
        75    2010-03-20   Team_D    NA
        76    2010-03-27   Team_D    NA
        77    2010-04-03   Team_D    NA
        78    2010-04-10   Team_D    NA
        79    2010-04-17   Team_D    NA
        80    2010-04-24   Team_D    NA
        81    2010-05-01   Team_D    NA
        82    2010-05-08   Team_D    NA
        83    2010-05-15   Team_D    NA
        84    2010-05-22   Team_D    NA
    

Any suggestions?

You can use rle() to calculate the run lengths of the non-NA values. First, here is a nice data.frame of your data that can be copied and pasted:

    dd<-structure(list(Week_Starting = structure(1:21, .Label = c("2010-01-02", 
    "2010-01-09", "2010-01-16", "2010-01-23", "2010-01-30", "2010-02-06", 
    "2010-02-13", "2010-02-20", "2010-02-27", "2010-03-06", "2010-03-13", 
    "2010-03-20", "2010-03-27", "2010-04-03", "2010-04-10", "2010-04-17", 
    "2010-04-24", "2010-05-01", "2010-05-08", "2010-05-15", "2010-05-22"
    ), class = "factor"), Team_A = c(1L, 2L, 15L, 25L, 38L, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Team_B = c(2L, 
    40L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), Team_C = c(3L, 1L, 4L, 7L, 9L, 12L, 16L, 
    20L, 15L, 20L, 24L, 24L, 21L, 27L, 24L, 25L, 35L, 40L, 32L, NA, 
    39L), Team_D = c(4L, 5L, 11L, 18L, 29L, 34L, 40L, NA, 28L, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Week_Starting", 
    "Team_A", "Team_B", "Team_C", "Team_D"), class = "data.frame", row.names = c(NA, 
    -21L))
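
To make the idea concrete, here is a small sketch of what rle() returns for one column. The vector `team_d` is just the Team_D values copied from the data frame above, and the names `team_d`, `runs` and `obs_runs` are illustrative only.

```r
# Team_D values copied from the data frame above (21 weeks)
team_d <- c(4L, 5L, 11L, 18L, 29L, 34L, 40L, NA, 28L, rep(NA_integer_, 12))

# Run-length encode the "is missing" indicator
runs <- rle(is.na(team_d))
runs$values   # FALSE  TRUE FALSE  TRUE
runs$lengths  # 7 1 1 12

# Keep only the runs of observed (non-NA) values
obs_runs <- runs$lengths[!runs$values]
max(obs_runs) # 7
```

The longest observed run for Team_D is 7 weeks, which is why that column falls short of the 8-week requirement.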
    

Thank you very much! This is a very helpful answer!
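    # Length of the longest run of consecutive non-NA values in x
    consecnonNA <- function(x) {
        rr <- rle(is.na(x))
        # max(0L, ...) gives 0 instead of -Inf (plus a warning) when x is all NA
        max(0L, rr$lengths[!rr$values])
    }
    
    # Returns a predicate that tests whether a value is at least i
    atleast <- function(i) {function(x) x >= i}
    # Names of the team columns whose longest run is at least 8 weeks
    hasatleast8 <- names(Filter(atleast(8), sapply(dd[, -1], consecnonNA)))
    
    # Keep the date column plus the qualifying team columns
    dd[, c("Week_Starting", hasatleast8), drop = FALSE]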
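
On the long-format question: it does not have to be much harder. The following is only a sketch in base R, under the assumption that every week has a row for every team (missing weeks appear as NA rows, as in the melt() output, rather than being absent). The data frame `long` is made-up example data, and `longest_run`, `min_run`, `run_by_team`, `keep` and `long_clean` are illustrative names, not part of the original answer.

```r
# Hypothetical long-format data: two teams, ten weeks each
long <- data.frame(
  Week_Starting = rep(seq(as.Date("2010-01-02"), by = "week", length.out = 10),
                      times = 2),
  variable      = rep(c("Team_X", "Team_Y"), each = 10),
  value         = c(1:10,           # Team_X: 10 consecutive observations
                    1:5, NA, 7:10)  # Team_Y: runs of 5 and 4 observations
)

# Longest run of consecutive non-NA values (0 if everything is NA)
longest_run <- function(x) {
  r <- rle(!is.na(x))
  max(0L, r$lengths[r$values])
}

min_run <- 8

# Rows must be in time order within each team before counting runs
long <- long[order(long$variable, long$Week_Starting), ]

# Longest run per team, then keep only the rows of qualifying teams
run_by_team <- tapply(long$value, long$variable, longest_run)
keep <- names(run_by_team)[run_by_team >= min_run]
long_clean <- long[long$variable %in% keep, ]
```

With this example data only Team_X is kept. If weeks with no ranking were simply left out of the long table instead of stored as NA, the gaps would have to be detected from the dates, which this sketch does not do.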