R: Showing missing values in multiple time series on one plot

Tags: r, plot, ggplot2, time-series, missing-data

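I have five different time series covering three consecutive years. I now want to show the missing values in these series as gaps on a plot. My idea is to create another data frame corresponding to these series, in which every present value is replaced with 1 and every NA is left as it is. Such a dummy data frame looks like this: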

# create sample time index
timeindex <- seq(as.POSIXct("2014-01-01"),as.POSIXct("2016-12-31"),by="1 mins")
# create 5 sample series of same length as of time index
sequence_1 <- sample(seq(from = 0, to = 1, by = 1), size =  length(timeindex), replace = TRUE)
sequence_2 <- sample(seq(from = 0, to = 1, by = 1), size =  length(timeindex), replace = TRUE)
sequence_3 <- sample(seq(from = 0, to = 1, by = 1), size =  length(timeindex), replace = TRUE)
sequence_4 <- sample(seq(from = 0, to = 1, by = 1), size =  length(timeindex), replace = TRUE)
sequence_5 <- sample(seq(from = 0, to = 1, by = 1), size =  length(timeindex), replace = TRUE)
# create data frame of sequences
df <- data.frame(sequence_1,sequence_2,sequence_3,sequence_4,sequence_5)
df <- ifelse(df==0,NA,1) # replace 0 with NA to show missing data values
df_with_time <- data.frame(timeindex,df) # attach timestamp to sequences
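Now I have two questions:

  • ggplot is unable to plot a data set this large on a machine with 8 GB of RAM and a 2.6 GHz processor. Is there another way to plot data of this size?
  • Is there another way to show the gaps (missing values) in the data?

Update: I want a plot like this (image not included), in which missing data points are shown as gaps.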

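If you cannot aggregate the NAs across series, I suggest time-based binning of the data. Put simply, you count the number of NAs in each 30-minute or 60-minute window and plot the counts with ggplot. An example follows.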

    library(ggplot2)  # needed for the plots below; reshape2 is called via ::

    # Binning
    head(df_with_time)
    time.gap <- 60 # bin by hour
    idx <- seq(1, nrow(df_with_time), by = time.gap) 
    na.counts <- lapply(idx[-length(idx)], (function(i){
      tmp <- df_with_time[i:(i+(time.gap-1)),]
      counts <- apply(tmp[,-1], 2, (function(y){ sum(is.na(y)) }))
      counts
    }))
    na.counts <- data.frame(time=df_with_time[idx[-length(idx)],]$timeindex, 
                            do.call(rbind, na.counts), 
                            stringsAsFactors = FALSE,
                            row.names = NULL)
    head(na.counts)
    
    # Convert to suitable df and then plot (color tracks with NA count)
    df_melt <- reshape2::melt(na.counts,id.vars="time") # melt for ggplot
    df_melt$y <- as.integer(as.factor(df_melt$variable))
    df_melt <- df_melt[order(df_melt$value - median(df_melt$value)), ]
    
    ggplot(df_melt,aes(x=time, y=y)) +  
      geom_point(aes(colour = value), shape = 124, alpha = 0.75, size = 2.5) + 
      scale_colour_gradient2(low = "#01665e", mid = "#f5f5f5", high = "#8c510a", midpoint = median(df_melt$value))
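
The hourly NA counts computed by the loop above can also be obtained with base R's trunc() and aggregate(). This is only a minimal sketch, assuming a small stand-in data frame (three days of minute data) rather than the full three years. The names df_small, hour_start and na_counts_alt are illustrative and do not appear in the original answer.

```r
# Stand-in for df_with_time (assumption: 3 days at 1-minute resolution)
set.seed(42)
timeindex <- seq(as.POSIXct("2014-01-01", tz = "UTC"),
                 by = "1 min", length.out = 3 * 24 * 60)
series <- replicate(5, sample(c(NA, 1), length(timeindex), replace = TRUE))
colnames(series) <- paste0("sequence_", 1:5)
df_small <- data.frame(timeindex, series)

# Truncate every timestamp to the start of its hour; this is the bin label
hour_start <- as.POSIXct(trunc(df_small$timeindex, units = "hours"))

# Count the NAs per hour and per series in one call
na_counts_alt <- aggregate(is.na(df_small[, -1]),
                           by = list(time = hour_start),
                           FUN = sum)
head(na_counts_alt)
```

Because the bins come from the timestamps rather than from row positions, this variant also copes with rows that are missing entirely, and it keeps the last bin instead of dropping it.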
    
If you reshape it to long format:

    data <- reshape2::melt(df_with_time,
                           id.vars="timeindex", 
                           variable.name = 'Sequence', 
                           value.name = 'Data')
    
This is one month, to keep things smaller:
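
One way to restrict the long-format data to a single month is sketched below. The choice of January 2014, the hourly toy data, and the name one_month are assumptions made for illustration; they are not taken from the original answer.

```r
# Toy long-format data with the same column names as the melted data frame
# (assumption: hourly resolution and two series, to keep the example small)
set.seed(1)
timeindex <- seq(as.POSIXct("2014-01-01", tz = "UTC"),
                 as.POSIXct("2014-03-31", tz = "UTC"), by = "1 hour")
data <- data.frame(
  timeindex = rep(timeindex, times = 2),
  Sequence  = rep(c("sequence_1", "sequence_2"), each = length(timeindex)),
  Data      = sample(c(NA, 1), 2 * length(timeindex), replace = TRUE)
)

# Keep only the rows that fall in January 2014
month_start <- as.POSIXct("2014-01-01", tz = "UTC")
month_end   <- as.POSIXct("2014-02-01", tz = "UTC")
one_month <- data[data$timeindex >= month_start & data$timeindex < month_end, ]
```

The resulting one_month can then be passed to ggplot() in place of data.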

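Comments:

Could you please post sample data (df_melt) that includes the missing values?

This plots everything on one horizontal line and highlights the points that correspond to NAs. But will you be able to effectively "see" the NAs with this strategy? If you care about how many NAs you have at a time, you could plot bars with the NA counts. In other words, if you cannot handle all the data, you should probably consider binning by time.

I want to show the missing points as blanks, not as a sum of NAs. Please see the updated demo plot in the question.

Got it. What about binning? Can you use bins to reduce the size of the data?

Yes, I think I need to use binning.

I guess you are still not happy with the result... :-P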
    # Keep only the bins whose NA count differs from the median by more than 8
    df_melt2 <- df_melt[abs(df_melt$value - median(df_melt$value)) > 8, ]
    
    ggplot(df_melt2,aes(x=time, y=y)) +  
      geom_point(aes(colour = value), shape = 124, alpha = 0.75, size = 4.5) + 
      scale_colour_gradient2(low = "#01665e", mid = "#f5f5f5", high = "#8c510a", midpoint = median(df_melt$value))
    
    data <- reshape2::melt(df_with_time,
                           id.vars="timeindex", 
                           variable.name = 'Sequence', 
                           value.name = 'Data')
    
    ggplot(data,
           aes(x = timeindex,
               y = Sequence,
               size = Data)) +
      geom_line()
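
Since the comments ask for missing points to appear as gaps rather than counts, another option is to collapse each series into runs of consecutive non-missing values and draw one segment per run. This is a sketch only: the helper present_runs() and the toy data are hypothetical and not part of either answer. It reduces the number of plotted objects only when missing values occur in contiguous stretches.

```r
library(ggplot2)

# Hypothetical helper: one row per run of consecutive non-missing values
present_runs <- function(time, values, name) {
  r <- rle(!is.na(values))          # runs of present (TRUE) / missing (FALSE)
  ends <- cumsum(r$lengths)         # last index of each run
  starts <- ends - r$lengths + 1    # first index of each run
  keep <- r$values                  # keep only the runs of present values
  data.frame(series = rep(name, sum(keep)),
             start = time[starts[keep]],
             end = time[ends[keep]])
}

# Toy data (assumption): one day at 1-minute resolution with contiguous gaps
timeindex <- seq(as.POSIXct("2014-01-01", tz = "UTC"),
                 by = "1 min", length.out = 24 * 60)
toy <- data.frame(timeindex, sequence_1 = 1, sequence_2 = 1)
toy$sequence_1[100:400] <- NA
toy$sequence_2[c(1:50, 900:1000)] <- NA

runs <- do.call(rbind, lapply(names(toy)[-1], function(nm) {
  present_runs(toy$timeindex, toy[[nm]], nm)
}))

# One horizontal segment per run; the blanks between segments are the gaps
p <- ggplot(runs, aes(x = start, xend = end, y = series, yend = series)) +
  geom_segment()
```

Here 2880 data points shrink to four segments, so the plot stays light even for long series, provided the gaps are not scattered minute by minute as in the random sample data of the question.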