R: Summing the elements between two NAs


I have a dataset of the following form:

  > dput(greece_news_data_combined[27192:27220,])
structure(list(time_and_date_correct = structure(c(1295435821.228, 
1295436780, 1295436780, 1295441160, 1295449020, 1295449020, 1295449020, 
1295449020, 1295449020, 1295449020, 1295449020, 1295449020, 1295462160, 
1295462160, 1295464200, 1295464200, 1295497810.833, 1295498110.378, 
1295498410.519, 1295498710.444, 1295499010.456, 1295499310.399, 
1295499610.479, 1295499910.325, 1295500210.583, 1295500510.338, 
1295500810.38, 1295501110.317, 1295501410.539), class = c("POSIXct", 
"POSIXt"), tzone = ""), log_returns = c(0, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -0.00601513577729679, 
-0.000206914274819529, 2.67010219832664e-05, 0.0024201544576403, 
0.0050083466252285, -0.00333167721488612, 0.00130213542003227, 
0.00560767076743004, 0.000679785002929741, 0.000336421598800745, 
-7.91478416137673e-05, 0.00181223339755887, 0.00268922532925481
), negative_percentage = c(NA, 2.20883534136546, 2.20883534136546, 
5.55555555555556, 3.59897172236504, 3.59897172236504, 3.59897172236504, 
3.59897172236504, 3.59897172236504, 3.59897172236504, 3.59897172236504, 
3.59897172236504, 4.45269016697588, 4.45269016697588, 1.39442231075697, 
2.1978021978022, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA), positive_percentage = c(NA, 2.81124497991968, 2.81124497991968, 
3.17460317460317, 0.25706940874036, 0.25706940874036, 0.25706940874036, 
0.25706940874036, 0.25706940874036, 0.25706940874036, 0.25706940874036, 
0.25706940874036, 0.556586270871985, 0.556586270871985, 0.99601593625498, 
1.0989010989011, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA), sum_of_negative = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), number_of_articles = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
)), row.names = 27192:27220, class = "data.frame")

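I want to keep the sum of the rows that lie between NAs, along with how many rows there are. My dataset has many stretches of values enclosed by NAs, and I would like to collapse each stretch and get its sum and row count, so that I can then compute the mean of the variable Var1 and so on. Any idea how to do this?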

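The following should give you what you want. The function naSplits identifies where NA values occur directly before or after non-NA values and returns the position indices of those NA/non-NA breaks, which are then used to split the set. When the vector starts or ends with a non-NA value (including the case where there are no NAs at all), the start index 1 or the length of the vector is added, so that the result is a vector of start/end position pairs. This vector is then split into a list of start/end pairs to iterate over. lapply is used to loop over the columns of a data.frame called df_splits, created from the df defined in the example. The result is a list holding, for each stretch of non-NA values, the values themselves, their count, and their total.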

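naSplits <- function(vec)
{
  # For each position: is the next / previous element NA?
  # (the last / first element is compared with itself)
  naBelow <- is.na(vec[c(2:length(vec),length(vec))])
  naAbove <- is.na(vec[c(1,seq_len(length(vec)-1))])
  naRow <- is.na(vec)

  # Non-NA positions that border an NA, i.e. the edges of each run
  splits <- which((!naRow & naAbove) | (!naRow & naBelow))

  # A run touching either end of the vector needs that end added explicitly
  if (!is.na(vec[1])) splits <- c(1,splits)
  if (!is.na(vec[length(vec)])) splits <- c(splits,length(vec))

  # Pair the positions up as (start, end), one group per run.
  # Caveat: a single non-NA value with NA on both sides appears only once
  # in `splits`, so this pairing assumes interior runs have >= 2 values.
  split_groups <- cumsum(seq_along(splits) %% 2)
  split(data.frame(split_groups,splits,type=c("start","end")),split_groups)
}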


# `df` is the data frame from the question (the dput output above)
df_splits <- df[,c("log_returns","negative_percentage","positive_percentage")]
# Outer loop: one column at a time; inner loop: one non-NA run at a time
lapply(df_splits,function(xA) {
  splits <- naSplits(xA)
  lapply(splits,function(xB) {
    start <- xB$splits[xB$type=="start"]
    end <- xB$splits[xB$type=="end"]
    values <- xA[start:end]
    list(values=list(values),count=length(values),total=sum(values))
  })
})
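
As a compact alternative to the start/end bookkeeping above, the same per-run counts and totals can be sketched with base R's rle(). This is only an illustration and not part of the original answer: the helper name summariseRuns and the toy vector x are made up for the example. It returns one row per run of consecutive non-NA values and also copes with a run of length 1.

```r
# Hypothetical helper (not from the original answer): one row per run of
# consecutive non-NA values, with its position, length, sum and mean.
summariseRuns <- function(vec) {
  runs   <- rle(!is.na(vec))           # TRUE runs are the non-NA stretches
  ends   <- cumsum(runs$lengths)       # last index of every run
  starts <- ends - runs$lengths + 1    # first index of every run
  keep   <- which(runs$values)         # positions of the non-NA runs only
  totals <- vapply(keep, function(i) sum(vec[starts[i]:ends[i]]), numeric(1))
  out <- data.frame(
    start = starts[keep],
    end   = ends[keep],
    count = runs$lengths[keep],
    total = totals
  )
  out$mean <- out$total / out$count    # mean of each run
  out
}

# Toy column in the spirit of "NA val1 val2 NA val3 val4 val5 NA"
x <- c(NA, 1, 2, NA, 3, 4, 5, NA, 6, NA)
res <- summariseRuns(x)
print(res)
```

Applied to the question's data, something like lapply(df_splits, summariseRuns) would give one such table per column, assuming df_splits is built as shown above.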

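Please add the specifics to your question! What have you tried so far? What is the expected result?

What you are doing seems to produce one mean for all the non-NAs and all the NAs?

I want to aggregate all the time points that lie between NAs. That is, suppose I have a dataset NA val1 val2 NA val3 val4 NA; I want to compute each stretch separately, keeping one row for the values between the first two NAs and one row for the values between the second pair of NAs. I want to do this for every NA-delimited interval in the dataset.

Hopefully this helps for a single group, where there is only one mean.

Is your question more about splitting the columns into several separate sets, delimited by one or more NA values? If so, is there any relationship with the adjacent columns across rows (such as their NAs), or is it only about the sets delimited by NAs within each column? What about the time column? Your var1 column would then return two means, 0 and -0.004767551, because those two sets are separated by NA values?

Yes, I am trying to get one row holding the sum and the number of elements for each pair of NAs. A column that initially looks like NA val1 val2 NA val3 val4 val5 NA would afterwards be NA val6 NA val7 NA, where the rows holding val6 and val7 contain the sum of val1 and val2 and the sum of val3, val4 and val5, together with the number of elements collapsed, which in this example is 2 and 3 respectively.

The answer has been updated to match the specifics of the clarified question. Output for the example data: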
$log_returns
$log_returns$`1`
$log_returns$`1`$values
$log_returns$`1`$values[[1]]
[1] 0
$log_returns$`1`$count
[1] 1
$log_returns$`1`$total
[1] 0 

$log_returns$`2`
$log_returns$`2`$values
$log_returns$`2`$values[[1]]
 [1] -6.015136e-03 -2.069143e-04  2.670102e-05  2.420154e-03  5.008347e-03 -3.331677e-03  1.302135e-03  5.607671e-03  6.797850e-04  3.364216e-04 -7.914784e-05  1.812233e-03
[13]  2.689225e-03
$log_returns$`2`$count
[1] 13    
$log_returns$`2`$total
[1] 0.0102498

$negative_percentage
$negative_percentage$`1`
$negative_percentage$`1`$values
$negative_percentage$`1`$values[[1]]
 [1] 2.208835 2.208835 5.555556 3.598972 3.598972 3.598972 3.598972 3.598972 3.598972 3.598972 3.598972 4.452690 4.452690 1.394422 2.197802
$negative_percentage$`1`$count
[1] 15
$negative_percentage$`1`$total
[1] 51.2626

$positive_percentage
$positive_percentage$`1`
$positive_percentage$`1`$values
$positive_percentage$`1`$values[[1]]
 [1] 2.8112450 2.8112450 3.1746032 0.2570694 0.2570694 0.2570694 0.2570694 0.2570694 0.2570694 0.2570694 0.2570694 0.5565863 0.5565863 0.9960159 1.0989011
$positive_percentage$`1`$count
[1] 15
$positive_percentage$`1`$total
[1] 14.06174
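
Since the goal stated in the question is a mean per stretch, the nested list can be flattened into a table and each total divided by its count. The sketch below is an assumption-laden illustration: the original code does not assign its result, so a hand-built stand-in named res (a made-up name) mirrors the structure, with counts and totals copied from the printed output above.

```r
# Hand-built stand-in for the nested result (hypothetical name `res`);
# counts and totals are copied from the printed output above.
res <- list(
  log_returns = list(
    `1` = list(count = 1,  total = 0),
    `2` = list(count = 13, total = 0.0102498)
  ),
  negative_percentage = list(
    `1` = list(count = 15, total = 51.2626)
  )
)

# One data-frame row per (column, run); the mean is total / count.
rows <- lapply(names(res), function(col) {
  do.call(rbind, lapply(names(res[[col]]), function(run) {
    piece <- res[[col]][[run]]
    data.frame(column = col, run = run, count = piece$count,
               total = piece$total, mean = piece$total / piece$count)
  }))
})
run_means <- do.call(rbind, rows)
print(run_means)
```

With the real data, the result of the outer lapply call would first have to be stored in a variable, which could then be used in place of the stand-in.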