R: Aggregating a large data frame - Fatal编程技术网

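I have a large data frame (> 1,000,000 entries) in which one column contains a date/time variable and another contains a numeric value. The problem is that some date/time values occur two or three times, and the corresponding numeric values need to be averaged, so that I end up with exactly one numeric value per date/time value.

So far, I am doing the following: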

## audio_together is the data frame with two columns, $timestamp and $amplitude
## (i.e. the numeric value)

timestamp_unique <- unique(audio_together$timestamp)   ## find all timestamps
  audio_together3 <- c(rep(NA, length(timestamp_unique)))  ## audio_together 3 is the new vector containing the values for each timestamp
  count = 0
  for (k in 1:length(timestamp_unique)){
    temp_time <- timestamp_unique[k]
    if (k==1){
      temp_subset <- audio_together[(1:10),]  ## look for timestamps only in a subset, which definitely contains the timestamp we are looking for
      temp_data_which <- which(temp_subset$timestamp == temp_time)
    } else {
      temp_subset <- audio_together[((count):(count+9)),]
      temp_data_which <- which(temp_subset$timestamp == temp_time)
    }
    if (length(temp_data_which) > 1){
      audio_together3[k] <- mean(temp_subset$amplitude[temp_data_which], na.rm = T)
    } else {
      audio_together3[k] <- temp_subset$amplitude[temp_data_which]
    }
    count <- count + length(temp_data_which)
  }

It is hard to test without sample data, but if your aim is to average all amplitudes that share the same timestamp, this dplyr solution may help:
library(dplyr)
audio_together %>% 
  group_by(timestamp) %>% 
  summarize(av_amplitude=mean(amplitude, na.rm=T)) %>% 
  ungroup()

Thanks for the idea.
The following works very well:
require(dplyr)
audio_together <- audio_together %>% group_by(timestamp)
audio_together <- ungroup(audio_together %>% summarise(mean(amplitude, na.rm=T)))

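Can you provide a small sample of the data and the expected output? A grouping operation like the one you want can be handled in several ways: tapply, ave, and aggregate in base R. The data.table and dplyr packages will most likely provide the speed you need.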
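The base R functions named in the comment above can be sketched on a small toy data frame. This is only an illustration under assumptions: the data frame `toy` and its values are made up here and merely mimic the `timestamp` and `amplitude` columns of `audio_together`.

```r
# Toy stand-in for audio_together; the values are invented for illustration.
toy <- data.frame(
  timestamp = as.POSIXct(c("2020-01-01 12:00:00", "2020-01-01 12:00:00",
                           "2020-01-01 12:00:01", "2020-01-01 12:00:02",
                           "2020-01-01 12:00:02", "2020-01-01 12:00:02"),
                         tz = "UTC"),
  amplitude = c(1, 3, 5, 2, NA, 4)
)

# aggregate(): returns a data frame with one row per unique timestamp.
agg <- aggregate(toy["amplitude"], by = toy["timestamp"],
                 FUN = mean, na.rm = TRUE)

# tapply(): returns a named array with one mean per unique timestamp.
tap <- tapply(toy$amplitude, toy$timestamp, mean, na.rm = TRUE)

# ave(): returns a vector as long as the input, with each row replaced by
# its group mean; duplicated timestamps would still have to be dropped.
per_row <- ave(toy$amplitude, toy$timestamp,
               FUN = function(x) mean(x, na.rm = TRUE))

print(agg)
```

Of the three, `aggregate()` is closest to the desired output because it already returns one row per timestamp. On more than a million rows it may well be slower than the dplyr or data.table approaches.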

library(data.table); setDT(audio_together); audio_together[, .(amplitude = mean(amplitude, na.rm = TRUE)), by = timestamp]
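The data.table approach from the comment above can be tried on toy data first. The sketch below assumes the data.table package is installed and does nothing otherwise. The data frame `toy` and its values are invented for illustration, and `keyby` is used in place of `by` only so that the result comes back sorted by timestamp.

```r
# Run only if data.table is available; otherwise the block is skipped.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Toy stand-in for audio_together; the values are invented for illustration.
  toy <- data.frame(
    timestamp = as.POSIXct(c("2020-01-01 12:00:00", "2020-01-01 12:00:00",
                             "2020-01-01 12:00:01", "2020-01-01 12:00:02",
                             "2020-01-01 12:00:02", "2020-01-01 12:00:02"),
                           tz = "UTC"),
    amplitude = c(1, 3, 5, 2, NA, 4)
  )

  # setDT() converts the data frame to a data.table in place (no copy).
  setDT(toy)

  # One mean amplitude per timestamp; keyby also sorts the result by timestamp.
  res <- toy[, .(amplitude = mean(amplitude, na.rm = TRUE)), keyby = timestamp]
  print(res)
}
```

Because `setDT()` works by reference, the original object itself becomes a data.table, which avoids copying a data frame with more than a million rows.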

Have you checked?