Large data array handling in MATLAB

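I have a large data set in a cell array. The data is really big; here are some of the cell dimensions: cell 5 is …, cell 11 is …

I am trying to perform a resampling operation, for which I have a function (the function code is shown below). I am trying to take an entire cell out of the array, resample it, and store the result back in the same array location or in a different one.

However, I get the following error at line 19 of the Resample function:

Error using zeros
Maximum variable size allowed by the program is exceeded.
Error in Resample (line 19)
obj = zeros(t,1);

When I comment out line 19, I get an out-of-memory error instead.

Is there a more efficient way to manipulate such a large data set?

Thanks, everyone.

Actual code: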

%% To load each ".dat" file for the 51 attributes to an array.

a = dir('*.dat');

for i = 1:length(a)
eval(['load ' a(i).name ' -ascii']);
end

attributes = length(a);

% Scan folder for number of ".dat" files
datfiles = dir('*.dat'); 

% Count Number of ".dat" files
numfiles = length(datfiles); 

% Read files in to MATLAB
for i = 1:1:numfiles
    A{i} = csvread(datfiles(i).name);
end

% Remove discarded variables
ind = [1 22 23 24 25 26 27 32]; % Variables to be removed.
A(ind) = [];

% Reshape all the data into columns - (n x 1) 
for i = 1:1:length(A)
    temp = A{1,i};
    [x,y] = size(temp);
    if x == 1 && y ~= 1
        temp = temp';
        A{1,i} = temp;
    end
end

% Retrieves the frequency data for the attributes from Excel spreadsheet
frequency = xlsread('C:\Users\aajwgc\Documents\MATLAB\Research Work\Data\testBig\frequency');

% Removing recorded frequency for discarded variables
frequency(ind) = [];

% Upsampling all the attributes to desired frequency
prompt = {'Frequency (Hz):'};
dlg_title = 'Enter desired output frequency for all attributes';
num_lines = 1;
def = {'50'};
answer= inputdlg(prompt,dlg_title,num_lines,def);
OutFreq = str2num(answer{1});

m = 1; 
n = length(frequency);
A_resampled = cell(m,n);
A_resampled(:) = {''};

for i = length(frequency);
    raw = cell2mat(A(1,i));
    temp= Resample(raw, frequency(i,:), OutFreq);
     A_resampled{i} = temp(i);
end

Resample function:

function obj = Resample(InputData, InFreq, OutFreq, varargin)
%% Preliminary setup
% Allow for selective down-sizing by specifying type
type = 'mean'; %default to the mean/average

if size(varargin,2) > 0
    type = varargin{1};
end

% Determine the necessary resampling factor
factor = OutFreq / InFreq;

%% No refactoring required
if (factor == 1)
    obj = InputData;
%% Up-Sampling required
elseif (factor > 1)
    t = factor * numel(InputData(1:end));
    obj = zeros(t,1); % <-- line 19: this is where the error is raised

    for i = 1:factor:t
        y = ((i-1) / factor) + 1;
        z = InputData(y);
        obj(i:i+factor) = z;
    end
%% Down-Sampling required
elseif (factor < 1)    
    t = numel(InputData(1:end));
    t = floor(t * factor);
    obj = zeros(t,1);
    factor = int32(1/factor);

    if  strcmp(type,'mean') %default is mean (process first)
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = mean(InputData(y:y+factor-1));
        end    
    elseif strcmp(type,'min')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = min(InputData(y:y+factor-1));
        end 
    elseif strcmp(type,'max')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = max(InputData(y:y+factor-1));
        end 
    elseif strcmp(type,'mode')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = mode(InputData(y:y+factor-1));
        end 
    elseif strcmp(type,'sum')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = sum(InputData(y:y+factor-1));
        end   
    elseif strcmp(type,'single')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = InputData(y);
        end
    else
        obj = NaN;
    end
else
    obj = NaN;
end
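
The allocation that fails at line 19 can be sized up before it is attempted, and for plain sample repetition the loop is not needed at all. The following is a minimal sketch, not a replacement for the function above. It assumes an integer upsampling factor and data that fits a narrower class than double; the values are hypothetical stand-ins, and repelem requires MATLAB R2015a or later.

```matlab
% Hypothetical stand-in for one cell of A; the real data is far larger.
InputData = uint8([10; 20; 30]);
factor    = 4;                 % OutFreq / InFreq, assumed to be a positive integer

% Estimate the size of the result before allocating it.
t           = factor * numel(InputData);
bytesDouble = t * 8;           % zeros(t,1) uses 8 bytes per element
bytesSingle = t * 4;           % zeros(t,1,'single') uses 4 bytes per element
bytesUint8  = t * 1;           % zeros(t,1,'uint8') uses 1 byte per element
fprintf('double: %d B, single: %d B, uint8: %d B\n', ...
        bytesDouble, bytesSingle, bytesUint8);

% Sample-and-hold upsampling without a loop: every sample is repeated
% "factor" times, and the output keeps the class of the input.
obj = repelem(InputData(:), factor, 1);
```

Because the result inherits the class of InputData, an 8-bit input needs one eighth of the memory of the default double array. This only reproduces the hold-style repetition of the loop above; it does no filtering.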

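If you have the DSP System Toolbox, you can use, for example, the dsp.FIRInterpolator System object and call its step() function repeatedly, so that you avoid processing all the data in one go.

As an aside, up- and down-sampling (interpolation and decimation) are more involved concepts than you are assuming; in the most general sense, both require some form of filtering to remove the artifacts these processes produce.

You can design these filters yourself and convolve the signal with them, but this kind of filter design requires a solid grounding in signal processing. If you want to go that route, I recommend picking up a textbook somewhere, because it is easy to get wrong without a reference text.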
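
The idea of processing the data piecewise, with a filter that carries its state from one piece to the next, can also be sketched in base MATLAB without any toolbox. The example below is an illustration under stated assumptions, not a properly designed interpolation filter: it assumes an integer factor L and uses a simple triangular kernel, which amounts to linear interpolation. The names x, L and chunkLen are hypothetical.

```matlab
% Hypothetical input: a short ramp standing in for one long attribute vector.
x        = (1:12)';
L        = 4;                    % integer upsampling factor (assumed)
chunkLen = 5;                    % samples of x processed per pass (assumed)

% Triangular kernel: filtering a zero-stuffed signal with it gives
% linear interpolation, delayed by L-1 output samples.
b = [1:L, L-1:-1:1] / L;

y  = zeros(L * numel(x), 1);     % could be written to disk per chunk instead
zi = zeros(numel(b) - 1, 1);     % filter state carried between chunks

for first = 1:chunkLen:numel(x)
    last = min(first + chunkLen - 1, numel(x));
    xc   = x(first:last);

    % Zero-stuffing: put each input sample at every L-th output position.
    up = zeros(L * numel(xc), 1);
    up(1:L:end) = xc;

    % Filter this chunk, resuming from the final state of the previous one.
    [yc, zi] = filter(b, 1, up, zi);
    y(L*(first-1)+1 : L*last) = yc;
end
```

Passing the final state of one call to filter as the initial state of the next makes the chunked result identical to filtering the whole zero-stuffed signal at once, while only one chunk of intermediate data is held in memory at a time.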

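It looks like you are resampling one cell at a time. t equals the largest cell size = …, which means 10.5 MB of memory. You should not be getting an error like this. In your case, what is the value of t when the error occurs? You can also check memory usage by typing memory at the MATLAB command line.

Parag, t computes the number of elements the resampling operation will produce. It does not get past the first iteration, and t is 45875200.

Then zeros(t,1) will take about 350 MB of memory. Type memory in the MATLAB command window and see whether you have that much available. Another option, if you want to store unsigned 8-bit integers in obj, is to write zeros(t,1,'uint8') or zeros(t,1,'single'), whichever you prefer.

I have about 2910 MB of memory available, so I wonder why it does not work. Also, I am trying to avoid the eval function by reading the data with csvread. But csvread reads file a01 and then moves on to file a20, rather than a01, a02, a03, ... How do I fix this? The data is provided in ".dat" format.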
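
On the file-ordering question in the last comment: one way to get a01, a02, a03, ... is to sort the names by the number embedded in them before reading. This is a minimal sketch with hypothetical file names; with real files the list would come from dir, for example names = {datfiles.name}.

```matlab
% Hypothetical file names, in the kind of order dir() might return them.
names = {'a01.dat', 'a20.dat', 'a03.dat', 'a02.dat', 'a10.dat'};

% Pull the first run of digits out of each name and convert it to a number.
tok = regexp(names, '\d+', 'match', 'once');
num = str2double(tok);

% Sort by that number and reorder the names to match.
[~, order] = sort(num);
names = names(order);
```

Each file can then be read in that order, for example with A{i} = csvread(names{i}), which avoids eval altogether.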