MATLAB: Splitting a dataset into training and testing sets


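I have two image datasets: subjects 1-200, each subject having c images, e.g. c = 8. I now want to partition these datasets into training and testing sets for my algorithm. I generally want to do this in the following cases:

Desired cases

Case 1: randomly select k1 images, k1 ...

You seem to be missing an important piece of information. How is your data represented? Also, how are you representing the ground truth? Are they cell arrays? 2D or 3D matrices? Until we know your data structures, we can't make suggestions. Also, saying "I want this code in MATLAB" implies that you want us to write the code for you without having shown any effort yourself. I think this is an interesting question, but others will be unwilling to put any effort into solving your problem.

@rayryeng I don't understand your question about the ground truth. Please clarify.

I have greatly expanded the question and the desired cases, and I have posted a code snippet for Case 1. Is the question understandable now? If any other changes are needed, please point them out.

@rayryeng I have added Case 2. Please tell me whether my approach is correct. I can't quite work out the third case.

I'll take a look at it!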

%% Read the data
% The workspace (as listed by whos) looks like this:
%   Name            Size            Bytes  Class     Attributes
%
%   a_data         99x1             12672  cell
%   a_labels        1x99              792  double
%   c               1x1                 8  double
%   card_a         11x2               176  double
%   unq_a_lab       1x11               88  double

% a_data is the complete dataset; assume it holds 99 images in total.
% a_labels holds the subject label associated with each image.
% c is the minimum number of images available for any subject:
% c = min(card(subj1), card(subj2), ...)
% card_a is the cardinality (image count) of each subject in the database,
% one row per subject as [subject, count], e.g. [1 10; 2 9; 3 11; 4 9; ...],
% i.e. card of subj 1 = 10, card of subj 2 = 9, etc.
% unq_a_lab holds the unique subject labels present in the database;
% assume there are 11 of them (as listed above).
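
To experiment with the snippets without the real images, a stand-in workspace with the same variable names can be built as below. This is only a sketch under assumptions that are not in the original post: every subject is given exactly 9 images (so that 11 subjects give 99 images), the images are 4x4 zero placeholders, and the helper names `n_subj`, `n_img` and `counts` are hypothetical.

```matlab
% Hypothetical stand-in data: 11 subjects, 9 images each, stored
% contiguously by subject (an assumption, not the original dataset).
n_subj = 11;
n_img  = 9;

% 1x99 label vector: 1 1 ... 1 2 2 ... 2 ... 11
a_labels = reshape(repmat(1:n_subj, n_img, 1), 1, []);

% 99x1 cell array of placeholder "images"
a_data = cell(numel(a_labels), 1);
for n = 1:numel(a_labels)
    a_data{n} = zeros(4, 4);
end

% Derived quantities, computed from the labels rather than typed in
unq_a_lab = unique(a_labels);                           % 1x11 unique subjects
counts    = arrayfun(@(s) sum(a_labels == s), unq_a_lab);
card_a    = [unq_a_lab(:), counts(:)];                  % 11x2: [subject, count]
c         = min(counts);                                % fewest images per subject
```

Deriving `card_a` and `c` from `a_labels` keeps them consistent with the data if the number of images per subject changes.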

%% CASE 1 COMPLETELY OVERLAPPING DATASET EQUAL SIZED PARTITIONS
% Randomly split the dataset into training and testing subsets
% trainset - k1 images per subject
% testset - k2 images per subject
% bear in mind the constraint: k1+k2<=c
% Total training set = k1*no. of subjects
% Total testing set = k2*no. of subjects
% Both training and testing sets (subjects) are completely overlapping

%split 1 
k1 = 3;
%split 2
k2 = 3;

Train_data_a = cell(length(unq_a_lab)*k1,1);
Test_data_a = cell(length(unq_a_lab)*k2,1);
tr_a_labels = zeros(1,length(unq_a_lab)*k1);
tst_a_labels = zeros(1,length(unq_a_lab)*k2);

t1=0; t2=0;
for i=1:length(unq_a_lab)
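    % NOTE: the indexing c*(i-1)+... assumes the images are stored
    % contiguously by subject, with exactly c images per subject
    id = randperm(c);
    % id(1:k1) goes to training, id(k1+1:k1+k2) goes to testing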
    for j=1:k1
        Train_data_a{t1+j} = a_data{c*(i-1)+id(j)};
        tr_a_labels(1,t1+j) = a_labels(c*(i-1)+id(j));
    end
    for j=1:k2
        Test_data_a{t2+j} = a_data{c*(i-1)+id(j+k1)};
        tst_a_labels(1,t2+j) = a_labels(c*(i-1)+id(j+k1));        
    end
    t1 = t1+k1; t2 = t2+k2;
end
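
The Case 1 loop locates a subject's images by position, `c*(i-1)+id(j)`, which only works when every subject has exactly `c` images stored contiguously. A possible alternative, sketched here on made-up data with unequal counts, looks the images up through `a_labels` instead. The variables `counts`, `members`, `train_idx` and `test_idx` are hypothetical names introduced for illustration, and each "image" is replaced by its own index.

```matlab
% Hypothetical data: 4 subjects with 10, 9, 11 and 9 images
counts   = [10 9 11 9];
a_labels = [];
for s = 1:numel(counts)
    a_labels = [a_labels, s * ones(1, counts(s))];
end
a_data = num2cell((1:numel(a_labels))');   % placeholder "images"

unq_a_lab = unique(a_labels);
c  = min(counts);
k1 = 3;                                    % training images per subject
k2 = 3;                                    % testing images per subject
assert(k1 + k2 <= c, 'k1 + k2 must not exceed c');

train_idx = zeros(1, numel(unq_a_lab) * k1);
test_idx  = zeros(1, numel(unq_a_lab) * k2);
for i = 1:numel(unq_a_lab)
    members = find(a_labels == unq_a_lab(i));      % all images of subject i
    members = members(randperm(numel(members)));   % shuffle them
    train_idx((i-1)*k1 + (1:k1)) = members(1:k1);
    test_idx((i-1)*k2 + (1:k2))  = members(k1 + (1:k2));
end

% Index the data and labels once, outside the loop
Train_data_a = a_data(train_idx);   tr_a_labels  = a_labels(train_idx);
Test_data_a  = a_data(test_idx);    tst_a_labels = a_labels(test_idx);
```

Because both index sets are drawn from one shuffled list per subject, no image lands in both sets, while every subject appears in both.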

%% CASE 2 COMPLETELY NON-OVERLAPPING DATASETS EQUAL SIZED PARTITIONS
% Randomly split the dataset into training and testing subsets
% trainset - k1 images per subject
% testset - k2 images per subject
% Total training set = k1*p1 images
% Total testing set = k2*p2 images
% p1 + p2 = total number of subjects in the database
% Training and testing subjects do not overlap
% p1 = number of subjects in training set
% p2 = number of subjects in testing set

%split 1 
k1 = 3;
%split 2
k2 = 3;
% size of the partitions
% p1 = number of classes in the training sets
% p2 = number of classes in the testing sets
size_p = length(unq_a_lab);
% randi returns 1..size_p-1, so neither partition is empty
% (round((size_p-1)*rand) could return 0)
p1 = randi(size_p-1);
p2 = size_p-p1;

Train_data_a = cell(p1*k1,1);
Test_data_a = cell(p2*k2,1);
tr_a_labels = zeros(1,p1*k1);
tst_a_labels = zeros(1,p2*k2);
t1=0; t2=0;
for i=1:length(unq_a_lab)
    id = randperm(c);
    % split it into 1:k1 and 1:k2 points
    if i<=p1
        for j=1:k1
            Train_data_a{t1+j} = a_data{c*(i-1)+id(j)};
            tr_a_labels(1,t1+j) = a_labels(c*(i-1)+id(j));            
        end
        t1 = t1+k1;
    end
    
    if i>p1
        for j=1:k2
            Test_data_a{t2+j} = a_data{c*(i-1)+id(j)};
            tst_a_labels(1,t2+j) = a_labels(c*(i-1)+id(j));                    
        end
        t2 = t2+k2;
    end
end

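% Case 2 again, now assigning subjects to the two partitions in random
% order through the permutation x (above, the first p1 subjects always train)
%split 1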
k1 = 3;
%split 2
k2 = 3;
% size of the partitions
% p1 = number of classes in the training sets
% p2 = number of classes in the testing sets
size_p = length(unq_a_lab);
% randi returns 1..size_p-1, so neither partition is empty
% (round((size_p-1)*rand) could return 0)
p1 = randi(size_p-1);
p2 = size_p-p1;

Train_data_a = cell(p1*k1,1);
Test_data_a = cell(p2*k2,1);
tr_a_labels = zeros(1,p1*k1);
tst_a_labels = zeros(1,p2*k2);
x = randperm(length(unq_a_lab));
t1=0; t2=0;
for i=1:length(unq_a_lab)
    id = randperm(c);
    % split it into 1:k1 and 1:k2 points
    if i<=p1
        for j=1:k1
            Train_data_a{t1+j} = a_data{c*(x(i)-1)+id(j)};
            tr_a_labels(1,t1+j) = a_labels(c*(x(i)-1)+id(j));
        end
        t1 = t1+k1;
    end    
    if i>p1
        for j=1:k2
            Test_data_a{t2+j} = a_data{c*(x(i)-1)+id(j)};
            tst_a_labels(1,t2+j) = a_labels(c*(x(i)-1)+id(j));
        end
        t2 = t2+k2;
    end
end
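
The subject-disjoint split of Case 2 can also be written by first choosing which subjects go to each side and then sampling images through the labels. The following is a sketch under assumptions, not the original code: the data are placeholders (11 subjects, 9 images each), and `train_subj`, `test_subj`, `members`, `train_idx` and `test_idx` are hypothetical names.

```matlab
% Hypothetical stand-in data: 11 subjects, 9 images each
n_subj = 11;
n_img  = 9;
a_labels  = reshape(repmat(1:n_subj, n_img, 1), 1, []);
a_data    = num2cell((1:numel(a_labels))');    % placeholder "images"
unq_a_lab = unique(a_labels);
k1 = 3;                                        % images per training subject
k2 = 3;                                        % images per testing subject

% Randomly assign whole subjects to one partition or the other
size_p = numel(unq_a_lab);
p1 = randi(size_p - 1);                        % 1..size_p-1 training subjects
x  = randperm(size_p);
train_subj = unq_a_lab(x(1:p1));
test_subj  = unq_a_lab(x(p1+1:end));

% Draw k1 random images from each training subject
train_idx = [];
for s = train_subj
    members   = find(a_labels == s);
    members   = members(randperm(numel(members)));
    train_idx = [train_idx, members(1:k1)];
end

% Draw k2 random images from each testing subject
test_idx = [];
for s = test_subj
    members  = find(a_labels == s);
    members  = members(randperm(numel(members)));
    test_idx = [test_idx, members(1:k2)];
end

Train_data_a = a_data(train_idx);   tr_a_labels  = a_labels(train_idx);
Test_data_a  = a_data(test_idx);    tst_a_labels = a_labels(test_idx);
```

A quick check such as `isempty(intersect(tr_a_labels, tst_a_labels))` confirms that no subject appears on both sides.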

%% CASE 3 COMPLETELY NON OVERLAPPING DATASETS UNEQUAL SIZED PARTITIONS
% Randomly split the dataset into training and testing subsets
% trainset - m images in total, each subject having at least floor(m/p1) images
% testset - k2 images per subject
% Total training set = m images
% Total testing set = k2*p2 images
% p1 + p2 = total number of subjects in the database
% Training and testing subjects do not overlap

% size of the partitions
% p1 = number of classes in the training sets
% p2 = number of classes in the testing sets
size_p = length(unq_a_lab);
% p1 = round((size_p-1)*rand);
p1 = 6;
p2 = size_p-p1;

%split 1
m = 29;
min_reqd = floor(m/p1);
%split 2
k2 = 3;

Train_data_a = cell(m,1);
Test_data_a = cell(p2*k2,1);
tr_a_labels = zeros(1,m);
dummy_labels = tr_a_labels;
tst_a_labels = zeros(1,p2*k2);
x = randperm(length(unq_a_lab));
% filling up the first min_reqd for each class
t1=1;
for j=1:p1
    idx = randperm(c);
    idx = idx(1:min_reqd);
    for k=1:min_reqd
        dummy_labels(t1) = c*(x(j)-1)+idx(k);
        t1 = t1+1;
    end
end
% form the numberset
num_pack = zeros(1,c*p1);
t2=1;
for j=1:p1
    for k=1:c
        num_pack(1,t2) = c*(x(j)-1)+k;
        t2 = t2+1;
    end
end
% getting the indices that have not been already selected previously
% using the set difference operation
% setdiff(A,B) is the values of A that are not in B
new_a_labels = setdiff(num_pack,dummy_labels);
idx = randperm(length(new_a_labels));
% randomly selecting the remaining m-(min_reqd*p1) indices from the set
% difference
idx = new_a_labels(idx(1:m-(min_reqd*p1)));
% appending them after the guaranteed indices
dummy_labels(t1:t1+length(idx)-1) = idx;
% sorting the selected indices into ascending order
dummy_labels = sort(dummy_labels);

% using the indices of the dummy variables to get the training set and 
% their corresponding labels
for i=1:m
    Train_data_a{i} = a_data{dummy_labels(i)};
    tr_a_labels(1,i) = a_labels(dummy_labels(i));
end

% getting the testing set as previously done in case 2
t2=0;
for i=1:length(unq_a_lab)
    % Random selection of k2 points for the testing set
    id = randperm(c);
    if i>p1
        for j=1:k2
            Test_data_a{t2+j} = a_data{c*(x(i)-1)+id(j)};
            tst_a_labels(1,t2+j) = a_labels(c*(x(i)-1)+id(j));
        end
        t2 = t2+k2;
    end
end
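
The training side of Case 3 (m images in total, at least floor(m/p1) from each training subject) can be sketched more compactly by keeping two pools per subject: the guaranteed picks and the leftovers. This is an illustrative variant on placeholder data, not the original approach. The names `guaranteed`, `leftover`, `extra`, `n_extra`, `train_subj` and `train_idx` are hypothetical.

```matlab
% Hypothetical stand-in data: 11 subjects, 9 images each
n_subj = 11;
n_img  = 9;
a_labels  = reshape(repmat(1:n_subj, n_img, 1), 1, []);
unq_a_lab = unique(a_labels);

p1 = 6;                          % number of training subjects
m  = 29;                         % total number of training images
min_reqd = floor(m / p1);        % guaranteed images per training subject

x = randperm(numel(unq_a_lab));
train_subj = unq_a_lab(x(1:p1));

guaranteed = [];                 % min_reqd images from every training subject
leftover   = [];                 % all other images of the training subjects
for s = train_subj
    members    = find(a_labels == s);
    members    = members(randperm(numel(members)));
    guaranteed = [guaranteed, members(1:min_reqd)];
    leftover   = [leftover,   members(min_reqd+1:end)];
end

% Top up to m images with random picks from the leftovers
n_extra = m - numel(guaranteed);
assert(n_extra <= numel(leftover), 'm is too large for the chosen subjects');
extra     = leftover(randperm(numel(leftover)));
train_idx = sort([guaranteed, extra(1:n_extra)]);
tr_a_labels = a_labels(train_idx);
```

Since `guaranteed` and `leftover` never share an index, the `setdiff` step is not needed in this variant. The testing set can then be drawn from the remaining subjects `unq_a_lab(x(p1+1:end))` exactly as in Case 2.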