LibSvm: adding features using the Java API (machine-learning, svm, libsvm)


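I have a text that I want to train on by adding features through the Java API. Looking at the examples, the main class for building the training set is svm_problem. It looks like an svm_node represents one feature (index is the feature, value is the feature's weight).

What I did was keep a map (just to simplify the problem) that holds the association between each feature and its index. For each weighted feature in my example, I create a new node: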

  svm_node currentNode = new svm_node();
  int index = feature.getIndexInMap();
  double value = feature.getWeight();
  currentNode.index = index;
  currentNode.value = value;

Is my intuition correct? What does svm_problem.y refer to? Is it the index of the label? And is svm_problem.l just the length of the two vectors?

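Your intuition is very close. Each svm_node holds one index/value pair, and a pattern (one training example) is an array of svm_node, so each row svm_problem.x[i] is a whole pattern rather than a single feature. The variable svm_problem.y is an array containing the label of each pattern, and svm_problem.l is the size of the training set, that is, the number of patterns.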
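To make the layout concrete on the Java side, here is a minimal sketch of turning a feature-to-weight map into one sparse row and packing rows into a problem. The classes svm_node and svm_problem below are stand-ins that only mirror the public fields of the LibSVM Java classes, so the snippet compiles without the jar; SparseRowSketch, toRow and toProblem are hypothetical names made up for this illustration. It assumes the map keys are already the 1-based feature indices, and it sorts them because LibSVM expects indices in ascending order.

```java
import java.util.Map;
import java.util.TreeMap;

// Stand-ins that mirror the public fields of libsvm's Java classes so the
// sketch compiles on its own; with the real jar, import libsvm.* instead.
class svm_node { int index; double value; }
class svm_problem { int l; double[] y; svm_node[][] x; }

final class SparseRowSketch {
    // Turns one example (feature index -> weight) into a sparse row.
    // TreeMap iteration keeps the indices in ascending order.
    static svm_node[] toRow(Map<Integer, Double> weights) {
        TreeMap<Integer, Double> sorted = new TreeMap<>(weights);
        svm_node[] row = new svm_node[sorted.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            svm_node node = new svm_node();
            node.index = e.getKey();
            node.value = e.getValue();
            row[i++] = node;
        }
        return row;
    }

    // Packs rows and labels: x[i] is pattern i, y[i] its label, l the count.
    static svm_problem toProblem(svm_node[][] rows, double[] labels) {
        if (rows.length != labels.length) {
            throw new IllegalArgumentException("need exactly one label per row");
        }
        svm_problem p = new svm_problem();
        p.l = rows.length;
        p.x = rows;
        p.y = labels;
        return p;
    }

    public static void main(String[] args) {
        svm_node[][] rows = {
            toRow(Map.of(2, 0.5, 1, 1.0)),
            toRow(Map.of(3, 2.0))
        };
        svm_problem p = toProblem(rows, new double[] {1, -1});
        System.out.println(p.l + " patterns, first row has " + p.x[0].length + " nodes");
    }
}
```

Note that Java arrays carry their own length, so the rows here have no terminating node with index -1; that terminator is only needed in the C/C++ version.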

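Also, note that svm_parameter.nr_weight is the number of per-label weights, that is, the length of the weight_label and weight arrays (useful if the training set is unbalanced). If you do not intend to use them, you must set it to zero.

Let me show you a simple C++ example: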
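As an illustration of how these three fields fit together in Java, the sketch below fills them from the label array. The svm_parameter class here is a stand-in that mirrors only nr_weight, weight_label and weight so the snippet compiles without the jar; ClassWeightSketch and its methods are hypothetical helpers, and the inverse-frequency heuristic is an assumption chosen for the example, not something LibSVM prescribes. It also assumes integer class labels.

```java
import java.util.Map;
import java.util.TreeMap;

// Stand-in mirroring three public fields of libsvm's svm_parameter so the
// sketch compiles on its own; with the real jar, use libsvm.svm_parameter.
class svm_parameter { int nr_weight; int[] weight_label; double[] weight; }

final class ClassWeightSketch {
    // Hypothetical helper: weight each class by total / (classes * count),
    // so rarer labels get a larger multiplier on C.
    static void setBalancedWeights(svm_parameter param, double[] y) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (double label : y) {
            counts.merge((int) label, 1, Integer::sum);
        }

        param.nr_weight = counts.size();
        param.weight_label = new int[counts.size()];
        param.weight = new double[counts.size()];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            param.weight_label[i] = e.getKey();
            param.weight[i] = (double) y.length / (counts.size() * e.getValue());
            i++;
        }
    }

    // Disables weighting: zero count and empty arrays.
    static void clearWeights(svm_parameter param) {
        param.nr_weight = 0;
        param.weight_label = new int[0];
        param.weight = new double[0];
    }
}
```

For labels {1, 1, 1, -1} this gives label -1 a weight of 2.0 and label 1 a weight of about 0.67; LibSVM multiplies C by weight[i] for the class weight_label[i].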

#include "svm.h"
#include <iostream>

using namespace std;

int main()
{
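    // Value-initialize so unused fields (degree, coef0, nu, weight arrays, ...) start at zero/null.
    svm_parameter params = {};

    params.svm_type = C_SVC;
    params.kernel_type = RBF;
    params.C = 1;
    params.gamma = 1;
    params.nr_weight = 0;     // no per-class weights
    params.p = 0.0001;        // only used by EPSILON_SVR
    params.eps = 0.001;       // stopping tolerance
    params.cache_size = 100;  // kernel cache size in MB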

    svm_problem problem;
    problem.l = 4;
    problem.y = new double[4]{1,-1,-1,1};
    problem.x = new svm_node*[4];

    {
    problem.x[0] = new svm_node[3];
    problem.x[0][0].index = 1;
    problem.x[0][0].value = 0;
    problem.x[0][1].index = 2;
    problem.x[0][1].value = 0;
    problem.x[0][2].index = -1;

    }

    {
    problem.x[1] = new svm_node[3];
    problem.x[1][0].index = 1;
    problem.x[1][0].value = 1;
    problem.x[1][1].index = 2;
    problem.x[1][1].value = 0;
    problem.x[1][2].index = -1;
    }

    {
    problem.x[2] = new svm_node[3];
    problem.x[2][0].index = 1;
    problem.x[2][0].value = 0;
    problem.x[2][1].index = 2;
    problem.x[2][1].value = 1;
    problem.x[2][2].index = -1;
    }

    {
    problem.x[3] = new svm_node[3];
    problem.x[3][0].index = 1;
    problem.x[3][0].value = 1;
    problem.x[3][1].index = 2;
    problem.x[3][1].value = 1;
    problem.x[3][2].index = -1;

    }

    for(int i=0; i<4; i++)
    {
        cout << problem.y[i] << endl;
    }

    svm_model * model = svm_train(&problem, &params);
    svm_save_model("mymodel.svm", model);

    for(int i=0; i<4; i++)
    {
        double d = svm_predict(model, problem.x[i]);

        cout << "Prediction " << d << endl;
    }
    /* We should free the memory at this point. 
       But this example is large enough already */ 
}

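I would suggest changing the title. In my view it does not reflect the real question, which is about libsvm usage and has little to do with whether the features come from text.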