How do I use the OneClassClassifier API from Weka in a Java project? (tags: java, weka, data-mining)

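I need to implement one-class (novelty detection) classification in a Java project. I have a simple training set that contains a single class: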

@relation oneclass_test

@attribute class {target}
@attribute text String

@data
target,'mechanical engineering'
target,'engineer'
target,'fuel injected engine'
target,'suspension'
target,'tire-cover'
target,'braking system'
target,'hydraulics'
target,'transmission'
target,'front axle'
target,'spring'

I use this class to train the classifier:

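import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.meta.OneClassClassifier;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TestClassifierLearner {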
/**
 * Object that stores training data.
 */
Instances trainData;

/**
 * Object that stores the filter
 */
StringToWordVector filter;

/**
 * Object that stores the classifier
 */
FilteredClassifier classifier;

/**
 * This method loads a dataset in ARFF format. If the file does not exist, or
 * it has a wrong format, the attribute trainData is null.
 * @param fileName The name of the file that stores the dataset.
 */
public void loadDataset(String fileName) {
    try {
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        ArffLoader.ArffReader arff = new ArffLoader.ArffReader(reader);
        trainData = arff.getData();
        System.out.println("===== Loaded dataset: " + fileName + " =====");
        reader.close();
    }
    catch (IOException e) {
        e.printStackTrace();
    }
}

/**
 * This method evaluates the classifier. As recommended by WEKA documentation,
 * the classifier is defined but not trained yet. Evaluation of previously
 * trained classifiers can lead to unexpected results.
 */
public void evaluate() {
    try {
        trainData.setClassIndex(0);
        filter = new StringToWordVector();
        filter.setAttributeIndices("last");
        classifier = new FilteredClassifier();
        classifier.setFilter(filter);

        OneClassClassifier oneClassClassifier = new OneClassClassifier();
        oneClassClassifier.setTargetClassLabel("target");
        classifier.setClassifier(oneClassClassifier);

        Evaluation eval = new Evaluation(trainData);
        eval.crossValidateModel(classifier, trainData, 4, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString());
        System.out.println("===== Evaluating on filtered (training) dataset done =====");
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

/**
 * This method trains the classifier on the loaded dataset.
 */
public void learn() {
    try {
        trainData.setClassIndex(0);
        filter = new StringToWordVector();
        classifier = new FilteredClassifier();
        classifier.setFilter(filter);

        OneClassClassifier oneClassClassifier = new OneClassClassifier();
        oneClassClassifier.setTargetClassLabel("target");

        classifier.setClassifier(oneClassClassifier);

        classifier.buildClassifier(trainData);
        System.out.println("===== Training on filtered (training) dataset done =====");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

/**
 * This method saves the trained model into a file. This is done by
 * simple serialization of the classifier object.
 * @param fileName The name of the file that will store the trained model.
 */
public void saveModel(String fileName) {
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(fileName))) {
        out.writeObject(classifier);
        System.out.println("===== Saved model: " + fileName + " =====");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
}

And this is part of the class that performs the classification:

/**
 * This method creates the instance to be classified, from the text that has been read.
 */
public void makeInstance() {
    Attribute textAttr = new Attribute("text", (ArrayList<String>) null);
    ArrayList<Attribute> fvWekaAttributes = new ArrayList<>(1);
    fvWekaAttributes.add(textAttr);

    instances = new Instances("TestInstances", fvWekaAttributes, 1);
    // Set class index
    instances.setClassIndex(0);

    // Create and add the instance
    DenseInstance instance = new DenseInstance(1);
    instance.setValue(textAttr, text);
    instances.add(instance);

    System.out.println("===== Instance created with reference dataset =====");
    System.out.println(instances);
}
/**
 * This method performs the classification of the instance.
 * Output is done at the command-line.
 */
public void classify() {
    try {
        instances.instance(0).setClassMissing();
        double pred = classifier.classifyInstance(instances.instance(0));
        System.out.println("===== Classified instance =====");
        System.out.println("Class predicted: " + instances.classAttribute().value((int) pred));
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

===== Loaded text data: C:\projects\1.txt =====
suspension
===== Loaded model: C:\projects\testData.dat =====

===== Instance created with reference dataset =====
@relation TestInstances
@attribute text string

@data
'suspension'

===== Classified instance =====
Class predicted: *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*

For any text input, the classifier always returns *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES* and the prediction is NaN.

What am I doing wrong?
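One detail that may explain the printed dummy string: in Java, casting `NaN` to `int` yields 0, so `value((int) pred)` silently looks up index 0 of whatever attribute is currently set as the class. Weka represents a missing class value as `NaN`. The following is a minimal sketch of guarding against that, assuming a `NaN` prediction means the instance was not assigned to the target class; the names `PredictionLabel` and `describe` and the label "outlier" are hypothetical, and the list stands in for the values of the class attribute.

```java
import java.util.Arrays;
import java.util.List;

public class PredictionLabel {
    /**
     * Maps a raw prediction to a label. A NaN prediction is reported as
     * "outlier" (an assumed label) instead of being cast to an index.
     */
    public static String describe(double pred, List<String> classValues) {
        if (Double.isNaN(pred)) {
            return "outlier";
        }
        return classValues.get((int) pred);
    }

    public static void main(String[] args) {
        List<String> classValues = Arrays.asList("target");
        // Casting NaN to int gives 0, so index 0 would be looked up without any error.
        System.out.println((int) Double.NaN);
        System.out.println(describe(Double.NaN, classValues));
        System.out.println(describe(0.0, classValues));
    }
}
```

In `classify()` the same check could be applied to `pred` before indexing into `instances.classAttribute()`.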

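This is an ill-posed problem. Take a step back and rethink what you are trying to do. You are trying to emulate text search with an SVM.

It is not a text search. I have one class of documents. I need to classify other documents: if a document belongs to the class, the program must output "target" and a prediction value; otherwise it must output "outlier". The data above are just an example.

The way you are using it, it is text search. If that is not what you intend to do, you are probably using it wrong...

On this "hydraulics" data, a one-class SVM does not make sense, because your kernel function is effectively the text "hydraulics". In other words, it does text search...

OK. I have a set of text documents that belong to one class (topic), and I have a test document. I need to classify this document as target or outlier and get a percentage prediction. Can I do this with one class, or do I need at least two classes? If not, what should I use?

Are they real text documents that share a lot of words, or something like the single words you posted above?

One-class SVMs cannot classify; they can only give you a probability.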
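Separately from the modelling question raised in the comments, one likely cause of the dummy-string output is that `makeInstance()` builds a dataset containing only the `text` attribute and then sets it as the class, so the header no longer matches the training ARFF (`class {target}` followed by `text`). Below is a minimal sketch of a test set with a matching header, assuming a Weka version (3.7 or later) in which `Attribute` accepts a `List<String>`; the class name `OneClassTestSet` and the method `build` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class OneClassTestSet {
    /**
     * Builds a one-row dataset whose header mirrors the training ARFF:
     * a nominal class attribute {target} followed by a string attribute "text".
     */
    public static Instances build(String text) {
        ArrayList<String> classValues = new ArrayList<>();
        classValues.add("target");
        Attribute classAttr = new Attribute("class", classValues);
        // A null value list makes this a string attribute.
        Attribute textAttr = new Attribute("text", (List<String>) null);

        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(classAttr);
        attributes.add(textAttr);

        Instances dataset = new Instances("TestInstances", attributes, 1);
        dataset.setClassIndex(0); // the nominal class, not the text attribute

        DenseInstance instance = new DenseInstance(2); // all values start out missing
        instance.setDataset(dataset);
        instance.setValue(dataset.attribute("text"), text);
        dataset.add(instance);
        return dataset;
    }

    public static void main(String[] args) {
        System.out.println(build("suspension"));
    }
}
```

The first instance of the returned dataset can then be passed to `classifier.classifyInstance(...)` in place of the one built in the question. Because the class value is left missing, no call to `setClassMissing()` is needed, and the text value is preserved.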