How do I use the OneClassClassifier API from Weka in a Java project?
I need to implement one-class novelty detection in a Java project. I have a simple training set with a single class:
@relation oneclass_test
@attribute class {target}
@attribute text String
@data
target,'mechanical engineering'
target,'engineer'
target,'fuel injected engine'
target,'suspension'
target,'tire-cover'
target,'braking system'
target,'hydraulics'
target,'transmission'
target,'front axle'
target,'spring'
I use this class to train the classifier:
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.meta.OneClassClassifier;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TestClassifierLearner {

    /**
     * Object that stores training data.
     */
    Instances trainData;

    /**
     * Object that stores the filter
     */
    StringToWordVector filter;

    /**
     * Object that stores the classifier
     */
    FilteredClassifier classifier;

    /**
     * This method loads a dataset in ARFF format. If the file does not exist, or
     * it has a wrong format, the attribute trainData is null.
     * @param fileName The name of the file that stores the dataset.
     */
    public void loadDataset(String fileName) {
        try {
            BufferedReader reader = new BufferedReader(new FileReader(fileName));
            ArffLoader.ArffReader arff = new ArffLoader.ArffReader(reader);
            trainData = arff.getData();
            System.out.println("===== Loaded dataset: " + fileName + " =====");
            reader.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * This method evaluates the classifier. As recommended by WEKA documentation,
     * the classifier is defined but not trained yet. Evaluation of previously
     * trained classifiers can lead to unexpected results.
     */
    public void evaluate() {
        try {
            trainData.setClassIndex(0);
            filter = new StringToWordVector();
            filter.setAttributeIndices("last");
            classifier = new FilteredClassifier();
            classifier.setFilter(filter);
            OneClassClassifier oneClassClassifier = new OneClassClassifier();
            oneClassClassifier.setTargetClassLabel("target");
            classifier.setClassifier(oneClassClassifier);
            Evaluation eval = new Evaluation(trainData);
            eval.crossValidateModel(classifier, trainData, 4, new Random(1));
            System.out.println(eval.toSummaryString());
            System.out.println(eval.toClassDetailsString());
            System.out.println("===== Evaluating on filtered (training) dataset done =====");
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * This method trains the classifier on the loaded dataset.
     */
    public void learn() {
        try {
            trainData.setClassIndex(0);
            filter = new StringToWordVector();
            classifier = new FilteredClassifier();
            classifier.setFilter(filter);
            OneClassClassifier oneClassClassifier = new OneClassClassifier();
            oneClassClassifier.setTargetClassLabel("target");
            classifier.setClassifier(oneClassClassifier);
            classifier.buildClassifier(trainData);
            System.out.println("===== Training on filtered (training) dataset done =====");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * This method saves the trained model into a file. This is done by
     * simple serialization of the classifier object.
     * @param fileName The name of the file that will store the trained model.
     */
    public void saveModel(String fileName) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(fileName))) {
            out.writeObject(classifier);
            System.out.println("===== Saved model: " + fileName + " =====");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
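The saveModel method relies on plain Java serialization, and the output further down mentions a loaded model, but the loading code is not shown in the question. As a rough sketch of what the matching round trip could look like, the class below writes and reads back a serializable object. The class name ModelRoundTrip and its methods are hypothetical, and a plain ArrayList stands in for the trained FilteredClassifier so the example runs without Weka on the classpath.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class ModelRoundTrip {

    /** Writes any serializable object to a file, in the same way as saveModel. */
    public static void save(Serializable model, String fileName) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(fileName))) {
            out.writeObject(model);
        }
    }

    /** Reads the object back; the caller casts it to the type it expects. */
    public static Object load(String fileName) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(fileName))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the trained classifier (assumption: any Serializable object
        // goes through the same write/read path).
        ArrayList<String> standIn = new ArrayList<>();
        standIn.add("target");

        // Temporary file used only for this demonstration.
        File tmp = File.createTempFile("model", ".dat");
        tmp.deleteOnExit();

        save(standIn, tmp.getPath());
        Object restored = load(tmp.getPath());
        System.out.println("Round trip preserved the object: " + standIn.equals(restored));
    }
}
```

In the real project the result of load would presumably be cast to FilteredClassifier. Weka also provides weka.core.SerializationHelper with read and write methods, which may be a tidier option than hand-written streams.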
And part of the classification class:
/**
 * This method creates the instance to be classified, from the text that has been read.
 */
public void makeInstance() {
    Attribute textAttr = new Attribute("text", (ArrayList<String>) null);
    ArrayList<Attribute> fvWekaAttributes = new ArrayList<>(1);
    fvWekaAttributes.add(textAttr);
    instances = new Instances("TestInstances", fvWekaAttributes, 1);
    // Set class index
    instances.setClassIndex(0);
    // Create and add the instance
    DenseInstance instance = new DenseInstance(1);
    instance.setValue(textAttr, text);
    instances.add(instance);
    System.out.println("===== Instance created with reference dataset =====");
    System.out.println(instances);
}

/**
 * This method performs the classification of the instance.
 * Output is done at the command-line.
 */
public void classify() {
    try {
        instances.instance(0).setClassMissing();
        double pred = classifier.classifyInstance(instances.instance(0));
        System.out.println("===== Classified instance =====");
        System.out.println("Class predicted: " + instances.classAttribute().value((int) pred));
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}
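One detail in classify() is worth isolating: the prediction is cast with (int) pred before it is used as an index. In Java, casting NaN to int yields 0, so whenever the classifier returns NaN (which is how Weka encodes a missing value) the lookup silently reads index 0 of whatever attribute is set as the class. The sketch below reproduces just that behaviour with a plain list. The class NanIndexDemo, its methods, and the list contents are hypothetical stand-ins and not Weka API; the assumption that index 0 of the string attribute holds Weka's placeholder string is taken from the run output that follows.

```java
import java.util.Arrays;
import java.util.List;

public class NanIndexDemo {

    /** Mimics value((int) pred): looks up a label by a prediction cast to int. */
    public static String labelFor(double pred, List<String> values) {
        return values.get((int) pred); // (int) Double.NaN evaluates to 0
    }

    /** Safer variant: treats NaN as "no class assigned" instead of index 0. */
    public static String labelOrOutlier(double pred, List<String> values) {
        return Double.isNaN(pred) ? "outlier" : values.get((int) pred);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the values stored in the attribute at class index 0.
        List<String> values = Arrays.asList(
                "*WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*", "suspension");

        System.out.println(labelFor(Double.NaN, values));       // prints the placeholder string
        System.out.println(labelOrOutlier(Double.NaN, values)); // prints "outlier"
    }
}
```

Checking for NaN before the cast at least makes the failure visible. Mapping NaN to "outlier" is an assumption based on the behaviour asked for in the question, not something the original code does.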
===== Loaded text data: C:\projects\1.txt =====
suspension
===== Loaded model: C:\projects\testData.dat =====
===== Instance created with reference dataset =====
@relation TestInstances
@attribute text string
@data
'suspension'
===== Classified instance =====
Class predicted: *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*
For any text data, the classifier always returns *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES* and a prediction of NaN.
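What am I doing wrong?
This is an ill-posed problem. Take a step back and rethink what you are trying to do. You are trying to emulate text search with an SVM.
It is not text search. I have one class of documents, and I need to classify other documents: if a document belongs to the class, the program must output "target" and the prediction value; otherwise it must output "outlier". This data is only an example.
The way you use it, it is text search. If that is not what you want to do, you are probably using it wrong... On this "hydraulics" data, a one-class SVM does not make sense, because your kernel function is just the text "hydraulics". In other words, it does text search...
OK. I have a set of text documents that belong to one class (topic), and I have one test document. I need to classify this document as target or outlier and get a percentage prediction. Can I do this with one class, or do I need at least two classes? If not, what should I use?
Are they real text documents with lots of shared words, or something like the single words you posted above? One-class SVMs can't classify; they can only give you a probability.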
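Separately from the modelling question raised in the comments, the dummy-string output looks consistent with a header mismatch: the training ARFF has two attributes (a nominal class and a string text), while makeInstance() builds a dataset with only the string attribute and then marks that attribute as the class. A hedged sketch of a test instance that mirrors the training header is shown below. The class OneClassTestInstance and its method names are hypothetical, it assumes a Weka 3.7 or later jar on the classpath, and it assumes that an outlier comes back from classifyInstance as a missing value (NaN), which matches the NaN reported in the question but has not been verified here against a trained model.

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.classifiers.Classifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.Utils;

public class OneClassTestInstance {

    /** Builds a one-row dataset with the same two attributes as the training ARFF. */
    public static Instances buildTestData(String text) {
        ArrayList<Attribute> attributes = new ArrayList<>(2);
        attributes.add(new Attribute("class", Arrays.asList("target"))); // nominal {target}
        attributes.add(new Attribute("text", (ArrayList<String>) null)); // string attribute
        Instances data = new Instances("TestInstances", attributes, 1);
        data.setClassIndex(0); // the class is the first attribute, as in training

        DenseInstance instance = new DenseInstance(2);
        instance.setDataset(data);  // required before setting a value by attribute index
        instance.setValue(1, text); // attribute 1 is "text"
        data.add(instance);
        data.instance(0).setClassMissing(); // the label is unknown at prediction time
        return data;
    }

    /** Returns the predicted label, or "outlier" when the prediction is a missing value. */
    public static String classify(Classifier classifier, String text) throws Exception {
        Instances data = buildTestData(text);
        double pred = classifier.classifyInstance(data.instance(0));
        if (Utils.isMissingValue(pred)) {
            return "outlier"; // assumption: missing prediction means "not the target class"
        }
        return data.classAttribute().value((int) pred);
    }
}
```

With the deserialized FilteredClassifier passed in as the classifier argument, classify(classifier, "suspension") would return either "target" or "outlier". Whether the result is meaningful on single-word examples is the separate concern discussed in the comments.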