Is there a way to map Weka J48 decision tree output to RDF format?
I want to use Jena to create an ontology based on the output of a Weka J48 decision tree. But before that output can be fed into Jena, it has to be mapped to RDF format. Is there any way to do this mapping?

EDIT1: Sample part of the J48 decision tree output before the mapping: (screenshot) Corresponding sample part of the RDF for the decision tree output: (screenshot) Both screenshots are from this research paper (slide 4):
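There probably is no built-in way to do this. Disclaimer: I have never worked with Jena and RDF before, so this answer may be incomplete, or may miss the point of the intended conversion. But anyway, first a short rant:
The code snippets published in the paper (i.e. the output of the Weka classifier and the RDF) are incomplete and obviously inconsistent. The conversion process is not described at all. Instead, they only mention:
"The challenge we faced was mainly to convert the J48 classification output to RDF and give it to Jena"
(sic!)
Now, they managed to solve it. They could have provided the conversion code in a public, open-source repository. This would have allowed others to contribute improvements, and would have increased the visibility and verifiability of their approach. But instead, they wasted their time and the time of the readers with screenshots of various websites as page fillers, in a pitiful attempt to squeeze another publication out of their approach.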
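Here is my best attempt at providing some of the building blocks that may be necessary for the conversion. It has to be taken with a grain of salt, because I'm not familiar with the underlying methods and libraries. Nevertheless, I hope that it can be considered "helpful".
The Weka Classifier implementations usually do not provide the structures that they use for their internal workings, so it is not possible to access the internal tree structure directly. However, there is a method, prefix(), that returns a string representation of the tree.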
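To make the nested bracket structure concrete, here is a small, self-contained sketch (plain Java, no Weka needed) that splits one level of a prefix-style string into the split attribute, the two branch conditions and the two bracketed subtrees, using bracket-depth counting much like findClosing in the full code further down. The sample string is hypothetical: it is written to match the format that this answer's parser expects, not copied from Weka. The names PrefixSplitSketch, matchingBracket and splitLevel are made up. Like the parser, it assumes exactly two branches per split and attribute names without ':', ',' or '['.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one level of a prefix-style tree string such as
// "[a: <= 1.0, > 1.0[X (2.0)][Y (3.0)]]". Assumes two branches per split.
class PrefixSplitSketch
{
    // Index of the ']' that closes the '[' at openIndex, or -1 if unbalanced.
    static int matchingBracket(String s, int openIndex)
    {
        int depth = 0;
        for (int i = openIndex; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (c == '[')
            {
                depth++;
            }
            else if (c == ']')
            {
                depth--;
                if (depth == 0)
                {
                    return i;
                }
            }
        }
        return -1;
    }

    // Returns [attribute, leftCondition, rightCondition, leftSubtree, rightSubtree].
    // Assumes the (newline-free) string starts with '['.
    static List<String> splitLevel(String prefix)
    {
        String s = prefix.replace("\n", "");
        String body = s.substring(1, matchingBracket(s, 0)); // strip the outer [ ]
        int colon = body.indexOf(':');
        int comma = body.indexOf(',', colon);
        int leftOpen = body.indexOf('[', comma);
        int leftClose = matchingBracket(body, leftOpen);
        int rightOpen = body.indexOf('[', leftClose);
        int rightClose = matchingBracket(body, rightOpen);

        List<String> parts = new ArrayList<>();
        parts.add(body.substring(0, colon).trim());
        parts.add(body.substring(colon + 1, comma).trim());
        parts.add(body.substring(comma + 1, leftOpen).trim());
        parts.add(body.substring(leftOpen, leftClose + 1));
        parts.add(body.substring(rightOpen, rightClose + 1));
        return parts;
    }

    public static void main(String[] args)
    {
        String sample = "[petalwidth: <= 0.6,\n > 0.6[Iris-setosa (50.0)]"
            + "[petallength: <= 4.9,\n > 4.9[Iris-versicolor (48.0)][Iris-virginica (46.0)]]]";
        for (String part : splitLevel(sample))
        {
            System.out.println(part);
        }
    }
}
```

The right subtree comes back as a complete bracketed string, so the same function can be applied to it again; that recursion is what the full parser below does.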
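The code below contains a very pragmatic (and thus somewhat brittle) method that parses this string and builds a tree structure containing the relevant information. This structure consists of TreeNode objects: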
```java
static class TreeNode
{
    String label;
    String attribute;
    String relation;
    String value;
    ...
}
```
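The label is the class label that is used by the classifier. It is only non-null for leaf nodes. For the example in the paper, this would be the label "0" or "1", indicating whether an email is spam or not.
The attribute is the attribute that the decision is based on. For the example in the paper, such an attribute could be word_freq_remove.
The relation and value are the strings that represent the decision criterion, for example the relation "<=" and the value "0.6" for the split petalwidth <= 0.6 in the iris tree below.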
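Since the RDF is eventually meant to be used for classifying new data, it may help to see how these four fields are meant to be read. The following is a hypothetical sketch, not part of the original code: it re-declares a TreeNode with the same fields, builds the iris tree from the output shown in this answer by hand, and classifies one instance by following, at each inner node, the first child whose relation and value match. The helpers (matches, classify, leaf, inner, irisTree) are made-up names, and only numeric attributes are handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: classify one instance by walking a TreeNode tree.
// Assumes numeric attributes and that every tested attribute is in the map.
class TreeWalkSketch
{
    static class TreeNode
    {
        String label;      // class label, non-null only for leaves
        String attribute;  // attribute tested by an inner node
        String relation;   // relation on the edge into this node, e.g. "<="
        String value;      // threshold on that edge, e.g. "0.6"
        List<TreeNode> children = new ArrayList<>();
    }

    // True if x satisfies "x <relation> threshold".
    static boolean matches(double x, String relation, double threshold)
    {
        switch (relation)
        {
            case "<":  return x < threshold;
            case "<=": return x <= threshold;
            case ">":  return x > threshold;
            case ">=": return x >= threshold;
            default:   throw new IllegalArgumentException("Unknown relation: " + relation);
        }
    }

    // Follow the first child whose condition holds until a leaf is reached.
    static String classify(TreeNode node, Map<String, Double> instance)
    {
        while (node.label == null)
        {
            double x = instance.get(node.attribute);
            TreeNode next = null;
            for (TreeNode child : node.children)
            {
                if (matches(x, child.relation, Double.parseDouble(child.value)))
                {
                    next = child;
                    break;
                }
            }
            if (next == null)
            {
                throw new IllegalStateException("No branch for " + node.attribute + " = " + x);
            }
            node = next;
        }
        return node.label;
    }

    static TreeNode leaf(String relation, String value, String label)
    {
        TreeNode n = new TreeNode();
        n.relation = relation;
        n.value = value;
        n.label = label;
        return n;
    }

    static TreeNode inner(String relation, String value, String attribute, TreeNode... children)
    {
        TreeNode n = new TreeNode();
        n.relation = relation;
        n.value = value;
        n.attribute = attribute;
        for (TreeNode c : children)
        {
            n.children.add(c);
        }
        return n;
    }

    // The iris tree from the output in this answer, built by hand.
    static TreeNode irisTree()
    {
        return inner(null, null, "petalwidth",
            leaf("<=", "0.6", "Iris-setosa"),
            inner(">", "0.6", "petalwidth",
                inner("<=", "1.7", "petallength",
                    leaf("<=", "4.9", "Iris-versicolor"),
                    inner(">", "4.9", "petalwidth",
                        leaf("<=", "1.5", "Iris-virginica"),
                        leaf(">", "1.5", "Iris-versicolor"))),
                leaf(">", "1.7", "Iris-virginica")));
    }

    public static void main(String[] args)
    {
        Map<String, Double> instance = new HashMap<>();
        instance.put("petalwidth", 1.4);
        instance.put("petallength", 5.5);
        System.out.println(classify(irisTree(), instance)); // prints Iris-virginica
    }
}
```

The walk is iterative; for the binary numeric splits that J48 produced here, exactly one child matches at each inner node, so "first match wins" is only a safeguard.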
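Could you provide an example of what your expected output RDF should look like?
@Marco13 Please check the edit.
1) You need an ontology, i.e. a schema. 2) Write your own exporter for Weka; obviously there is no built-in one. Or 3) write a converter from the decision tree string to RDF. Or export/convert to XML or JSON first.
Sorry for the late response. Thank you so much! I really appreciate your effort.
@MohamedELTair You don't have to accept the answer. Maybe someone can provide a more complete solution. Or did you solve it completely now? I didn't have the chance to try the RDF output in Jena, so I'm still doubtful whether it is a really sensible format...
I'm new to RDF, but as far as I know, the format of the output RDF looks right to me. I ran the whole code, and it works as expected. Now I need to run queries on the RDF to classify test data. It should be easy to create a query function that runs queries against the RDF; after all, it is just like a tree. Thanks again. By the way, I have checked several RDFs generated by the code using the following link: . All of them were valid, and the triples and the graph were generated successfully.
@MohamedELTair Thanks for the feedback. Posting this was a bit of a gamble, so I'm glad to hear that you actually found it useful.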
```java
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class WekaClassifierToRdf
{
    public static void main(String[] args) throws Exception
    {
        String fileName = "./data/iris.arff";
        ArffLoader arffLoader = new ArffLoader();
        arffLoader.setSource(new FileInputStream(fileName));
        Instances instances = arffLoader.getDataSet();
        instances.setClassIndex(4);
        //System.out.println(instances);

        J48 classifier = new J48();
        classifier.buildClassifier(instances);

        System.out.println(classifier);

        String prefixTreeString = classifier.prefix();
        TreeNode node = processPrefixTreeString(prefixTreeString);

        System.out.println("Tree:");
        System.out.println(node.createString());

        Model model = createModel(node);

        System.out.println("Model:");
        model.write(System.out, "RDF/XML-ABBREV");
    }

    private static TreeNode processPrefixTreeString(String inputString)
    {
        String string = inputString.replaceAll("\\n", "");
        //System.out.println("Input is " + string);

        int open = string.indexOf("[");
        int close = string.lastIndexOf("]");
        String part = string.substring(open + 1, close);
        //System.out.println("Part " + part);

        int colon = part.indexOf(":");
        if (colon == -1)
        {
            TreeNode node = new TreeNode();
            int openAfterLabel = part.lastIndexOf("(");
            String label = part.substring(0, openAfterLabel).trim();
            node.label = label;
            return node;
        }

        String attributeName = part.substring(0, colon);
        //System.out.println("attributeName " + attributeName);

        int comma = part.indexOf(",", colon);
        int leftOpen = part.indexOf("[", comma);

        String leftCondition = part.substring(colon + 1, comma).trim();
        String rightCondition = part.substring(comma + 1, leftOpen).trim();

        int leftSpace = leftCondition.indexOf(" ");
        String leftRelation = leftCondition.substring(0, leftSpace).trim();
        String leftValue = leftCondition.substring(leftSpace + 1).trim();

        int rightSpace = rightCondition.indexOf(" ");
        String rightRelation = rightCondition.substring(0, rightSpace).trim();
        String rightValue = rightCondition.substring(rightSpace + 1).trim();

        //System.out.println("leftCondition " + leftCondition);
        //System.out.println("rightCondition " + rightCondition);

        int leftClose = findClosing(part, leftOpen + 1);
        String left = part.substring(leftOpen, leftClose + 1);
        //System.out.println("left " + left);

        int rightOpen = part.indexOf("[", leftClose);
        int rightClose = findClosing(part, rightOpen + 1);
        String right = part.substring(rightOpen, rightClose + 1);
        //System.out.println("right " + right);

        TreeNode leftNode = processPrefixTreeString(left);
        leftNode.relation = leftRelation;
        leftNode.value = leftValue;

        TreeNode rightNode = processPrefixTreeString(right);
        rightNode.relation = rightRelation;
        rightNode.value = rightValue;

        TreeNode result = new TreeNode();
        result.attribute = attributeName;
        result.children.add(leftNode);
        result.children.add(rightNode);
        return result;
    }

    private static int findClosing(String string, int startIndex)
    {
        int stack = 0;
        for (int i = startIndex; i < string.length(); i++)
        {
            char c = string.charAt(i);
            if (c == '[')
            {
                stack++;
            }
            if (c == ']')
            {
                if (stack == 0)
                {
                    return i;
                }
                stack--;
            }
        }
        return -1;
    }

    static class TreeNode
    {
        String label;
        String attribute;
        String relation;
        String value;
        List<TreeNode> children = new ArrayList<TreeNode>();

        String createString()
        {
            StringBuilder sb = new StringBuilder();
            createString("", sb);
            return sb.toString();
        }

        private void createString(String indent, StringBuilder sb)
        {
            if (children.isEmpty())
            {
                sb.append(indent + label);
            }
            sb.append("\n");
            for (TreeNode child : children)
            {
                sb.append(indent + "if " + attribute + " " + child.relation
                    + " " + child.value + ": ");
                child.createString(indent + "  ", sb);
            }
        }

        @Override
        public String toString()
        {
            return "TreeNode [label=" + label + ", attribute=" + attribute
                + ", relation=" + relation + ", value=" + value + "]";
        }
    }

    private static String createPropertyString(TreeNode node)
    {
        if ("<".equals(node.relation))
        {
            return "lt_" + node.value;
        }
        if ("<=".equals(node.relation))
        {
            return "lte_" + node.value;
        }
        if (">".equals(node.relation))
        {
            return "gt_" + node.value;
        }
        if (">=".equals(node.relation))
        {
            return "gte_" + node.value;
        }
        System.err.println("Unknown relation: " + node.relation);
        return "UNKNOWN";
    }

    static Model createModel(TreeNode node)
    {
        Model model = ModelFactory.createDefaultModel();

        String baseUri = "http://www.example.com/example#";
        model.createResource(baseUri);
        model.setNsPrefix("base", baseUri);
        populateModel(model, baseUri, node, node.attribute);
        return model;
    }

    private static void populateModel(Model model, String baseUri,
        TreeNode node, String resourceName)
    {
        //System.out.println("Populate with " + resourceName);

        for (TreeNode child : node.children)
        {
            if (child.label != null)
            {
                Resource resource = model.createResource(baseUri + resourceName);
                String propertyString = createPropertyString(child);
                Property property = model.createProperty(baseUri, propertyString);
                Statement statement = model.createLiteralStatement(resource,
                    property, child.label);
                model.add(statement);
            }
            else
            {
                Resource resource = model.createResource(baseUri + resourceName);
                String propertyString = createPropertyString(child);
                Property property = model.createProperty(baseUri, propertyString);

                String nextResourceName = resourceName + "_" + child.attribute;
                Resource childResource =
                    model.createResource(baseUri + nextResourceName);
                Statement statement = model.createStatement(resource,
                    property, childResource);
                model.add(statement);
            }
        }

        for (TreeNode child : node.children)
        {
            String nextResourceName = resourceName + "_" + child.attribute;
            populateModel(model, baseUri, child, nextResourceName);
        }
    }
}
```
Running this on the iris data set prints:

```
J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  : 5

Size of the tree : 9

Tree:
if petalwidth <= 0.6: Iris-setosa
if petalwidth > 0.6:
  if petalwidth <= 1.7:
    if petallength <= 4.9: Iris-versicolor
    if petallength > 4.9:
      if petalwidth <= 1.5: Iris-virginica
      if petalwidth > 1.5: Iris-versicolor
  if petalwidth > 1.7: Iris-virginica

Model:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:base="http://www.example.com/example#">
  <rdf:Description rdf:about="http://www.example.com/example#petalwidth">
    <base:gt_0.6>
      <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth">
        <base:gt_1.7>Iris-virginica</base:gt_1.7>
        <base:lte_1.7>
          <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth_petallength">
            <base:gt_4.9>
              <rdf:Description rdf:about="http://www.example.com/example#petalwidth_petalwidth_petallength_petalwidth">
                <base:gt_1.5>Iris-versicolor</base:gt_1.5>
                <base:lte_1.5>Iris-virginica</base:lte_1.5>
              </rdf:Description>
            </base:gt_4.9>
            <base:lte_4.9>Iris-versicolor</base:lte_4.9>
          </rdf:Description>
        </base:lte_1.7>
      </rdf:Description>
    </base:gt_0.6>
    <base:lte_0.6>Iris-setosa</base:lte_0.6>
  </rdf:Description>
</rdf:RDF>
```
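If Jena is not at hand, or just to see which statements populateModel creates, the same triples can also be written out as N-Triples by hand. This is a hypothetical sketch rather than part of the answer: it reuses the naming scheme of the code above (resources named by the attribute path joined with "_", predicates lt_/lte_/gt_/gte_ plus the threshold, class labels as plain literals), but the class NTriplesSketch, its helpers and the small sample tree are made up for illustration. The literal escaping only covers backslashes, quotes and line breaks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emit the triples that populateModel() would add to the
// Jena Model as plain N-Triples lines, without using Jena at all.
class NTriplesSketch
{
    static final String BASE = "http://www.example.com/example#";

    static class TreeNode
    {
        String label, attribute, relation, value;
        List<TreeNode> children = new ArrayList<>();

        TreeNode(String relation, String value, String attribute, String label)
        {
            this.relation = relation;
            this.value = value;
            this.attribute = attribute;
            this.label = label;
        }
    }

    // Same mapping as createPropertyString() in the answer.
    static String propertyName(TreeNode n)
    {
        switch (n.relation)
        {
            case "<":  return "lt_" + n.value;
            case "<=": return "lte_" + n.value;
            case ">":  return "gt_" + n.value;
            case ">=": return "gte_" + n.value;
            default:   return "UNKNOWN";
        }
    }

    // Minimal escaping for an N-Triples string literal.
    static String escape(String s)
    {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }

    static void collect(TreeNode node, String resourceName, List<String> out)
    {
        String subject = "<" + BASE + resourceName + ">";
        for (TreeNode child : node.children)
        {
            String predicate = "<" + BASE + propertyName(child) + ">";
            if (child.label != null)
            {
                // Leaf: the class label becomes a literal object.
                out.add(subject + " " + predicate + " \"" + escape(child.label) + "\" .");
            }
            else
            {
                // Inner node: a new resource named by the attribute path.
                String childName = resourceName + "_" + child.attribute;
                out.add(subject + " " + predicate + " <" + BASE + childName + "> .");
                collect(child, childName, out);
            }
        }
    }

    static List<String> toNTriples(TreeNode root)
    {
        List<String> out = new ArrayList<>();
        collect(root, root.attribute, out);
        return out;
    }

    // A small hypothetical two-level tree, for illustration only.
    static TreeNode sampleTree()
    {
        TreeNode root = new TreeNode(null, null, "petalwidth", null);
        TreeNode inner = new TreeNode(">", "0.6", "petallength", null);
        root.children.add(new TreeNode("<=", "0.6", null, "Iris-setosa"));
        root.children.add(inner);
        inner.children.add(new TreeNode("<=", "4.9", null, "Iris-versicolor"));
        inner.children.add(new TreeNode(">", "4.9", null, "Iris-virginica"));
        return root;
    }

    public static void main(String[] args)
    {
        for (String line : toNTriples(sampleTree()))
        {
            System.out.println(line);
        }
    }
}
```

For the sample tree this prints four triples, the first one being <http://www.example.com/example#petalwidth> <http://www.example.com/example#lte_0.6> "Iris-setosa" . The statements come out in a different order than in the answer's code (which adds all statements of a node before recursing), but an RDF graph is a set of triples, so the order does not matter.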