Java: Parsing Solr XML files to SolrInputDocument

java solr

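If I have individual files in the expected Solr format (just one document per file):

<add>
  <doc>
    <field name="...">GB18030TEST</field>
    <field name="...">Test with some GB18030 encoded characters</field>
    <field name="...">No accents here</field>
    <field name="...">ÕâÊÇÒ»¸ö¹¦ÄÜ</field>
    <field name="...">0</field>
  </doc>
</add>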

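Is there a way to easily marshal such a file into a SolrInputDocument? Do I have to do the parsing myself?

Edit: I need it as a Java POJO because I want to modify some fields before indexing it with SolrJ...

Edit: For converting the XML to a POJO, please refer to the previous question -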

Since you already have the documents in the expected format, you can simply use post.jar or the post.sh script; both accept XML files as input.

Also, the SolrJ ClientUtils class has a method that may be useful to you. Of course, in order to use the toSolrInputDocument() method, you first need to marshal the file into a SolrDocument.

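This is best done programmatically. I know you are looking for a Java solution, but personally I would recommend Groovy.

The following script processes the XML files found in the current directory: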

//
// Dependencies
// ============
@Grab(group='org.apache.solr', module='solr-solrj', version='3.5.0')
import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.common.SolrInputDocument

//
// Main
// ====
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/")

// Index every *.xml file in the current directory
new File(".").eachFileMatch(~/.*\.xml/) { file ->

    file.withReader { reader ->
        def xml = new XmlSlurper().parse(reader)

        // Build one SolrInputDocument per <doc> element
        xml.doc.each { docNode ->
            SolrInputDocument doc = new SolrInputDocument()

            // Copy each <field name="..."> value into the document
            docNode.field.each { fieldNode ->
                doc.addField(fieldNode.@name.text(), fieldNode.text())
            }

            server.add(doc)
        }
    }

}

server.commit()

In Java, you can do it like this:

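import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Both methods belong inside a class that holds a SolrServer field named "server".

// Parses the file and sends all of its documents to Solr in one update request.
private void populateIndexFromXmlFile(String fileName) throws Exception {

    UpdateRequest update = new UpdateRequest();

    update.add(getSolrInputDocumentListFromXmlFile(fileName));

    update.process(server);

    server.commit();
}

// Builds one SolrInputDocument for every <doc> element in the XML file.
private List<SolrInputDocument> getSolrInputDocumentListFromXmlFile(
        String fileName) throws Exception {

    List<SolrInputDocument> solrDocList = new ArrayList<SolrInputDocument>();

    File fXmlFile = new File(fileName);

    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);

    NodeList docList = doc.getElementsByTagName("doc");

    for (int docIdx = 0; docIdx < docList.getLength(); docIdx++) {

        Node docNode = docList.item(docIdx);

        if (docNode.getNodeType() == Node.ELEMENT_NODE) {

            SolrInputDocument solrInputDoc = new SolrInputDocument();

            Element docElement = (Element) docNode;

            NodeList fieldsList = docElement.getChildNodes();

            for (int fieldIdx = 0; fieldIdx < fieldsList.getLength(); fieldIdx++) {

                Node fieldNode = fieldsList.item(fieldIdx);

                // Skip whitespace text nodes; only child elements carry field data
                if (fieldNode.getNodeType() == Node.ELEMENT_NODE) {

                    Element fieldElement = (Element) fieldNode;

                    String fieldName = fieldElement.getAttribute("name");
                    String fieldValue = fieldElement.getTextContent();

                    // Repeated names accumulate, so multi-valued fields are preserved
                    solrInputDoc.addField(fieldName, fieldValue);
                }

            }

            solrDocList.add(solrInputDoc);
        }
    }

    return solrDocList;

}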
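To try the DOM walk without a running Solr server or the SolrJ jar, the same traversal can collect the fields into plain maps first. This is only a sketch under assumptions: the class name SolrXmlFieldReader, the method readDocs, and the sample field names (id, features) are made up for illustration and are not part of SolrJ. Each map could afterwards be modified and copied into a SolrInputDocument with addField.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of SolrJ.
public class SolrXmlFieldReader {

    // Returns one map per <doc> element: field name -> all values seen for that name.
    public static List<Map<String, List<String>>> readDocs(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));

        List<Map<String, List<String>>> docs = new ArrayList<>();
        NodeList docNodes = dom.getElementsByTagName("doc");
        for (int i = 0; i < docNodes.getLength(); i++) {
            // LinkedHashMap keeps the fields in document order.
            Map<String, List<String>> fields = new LinkedHashMap<>();
            NodeList children = docNodes.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes and anything that is not a <field> element.
                if (child.getNodeType() != Node.ELEMENT_NODE
                        || !"field".equals(child.getNodeName())) {
                    continue;
                }
                Element field = (Element) child;
                fields.computeIfAbsent(field.getAttribute("name"), k -> new ArrayList<>())
                      .add(field.getTextContent());
            }
            docs.add(fields);
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        // Sample input with illustrative field names; "features" is multi-valued.
        String xml = "<add><doc>"
                + "<field name=\"id\">doc-1</field>"
                + "<field name=\"features\">first</field>"
                + "<field name=\"features\">second</field>"
                + "</doc></add>";

        Map<String, List<String>> fields = readDocs(xml).get(0);

        // Change a value before it would be copied into a SolrInputDocument.
        fields.get("id").set(0, "doc-1-modified");

        System.out.println(fields); // prints {id=[doc-1-modified], features=[first, second]}
    }
}
```

Unlike the method above, this sketch ignores child elements that are not named field, which may or may not be what you want if your files contain other markup inside doc.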

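Who writes the individual XML files? You? Someone else? Too bad! What I was wondering is whether you could use JavaBeans directly and add each one as a document. Interesting.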
