In core Java: read multiple XML files from multiple folders and append them into one XML file

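In core Java, I need to read multiple XML files from multiple folders and append the results into one XML file. My main root path is: C:\JavaPractice\Task3\Process\test\

Inside this main directory I have multiple subfolders, each containing a "tud.xml". I need to crawl every tud.xml and extract the `<deg>` tag from it. If the tag contains multiple degrees (e.g. MSC, PHD), each degree must be split onto its own line. Everything is then appended to a single output file named deg.xml, with duplicates removed and the entries sorted. (Note that the output file must not contain duplicate words.)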
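Finding every tud.xml below the root folder does not need any XML library. A minimal sketch, assuming Java 8+ and that the files are named exactly `tud.xml` (the class name `FindTudFiles` is hypothetical), using `java.nio.file.Files.walk` to visit all subfolders recursively:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindTudFiles {

    // Recursively collects every regular file named "tud.xml" below root,
    // sorted by path so that the processing order is deterministic.
    public static List<Path> findTudFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals("tud.xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default root is the path from the question; pass another one as args[0].
        Path root = Paths.get(args.length > 0 ? args[0] : "C:\\JavaPractice\\Task3\\Process\\test");
        if (!Files.isDirectory(root)) {
            System.err.println("Not a directory: " + root);
            return;
        }
        for (Path p : findTudFiles(root)) {
            System.out.println(p);
        }
    }
}
```

Each returned path can then be parsed in turn; `Files.walk` is closed via try-with-resources because it holds open directory handles.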

java xml

在核心Java中-从多个文件夹中读取多个XML文件并附加到一个XML文件中这是我的主要驱动路径：C:\JavaPractice\Task3\Process\test\ 在上面的主目录中，我有多个子文件夹，每个子文件夹包含一个“tud.xml” 需要对每个tud.xml进行爬网，并从该xml文件中提取“”标记如果标签包含多个度数（例如：MSC、PHD），则将每个度数拆分为单独的行将其附加到一个名为deg.xml的输出文件中，并进行唯一和排序。（请注意，输出文件不包含重复的单词）,java,xml,saxon,Java,Xml,Saxon,我的代码： import net.sf.saxon.Configuration; import net.sf.saxon.lib.NamespaceConstant; import net.sf.saxon.om.NodeInfo; import net.sf.saxon.om.TreeInfo; import net.sf.saxon.xpath.XPathFactoryImpl; import org.xml.sax.InputSource;

My code:

    import net.sf.saxon.Configuration;
    import net.sf.saxon.lib.NamespaceConstant;
    import net.sf.saxon.om.NodeInfo;
    import net.sf.saxon.om.TreeInfo;
    import net.sf.saxon.xpath.XPathFactoryImpl;
    import org.xml.sax.InputSource;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.xpath.*;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Scanner;
    import java.util.TreeMap;


    public class Task3 {

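        // Splits the collected <deg> text into individual degrees. The values of
        // different <deg> elements are joined with "\n" in runApp, so split on
        // line breaks as well as commas, then trim and drop empty tokens.
        private static String[] ParaToSentenc(String PtS) {
            return Arrays.stream(PtS.split("[,\\r\\n]"))
                         .map(String::trim)
                         .filter(s -> !s.isEmpty())
                         .toArray(String[]::new);
        }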


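        // Returns the distinct words of UW in sorted order. Each word is trimmed
        // first so that " PHD" and "PHD" count as the same degree.
        private static List<String> UniqueAndSortWord(String[] UW) {
            List<String> unique_sort = new ArrayList<String>();
            Map<String, String> hMap = new HashMap<String, String>();
            for(String word : UW) {
                word = word.trim();
                if(!word.isEmpty() && !hMap.containsKey(word)) {
                    hMap.put(word,"");
                    unique_sort.add(word);
                }
            }
            Collections.sort(unique_sort);
            return unique_sort;
        }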

        // Writes content to outputfile (overwriting it). try-with-resources
        // closes the writer even if write() throws.
        private static void FileWriter(String content, String outputfile) {
            try (BufferedWriter bw = new BufferedWriter(new FileWriter(new File(outputfile)))) {
                bw.write(content);
            }
            catch (IOException e) {
                System.out.println("Error writing " + outputfile + ": " + e.getMessage());
            }
        }

        public static void main (String args[]) throws Exception {
            String Inputname = args[0];//sc.nextLine(); //"D:\\document.xml";
            String outputname = args[1];//sc.nextLine(); //"D:\\document.txt";
            Task3.runApp(Inputname, outputname);
            System.out.println("Success");
        }

        /**
         * Run the application
         */


        private static void runApp(String filename, String outputfile) throws Exception {


            /////////////////////////////////////////////
            // The following initialization code is specific to Saxon
            // Please refer to SaxonHE documentation for details
            System.setProperty("javax.xml.xpath.XPathFactory:"+
                               NamespaceConstant.OBJECT_MODEL_SAXON,
                               "net.sf.saxon.xpath.XPathFactoryImpl");

            XPathFactory xpFactory = XPathFactory.
                                     newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
            XPath xpExpression = xpFactory.newXPath();
            System.err.println("Loaded XPath Provider " + xpExpression.getClass().getName());

            // Build the source document.
            // File.toURL() is deprecated; toURI() escapes special characters correctly.
            InputSource inputSrc = new InputSource(new File(filename).toURI().toString());
            SAXSource saxSrc = new SAXSource(inputSrc);
            Configuration config = ((XPathFactoryImpl) xpFactory).getConfiguration();
            TreeInfo treeInfo = config.buildDocumentTree(saxSrc);
            // End Saxon specific code
            /////////////////////////////////////////////


            XPathExpression findwtTags =
                                        xpExpression.compile("count(//deg)");


            Number countResults = (Number)findwtTags.evaluate(treeInfo, XPathConstants.NUMBER);


            // Get a list of the <deg> elements.
            // The expression below selects every <deg> element in the document;
            // the text of each one is read in Java with getStringValue().
            XPathExpression findwtTextNodes =
                                             xpExpression.compile("//deg");



            //global string

            String global = "";

            List resultNodeList = (List) findwtTextNodes.evaluate(treeInfo, XPathConstants.NODESET);
            if (resultNodeList != null) {
                int count = resultNodeList.size();

                for (int i = 0; i < count; i++) {
                    NodeInfo cNode = (NodeInfo) resultNodeList.get(i);
                    String name = cNode.getStringValue();
                    global = global + "\n" + name;
                }
            }


            //Full content text...
            String globalText = "Full Degree content:" + global + "\n\n";


            // Para To Sentence...
            String[] strSenArray = ParaToSentenc(global);
            globalText = globalText + "Each Degree separated in line by line:\n";
    //        globalText = globalText + "Sentence Count : "+strSenArray.length+"\n";
            for(int i=0; i<strSenArray.length; i++){
                globalText = globalText + strSenArray[i].trim() + "\n";
            }
            globalText = globalText + "\n";



            //Unique Words
            List<String> strUniqueList = UniqueAndSortWord(strSenArray);
            globalText = globalText + "Unique Degree list:\n";
            for(String word : strUniqueList){
                globalText = globalText + word.trim() + "\n";
            }
            globalText = globalText.substring(0, globalText.length()-1);
            globalText = globalText + "\n\n";

            //All Text wtite into file...
            FileWriter(globalText, outputfile);
        }



    }
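Splitting on commas alone leaves the values of different `<deg>` elements glued together when they have been joined with "\n", and deduplicating before trimming treats " PHD" and "PHD" (or "\nMSC" and "MSC") as different words, which also disturbs the sort order. A minimal standalone sketch that avoids both problems; the class and method names are hypothetical, and a `TreeSet` is used here in place of the `HashMap` plus `Collections.sort` approach because it removes duplicates and keeps the entries sorted in one step:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class DegreeList {

    // Splits each raw <deg> text on commas and line breaks, trims every token,
    // drops empty tokens, and returns the distinct degrees in alphabetical order.
    public static List<String> uniqueSortedDegrees(List<String> rawDegTexts) {
        // TreeSet both removes duplicates and keeps its elements sorted.
        TreeSet<String> degrees = new TreeSet<>();
        for (String raw : rawDegTexts) {
            for (String token : raw.split("[,\\r\\n]")) {
                String degree = token.trim();
                if (!degree.isEmpty()) {
                    degrees.add(degree);
                }
            }
        }
        return new ArrayList<>(degrees);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("MSC, PHD", "PHD", " BSC ,MSC");
        System.out.println(uniqueSortedDegrees(raw)); // prints [BSC, MSC, PHD]
    }
}
```

The comparison is case-sensitive, so "Msc" and "MSC" would still be kept as two entries.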

import net.sf.saxon.Configuration；
导入net.sf.saxon.lib.NamespaceConstant；
导入net.sf.saxon.om.NodeInfo；
导入net.sf.saxon.om.TreeInfo；
导入net.sf.saxon.xpath.XPathFactoryImpl；
导入org.xml.sax.InputSource；
导入javax.xml.transform.sax.SAXSource；
导入javax.xml.xpath.*；
导入java.io.BufferedWriter；
导入java.io.File；
导入java.io.FileWriter；
导入java.io.IOException；
导入java.util.ArrayList；
导入java.util.array；
导入java.util.Collections；
导入java.util.HashMap；
导入java.util.LinkedHashMap；
导入java.util.List；
导入java.util.Map；
导入java.util.Scanner；
导入java.util.TreeMap；
公开课任务3{
专用静态字符串[]副主题（字符串PtS）{
字符串[]strArray=PtS.split（“，”）；
回程线；
}
私有静态列表UniqueAndSortWord（字符串[]UW）{
List unique_sort=new ArrayList（）；
Map hMap=newhashmap（）；
for（字符串字：UW）{
如果（！hMap.containsKey（word））{
hMap.put（字“”）；
唯一排序。添加（word）；
}        
}
Collections.sort（唯一排序）；
返回唯一的_排序；
}
私有静态void FileWriter（字符串内容、字符串输出文件）{
文件文件=新文件（输出文件）；
FileWriter=null；
BufferedWriter bw=null；
试一试{
writer=新文件编写器（文件）；
bw=新的缓冲写入程序（写入程序）；
写（内容）；
bw.flush（）；
bw.close（）；
}
捕获（IOE异常）{
System.out.println（“错误”）；；
}
}       
公共静态void main（字符串args[]）引发异常{
字符串Inputname=args[0]；//sc.nextLine（）；//“D:\\document.xml”；
字符串outputname=args[1]；//sc.nextLine（）；//“D:\\document.txt”；
Task3.runApp（Inputname，outputname）；
System.out.println（“成功”）；
}
/**
*运行应用程序
*/
私有静态void runApp（字符串文件名、字符串输出文件）引发异常{
/////////////////////////////////////////////
//以下初始化代码特定于Saxon
//有关详细信息，请参阅SaxonHE文档
System.setProperty（“javax.xml.xpath.XPathFactory：”+
NamespaceConstant.OBJECT\u MODEL\u SAXON，
“net.sf.saxon.xpath.XPathFactoryImpl”）；
XPathFactory=XPathFactory。
newInstance（NamespaceConstant.OBJECT\u MODEL\u SAXON）；
XPath xpExpression=xpFactory.newXPath（）；
System.err.println（“加载的XPath提供程序”+xpExpression.getClass（）.getName（））；
//构建源文档。
InputSource inputSrc=新的InputSource（新文件（文件名）.toURL（）.toString（））；
SAXSource saxSrc=新SAXSource（inputSrc）；
配置配置=（（XPathFactoryImpl）xpFactory）.getConfiguration（）；
TreeInfo TreeInfo=config.buildDocumentTree（saxSrc）；
//结束特定于萨克森的代码
/////////////////////////////////////////////
XPathExpression findwtTags=
compile（“count（//deg）”；
Number countResults=（Number）findwtTags.evaluate（treeInfo，XPathConstants.Number）；
//获取标签列表
//下面的表达式获取一组具有标记的节点，
//然后从标记中提取文本节点
XPathExpression findwtTextNodes=
xpExpression.compile（“//deg”）；
//全局字符串
字符串global=“”；
List resultNodeList=（List）findwtTextNodes.evaluate（treeInfo，XPathConstants.NODESET）；
if（resultNodeList！=null）{
int count=resultNodeList.size（）；
You can do it all in a single XPath expression using XPath 3.1:
    (collection('file:///C:/JavaPractice/Task3/Process/test?select=tud.xml;recurse=yes')//deg
        ! tokenize(., ',') ! normalize-space()) => distinct-values() => sort()

All the Java code needs to do is run this expression and process the resulting sequence of strings.
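The question asks for the result in deg.xml. Once the unique, sorted degree strings are in a Java `List`, one way to write them out as well-formed XML is the JDK's StAX `XMLStreamWriter`, which escapes characters such as `&` and `<` automatically. A minimal sketch; the root element name `<degrees>` and the class name `DegXmlWriter` are assumptions, not taken from the question:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class DegXmlWriter {

    // Writes the (already unique and sorted) degrees to outputFile, one <deg>
    // element per degree inside a <degrees> root element (assumed names).
    public static void writeDegXml(List<String> degrees, Path outputFile)
            throws IOException, XMLStreamException {
        try (OutputStream out = Files.newOutputStream(outputFile)) {
            XMLStreamWriter xml = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");
            xml.writeStartDocument("UTF-8", "1.0");
            xml.writeStartElement("degrees");
            for (String degree : degrees) {
                xml.writeCharacters("\n  ");      // simple indentation
                xml.writeStartElement("deg");
                xml.writeCharacters(degree);      // the writer escapes &, <, >
                xml.writeEndElement();
            }
            xml.writeCharacters("\n");
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.flush();
            xml.close(); // frees the writer; the stream is closed by try-with-resources
        }
    }

    public static void main(String[] args) throws Exception {
        writeDegXml(Arrays.asList("BSC", "MSC", "PHD"), Paths.get("deg.xml"));
    }
}
```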
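Which part[s] are you having trouble with?

I don't know how to read the multiple XML files one by one. Also, my sort function is not working.

It would help if you could isolate your question and sample code to the single problem you are having, rather than listing all of the code at once. For example, if your xpath works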
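Reading the XML files one by one can also be done with only the JDK's DOM parser, without Saxon. A minimal sketch, assuming the `<deg>` elements are not in a namespace and that the list of files has already been collected (the class name `DegReader` is hypothetical): one `DocumentBuilder` is reused, and each file is parsed in turn.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DegReader {

    // Parses each file in order and returns the text content of every <deg>
    // element, in document order, file after file.
    public static List<String> readDegTexts(List<Path> files)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<String> texts = new ArrayList<>();
        for (Path file : files) {
            Document doc = builder.parse(file.toFile());
            NodeList degs = doc.getElementsByTagName("deg");
            for (int i = 0; i < degs.getLength(); i++) {
                texts.add(degs.item(i).getTextContent());
            }
            builder.reset(); // return the builder to its initial state before the next file
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        System.out.println(readDegTexts(files));
    }
}
```

The raw texts returned here (for example "MSC, PHD") still need to be split on commas, trimmed, deduplicated and sorted before they are written out.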