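Core Java: read multiple XML files from multiple folders and append them to a single XML file
This is my main driver path: C:\JavaPractice\Task3\Process\test\
Under this main directory I have multiple subfolders, and each subfolder contains a "tud.xml".
I need to crawl every tud.xml and extract the <deg> tags from it.
If a tag contains more than one degree (for example: MSC, PHD), each degree should be split onto its own line.
The results should be appended to a single output file named deg.xml, de-duplicated and sorted. (Note that the output file must not contain duplicate words.)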

My code:
import net.sf.saxon.Configuration;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.om.NodeInfo;
import net.sf.saxon.om.TreeInfo;
import net.sf.saxon.xpath.XPathFactoryImpl;
import org.xml.sax.InputSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.*;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class Task3 {

    private static String[] ParaToSentenc(String PtS) {
        String[] strArray = PtS.split(",");
        return strArray;
    }

    private static List<String> UniqueAndSortWord(String[] UW) {
        List<String> unique_sort = new ArrayList<String>();
        Map<String, String> hMap = new HashMap<String, String>();
        for (String word : UW) {
            if (!hMap.containsKey(word)) {
                hMap.put(word, "");
                unique_sort.add(word);
            }
        }
        Collections.sort(unique_sort);
        return unique_sort;
    }

    private static void FileWriter(String content, String outputfile) {
        File file = new File(outputfile);
        FileWriter writer = null;
        BufferedWriter bw = null;
        try {
            writer = new FileWriter(file);
            bw = new BufferedWriter(writer);
            bw.write(content);
            bw.flush();
            bw.close();
        }
        catch (IOException e) {
            System.out.println("Error");
        }
    }

    public static void main (String args[]) throws Exception {
        String Inputname = args[0];//sc.nextLine(); //"D:\\document.xml";
        String outputname = args[1];//sc.nextLine(); //"D:\\document.txt";
        Task3.runApp(Inputname, outputname);
        System.out.println("Success");
    }

    /**
     * Run the application
     */
    private static void runApp(String filename, String outputfile) throws Exception {
        /////////////////////////////////////////////
        // The following initialization code is specific to Saxon
        // Please refer to SaxonHE documentation for details
        System.setProperty("javax.xml.xpath.XPathFactory:" +
                NamespaceConstant.OBJECT_MODEL_SAXON,
                "net.sf.saxon.xpath.XPathFactoryImpl");
        XPathFactory xpFactory = XPathFactory.
                newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
        XPath xpExpression = xpFactory.newXPath();
        System.err.println("Loaded XPath Provider " + xpExpression.getClass().getName());
        // Build the source document.
        InputSource inputSrc = new InputSource(new File(filename).toURL().toString());
        SAXSource saxSrc = new SAXSource(inputSrc);
        Configuration config = ((XPathFactoryImpl) xpFactory).getConfiguration();
        TreeInfo treeInfo = config.buildDocumentTree(saxSrc);
        // End Saxon specific code
        /////////////////////////////////////////////
        XPathExpression findwtTags =
                xpExpression.compile("count(//deg)");
        Number countResults = (Number) findwtTags.evaluate(treeInfo, XPathConstants.NUMBER);
        // Get a list of the <deg> tags.
        // The following expression gets the set of <deg> nodes,
        // then the loop below extracts the text from each of them.
        XPathExpression findwtTextNodes =
                xpExpression.compile("//deg");
        // global string
        String global = "";
        List resultNodeList = (List) findwtTextNodes.evaluate(treeInfo, XPathConstants.NODESET);
        if (resultNodeList != null) {
            int count = resultNodeList.size();
            for (int i = 0; i < count; i++) {
                NodeInfo cNode = (NodeInfo) resultNodeList.get(i);
                String name = cNode.getStringValue();
                global = global + "\n" + name;
            }
        }
        // Full content text...
        String globalText = "Full Degree content:" + global + "\n\n";
        // Para To Sentence...
        String[] strSenArray = ParaToSentenc(global);
        globalText = globalText + "Each Degree separated in line by line:\n";
        // globalText = globalText + "Sentence Count : "+strSenArray.length+"\n";
        for (int i = 0; i < strSenArray.length; i++) {
            globalText = globalText + strSenArray[i].trim() + "\n";
        }
        globalText = globalText + "\n";
        // Unique words
        List<String> strUniqueList = UniqueAndSortWord(strSenArray);
        globalText = globalText + "Unique Degree list:\n";
        for (String word : strUniqueList) {
            globalText = globalText + word.trim() + "\n";
        }
        globalText = globalText.substring(0, globalText.length() - 1);
        globalText = globalText + "\n\n";
        // Write all text to the file...
        FileWriter(globalText, outputfile);
    }
}
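
A note on the sorting step in the code above: one likely reason it appears not to work is that the `<deg>` values are joined with "\n" but split only on ",", and the tokens are compared before being trimmed. " PHD" and "PHD" then count as different words, and tokens with leading whitespace sort ahead of the rest. The following is a minimal sketch of a split, trim, de-duplicate and sort step using only the JDK. The class name `DegreeTokens` and the method name `uniqueSorted` are made up for illustration, and the sample input is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class DegreeTokens {

    // Splits raw <deg> text on commas and line breaks, trims every token,
    // drops empty tokens, and returns the distinct values in natural String order.
    static List<String> uniqueSorted(String raw) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String token : raw.split("[,\\r\\n]+")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                distinct.add(trimmed);
            }
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        // Invented sample: two <deg> values joined by "\n", as runApp does.
        String raw = "\nMSC, PHD\nPHD,BSC";
        System.out.println(uniqueSorted(raw)); // prints [BSC, MSC, PHD]
    }
}
```

A `TreeSet` keeps its elements unique and ordered, so the separate `HashMap` bookkeeping and the `Collections.sort` call are not needed in this variant.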

You can do all of this in a single XPath expression using XPath 3.1:
(collection('file:///C:/JavaPractice/Task3/Process/test?select=tud.xml;recurse=yes')//deg
    ! tokenize(., ',')) => distinct-values() => sort()
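All Java needs to do is run this expression and process the resulting sequence of strings.

Which part[s] are you having trouble with?
I don't know how to read multiple XML files one by one. Also, my sort function isn't working.
It would help if you could isolate your question and sample code to the single problem you are having, rather than listing all of your code at once. For example, if your XPath works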
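
On the point raised in the comments about reading the files one by one: the following is a minimal sketch that uses only the JDK (`Files.walk` plus the built-in DOM parser). It is a plain-Java alternative, not the Saxon `collection()` approach from the answer. The class name `DegCollector` and the method name `collectDegText` are made up for illustration, and the sketch assumes the `<deg>` elements are not in a namespace.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DegCollector {

    // Returns the text content of every <deg> element in every file named
    // tud.xml found anywhere below the given root directory.
    static List<String> collectDegText(Path root) throws Exception {
        List<Path> files;
        // Files.walk visits the whole tree; close the stream when done.
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().equals("tud.xml"))
                        .sorted()
                        .collect(Collectors.toList());
        }

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<String> texts = new ArrayList<>();
        for (Path file : files) {
            Document doc = builder.parse(file.toFile());
            NodeList nodes = doc.getElementsByTagName("deg");
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // The root directory is taken from the first argument, defaulting to ".".
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String text : collectDegText(root)) {
            System.out.println(text);
        }
    }
}
```

The returned strings are the raw tag contents, so a value such as "MSC, PHD" still has to be split on commas, trimmed, de-duplicated and sorted before it is written to deg.xml.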