Java: a better way to parse XML
I have been parsing XML like this for years, and I have to admit that as the number of distinct elements grows I find it tedious and tiring. Here is a sample dummy XML:
<?xml version="1.0"?>
<Order>
<Date>2003/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
<Item>
<ItemId> 987</ItemId>
<ItemName>Coupler</ItemName>
<Quantity>5</Quantity>
</Item>
<Item>
<ItemId>654</ItemId>
<ItemName>Connector</ItemName>
<Quantity unit="12">3</Quantity>
</Item>
<Item>
<ItemId>579</ItemId>
<ItemName>Clasp</ItemName>
<Quantity>1</Quantity>
</Item>
</Order>
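I would like to know whether there is a way to get rid of these ugly booleans, which keep multiplying as the number of elements grows. There must be a better way to parse this relatively simple XML. The sheer number of lines of code this task needs already looks ugly.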
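I am currently using a SAX parser, but I am open to any other suggestion (other than DOM: I cannot afford an in-memory parser because my XML files are huge).
In SAX the parser "pushes" events at your handler, so you have to do all the housekeeping yourself, as you are used to here. An alternative is StAX (the javax.xml.stream package), which is still streaming, but your code is responsible for "pulling" events from the parser. That way the logic of which elements are expected in which order is encoded in the control flow of your program instead of being represented explicitly by booleans.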
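To make the pull style concrete, here is a minimal sketch that reads the Order document from the question with the StAX cursor API. It uses only JDK classes; the StaxOrderReader and Item names, the Consumer callback, and the default unit of 1 when the attribute is missing are my own assumptions for illustration, not something taken from the question.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxOrderReader {

    /** Plain holder for one Item element (hypothetical class, not from the question). */
    public static final class Item {
        public final int id;
        public final String name;
        public final int quantity;
        public final int unit;

        Item(int id, String name, int quantity, int unit) {
            this.id = id;
            this.name = name;
            this.quantity = quantity;
            this.unit = unit;
        }
    }

    /** Reads an Order document and hands each Item to the sink as soon as it is complete. */
    public static void readOrder(Reader xml, Consumer<Item> sink) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(xml);
        try {
            r.nextTag();
            r.require(XMLStreamConstants.START_ELEMENT, null, "Order");
            // The expected order of the header fields is simply the order of these calls.
            readText(r, "Date");
            readText(r, "CustomerId");
            readText(r, "CustomerName");
            // Zero or more Item elements follow until the Order end tag.
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                sink.accept(readItem(r));
            }
            r.require(XMLStreamConstants.END_ELEMENT, null, "Order");
        } finally {
            r.close();
        }
    }

    private static Item readItem(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "Item");
        int id = Integer.parseInt(readText(r, "ItemId").trim());
        String name = readText(r, "ItemName");
        r.nextTag();
        r.require(XMLStreamConstants.START_ELEMENT, null, "Quantity");
        String unitAttr = r.getAttributeValue(null, "unit"); // null when the attribute is absent
        int unit = (unitAttr == null) ? 1 : Integer.parseInt(unitAttr.trim());
        int quantity = Integer.parseInt(r.getElementText().trim());
        r.nextTag(); // must be the Item end tag
        r.require(XMLStreamConstants.END_ELEMENT, null, "Item");
        return new Item(id, name, quantity, unit);
    }

    /** Advances to the next tag, checks that it is the named start tag, and returns its text. */
    private static String readText(XMLStreamReader r, String name) throws XMLStreamException {
        r.nextTag();
        r.require(XMLStreamConstants.START_ELEMENT, null, name);
        return r.getElementText(); // leaves the reader on the matching end tag
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Order><Date>2003/07/04</Date><CustomerId>123</CustomerId>"
                + "<CustomerName>Acme Alpha</CustomerName>"
                + "<Item><ItemId> 987</ItemId><ItemName>Coupler</ItemName><Quantity>5</Quantity></Item>"
                + "<Item><ItemId>654</ItemId><ItemName>Connector</ItemName>"
                + "<Quantity unit=\"12\">3</Quantity></Item></Order>";
        readOrder(new StringReader(xml), item ->
                System.out.println(item.id + " " + item.name + " x" + item.quantity));
    }
}
```

There is no boolean anywhere: a document with the elements in a different order fails in require() with an XMLStreamException, and only one Item is held in memory at a time.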
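Depending on the exact structure of the XML there may be a "middle way": a toolkit with a mode of operation in which you parse a subtree of the document into a DOM-like object model, process that twig, then throw it away and parse the next one. This works well for repetitive documents with many similar elements that can each be processed in isolation. Within each twig you get the ease of programming against a tree-based API, yet you keep the streaming behaviour that lets you parse huge documents efficiently.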
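// Assumption: NodeFactory, Nodes and Element are the XOM classes (package nu.xom)
import nu.xom.Element;
import nu.xom.NodeFactory;
import nu.xom.Nodes;
public class ItemProcessor extends NodeFactory {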
private Nodes emptyNodes = new Nodes();
public Nodes finishMakingElement(Element elt) {
if("Item".equals(elt.getLocalName())) {
// process the Item element here
System.out.println(elt.getFirstChildElement("ItemId").getValue()
+ ": " + elt.getFirstChildElement("ItemName").getValue());
// then throw it away
return emptyNodes;
} else {
return super.finishMakingElement(elt);
}
}
}
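A similar effect can be achieved with a combination of StAX and JAXB: define JAXB-annotated classes that represent the repeating element (Item in this example), then create a StAX parser and navigate to the first Item start tag. From there you can unmarshal one complete Item at a time from the XMLStreamReader.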
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class JXML {
private DocumentBuilder builder;
private Document doc = null;
private DocumentBuilderFactory factory ;
private XPathExpression expr = null;
private XPathFactory xFactory;
private XPath xpath;
private String xmlFile;
public static ArrayList<String> XMLVALUE ;
public JXML(String xmlFile){
this.xmlFile = xmlFile;
}
private void xmlFileSettings(){
try {
factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
xFactory = XPathFactory.newInstance();
xpath = xFactory.newXPath();
builder = factory.newDocumentBuilder();
doc = builder.parse(xmlFile);
}
catch (Exception e){
System.out.println(e);
}
}
public String[] selectQuery(String query){
xmlFileSettings();
ArrayList<String> records = new ArrayList<String>();
try {
expr = xpath.compile(query);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
records.add(nodes.item(i).getNodeValue());
}
return records.toArray(new String[records.size()]);
}
catch (Exception e) {
System.out.println("There is error in query string");
return records.toArray(new String[records.size()]);
}
}
public boolean updateQuery(String query,String value){
xmlFileSettings();
try{
NodeList nodes = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);
for (int idx = 0; idx < nodes.getLength(); idx++) {
nodes.item(idx).setTextContent(value);
}
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File(this.xmlFile)));
return true;
}catch(Exception e){
System.out.println(e);
return false;
}
}
public static void main(String args[]){
JXML jxml = new JXML("c://user.xml");
jxml.updateQuery("//Order/CustomerId/text()","222");
String result[]=jxml.selectQuery("//Order/Item/*/text()");
for(int i=0;i<result.length;i++){
System.out.println(result[i]);
}
}
}
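If you control the definition of the XML, you can use an XML binding tool such as JAXB (Java Architecture for XML Binding). In JAXB you can define a schema for the XML structure (XSD and others are supported) or annotate your Java classes to define the serialization rules. Once you have a clear declarative mapping between XML and Java, marshalling to and unmarshalling from XML becomes trivial.
Using JAXB does require more memory than SAX handlers, but there are ways to process an XML document in parts.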
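I have been using XStream to serialize my own objects to XML and then load them back as Java objects. If you can represent everything as POJOs, and you annotate the POJOs properly to match the types in your XML file, you may find it easier to use.
When a String holds the XML representation of an object, you only have to write:
Order theOrder = (Order) xstream.fromXML(xmlString);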
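I have always used it to load an object into memory in a single line, but if you need to stream the document you should be able to iterate over it with a streaming reader. This is probably very similar to Simple, which @Dave suggested.
As others have suggested, the StAX model is a better approach for minimizing the memory footprint, since it is a pull-based streaming model. Personally I have used Axiom (which is used in Apache Axis) and parsed elements with XPath expressions, which is less verbose than walking the node elements as you do in the code snippet you provided.
Here is an example that combines JAXB with StAX.
Input document: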
<?xml version="1.0" encoding="UTF-8"?>
<Personlist xmlns="http://example.org">
<Person>
<Name>Name 1</Name>
<Address>
<StreetAddress>Somestreet</StreetAddress>
<PostalCode>00001</PostalCode>
<CountryName>Finland</CountryName>
</Address>
</Person>
<Person>
<Name>Name 2</Name>
<Address>
<StreetAddress>Someotherstreet</StreetAddress>
<PostalCode>43400</PostalCode>
<CountryName>Sweden</CountryName>
</Address>
</Person>
</Personlist>
Address.java:
import javax.xml.bind.annotation.XmlElement;
public class Address {
@XmlElement(name = "StreetAddress", namespace = "http://example.org")
private String streetAddress;
@XmlElement(name = "PostalCode", namespace = "http://example.org")
private String postalCode;
@XmlElement(name = "CountryName", namespace = "http://example.org")
private String countryName;
public String getStreetAddress() {
return streetAddress;
}
public String getPostalCode() {
return postalCode;
}
public String getCountryName() {
return countryName;
}
}
PersonlistProcessor.java:
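import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
// Person is a JAXB-annotated class along the lines of Address; its source is not shown here
public class PersonlistProcessor {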
public static void main(String[] args) throws Exception {
new PersonlistProcessor().processPersonlist(PersonlistProcessor.class
.getResourceAsStream("personlist.xml"));
}
// TODO: Instead of throws Exception, all exceptions should be wrapped
// inside runtime exception
public void processPersonlist(InputStream inputStream) throws Exception {
JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
XMLStreamReader xss = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
// Create unmarshaller
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
// Go to next tag
xss.nextTag();
// Require Personlist
xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Personlist");
// Go to next tag
while (xss.nextTag() == XMLStreamReader.START_ELEMENT) {
// Require Person
xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Person");
// Unmarshall person
Person person = (Person)unmarshaller.unmarshal(xss);
// Process person
processPerson(person);
}
// Require Personlist
xss.require(XMLStreamReader.END_ELEMENT, "http://example.org", "Personlist");
}
private void processPerson(Person person) {
System.out.println(person.getName());
System.out.println(person.getAddress().getCountryName());
}
}
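I have been using this library. It sits on top of the standard Java library and makes things easier for me. In particular, you can ask for a specific element or attribute by name instead of using the big "if" statement you describe.
There is another library that supports more compact XML parsing, RTXML. The library and its documentation are available online. I implemented parsing of the file from the original question, and I include the complete program here: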
package for_so;
import java.io.File;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import rasmus_torkel.xml_basic.read.TagNode;
import rasmus_torkel.xml_basic.read.XmlReadOptions;
import rasmus_torkel.xml_basic.read.impl.XmlReader;
public class Q15626686_ReadOrder
{
public static class Order
{
public final Date _date;
public final int _customerId;
public final String _customerName;
public final ArrayList<Item> _itemAl;
public
Order(TagNode node)
{
_date = (Date)node.nextStringMappedFieldE("Date", Date.class);
_customerId = (int)node.nextIntFieldE("CustomerId");
_customerName = node.nextTextFieldE("CustomerName");
_itemAl = new ArrayList<Item>();
boolean finished = false;
while (!finished)
{
TagNode itemNode = node.nextChildN("Item");
if (itemNode != null)
{
Item item = new Item(itemNode);
_itemAl.add(item);
}
else
{
finished = true;
}
}
node.verifyNoMoreChildren();
}
}
public static final Pattern DATE_PATTERN = Pattern.compile("^(\\d\\d\\d\\d)\\/(\\d\\d)\\/(\\d\\d)$");
public static class Date
{
public final String _dateString;
public final int _year;
public final int _month;
public final int _day;
public
Date(String dateString)
{
_dateString = dateString;
Matcher matcher = DATE_PATTERN.matcher(dateString);
if (!matcher.matches())
{
throw new RuntimeException(dateString + " does not match pattern " + DATE_PATTERN.pattern());
}
_year = Integer.parseInt(matcher.group(1));
_month = Integer.parseInt(matcher.group(2));
_day = Integer.parseInt(matcher.group(3));
}
}
public static class Item
{
public final int _itemId;
public final String _itemName;
public final Quantity _quantity;
public
Item(TagNode node)
{
_itemId = node.nextIntFieldE("ItemId");
_itemName = node.nextTextFieldE("ItemName");
_quantity = new Quantity(node.nextChildE("Quantity"));
node.verifyNoMoreChildren();
}
}
public static class Quantity
{
public final int _unitSize;
public final int _unitQuantity;
public
Quantity(TagNode node)
{
_unitSize = node.attributeIntD("unit", 1);
_unitQuantity = node.onlyInt();
}
}
public static void
main(String[] args)
{
File xmlFile = new File(args[0]);
TagNode orderNode = XmlReader.xmlFileToRoot(xmlFile, "Order", XmlReadOptions.DEFAULT);
Order order = new Order(orderNode);
System.out.println("Read order for " + order._customerName + " which has " + order._itemAl.size() + " items");
}
}
private PARSE_MODE parseMode = PARSE_MODE.__UNDEFINED__;
// NB: essential that all these enum values are upper case, but this is the convention anyway
private enum PARSE_MODE {
__UNDEFINED__, ORDER, DATE, CUSTOMERID, ITEM };
private List<String> parseModeStrings = new ArrayList<String>();
private Stack<PARSE_MODE> modeBreadcrumbs = new Stack<PARSE_MODE>();
// instance initializer: a bare loop is not legal at class level
{
for( PARSE_MODE pm : PARSE_MODE.values() ){
// might want to check here that these are indeed upper case
parseModeStrings.add( pm.name() );
}
}
@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
String localNameUC = localName.toUpperCase();
// pushing "__UNDEFINED__" would mess things up! But unlikely name for an XML element
assert ! localNameUC.equals( "__UNDEFINED__" );
if( parseModeStrings.contains( localNameUC )){
parseMode = PARSE_MODE.valueOf( localNameUC );
// any "policing" to do with which modes are allowed to switch into
// other modes could be put here...
// in your case, go `new Order()` here when parseMode == ORDER
modeBreadcrumbs.push( parseMode );
}
else {
// typically ignore the start of this element...
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
String localNameUC = localName.toUpperCase();
if( parseModeStrings.contains( localNameUC )){
// will not fail unless the XML structure is malformed in some way
// or there is a coding error in the use of the Stack, etc.
// pop outside the assert so the stack is still updated when assertions are disabled:
PARSE_MODE popped = modeBreadcrumbs.pop();
assert popped == parseMode;
if( modeBreadcrumbs.empty() ){
parseMode = PARSE_MODE.__UNDEFINED__;
}
else {
parseMode = modeBreadcrumbs.peek();
}
}
else {
// typically ignore the end of this element...
}
}
public void characters(char[] ch, int start, int length) throws SAXException {
switch( parseMode ){
case DATE:
// PS - this SimpleDateFormat object can be a field: it doesn't need to be created hundreds of times
SimpleDateFormat formatter. ...
String value = ...
...
break;
case CUSTOMERID:
order.setCustomerId( ...
break;
case ITEM:
item = new Item();
// this next line probably won't be needed: when you get to endElement, if
// parseMode is ITEM, the previous mode will be restored automatically
// isItem = false ;
}
}
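The fragments above leave several parts elided, so here is a small self-contained sketch of the same mode-stack idea that actually runs against the Order document. It is my own reduced variant rather than the full handler: the ModeStackHandler class, its Mode enum and the two getters are hypothetical names, it only tracks CustomerName and ItemName, and it uses qName because the default SAXParserFactory is not namespace-aware.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ModeStackHandler extends DefaultHandler {

    // Only elements named here are tracked; every other element is ignored.
    private enum Mode { ORDER, CUSTOMERNAME, ITEM, ITEMNAME }

    private final Deque<Mode> breadcrumbs = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final List<String> itemNames = new ArrayList<>();
    private String customerName;

    /** Maps an element name to a mode, or returns null for untracked elements. */
    private static Mode modeFor(String qName) {
        try {
            return Mode.valueOf(qName.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException notTracked) {
            return null;
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Mode mode = modeFor(qName);
        if (mode != null) {
            breadcrumbs.push(mode); // the top of the stack is the current mode
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so only accumulate here.
        Mode current = breadcrumbs.peek();
        if (current == Mode.CUSTOMERNAME || current == Mode.ITEMNAME) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Mode mode = modeFor(qName);
        if (mode == null) {
            return;
        }
        Mode popped = breadcrumbs.pop(); // the previous mode is restored automatically
        if (popped != mode) {
            throw new IllegalStateException("Unbalanced element: " + qName);
        }
        switch (mode) {
            case CUSTOMERNAME: customerName = text.toString().trim(); break;
            case ITEMNAME: itemNames.add(text.toString().trim()); break;
            default: break;
        }
    }

    public String getCustomerName() { return customerName; }

    public List<String> getItemNames() { return itemNames; }

    /** Convenience entry point: parses the given XML text with a fresh handler. */
    public static ModeStackHandler parse(String xml) throws Exception {
        ModeStackHandler handler = new ModeStackHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        ModeStackHandler h = parse("<Order><CustomerName>Acme Alpha</CustomerName>"
                + "<Item><ItemId>987</ItemId><ItemName>Coupler</ItemName></Item></Order>");
        System.out.println(h.getCustomerName() + ": " + h.getItemNames());
    }
}
```

Supporting a new element means adding one enum constant and one case, not a new boolean plus the code to set and clear it.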
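import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public abstract class AbstractSAXHandler extends DefaultHandler {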
protected enum PARSE_MODE implements SAXHandlerParseMode {
__UNDEFINED__
};
// abstract: the concrete subclasses must populate...
abstract protected Collection<Enum<?>> getPossibleModes();
//
private Stack<SAXHandlerParseMode> modeBreadcrumbs = new Stack<SAXHandlerParseMode>();
private Collection<Enum<?>> possibleModes;
private Map<String, Enum<?>> nameToEnumMap;
private Map<String, Enum<?>> getNameToEnumMap(){
// lazy creation and population of map
if( nameToEnumMap == null ){
if( possibleModes == null ){
possibleModes = getPossibleModes();
}
nameToEnumMap = new HashMap<String, Enum<?>>();
for( Enum<?> possibleMode : possibleModes ){
nameToEnumMap.put( possibleMode.name(), possibleMode );
}
}
return nameToEnumMap;
}
protected boolean isLegitimateModeName( String name ){
return getNameToEnumMap().containsKey( name );
}
protected SAXHandlerParseMode getParseMode() {
return modeBreadcrumbs.isEmpty()? PARSE_MODE.__UNDEFINED__ : modeBreadcrumbs.peek();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
try {
_startElement(uri, localName, qName, attributes);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses (NB I think checked exceptions are not a brilliant design choice in Java)
protected void _startElement(String uri, String localName, String qName, Attributes attributes)
throws Exception {
String qNameUC = qName.toUpperCase();
// very undesirable ever to push "UNDEFINED"! But unlikely name for an XML element
assert !qNameUC.equals("__UNDEFINED__") : "Encountered XML element with qName \"__UNDEFINED__\"!";
if( getNameToEnumMap().containsKey( qNameUC )){
Enum<?> newMode = getNameToEnumMap().get( qNameUC );
modeBreadcrumbs.push( (SAXHandlerParseMode)newMode );
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
try {
_endElement(uri, localName, qName);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses
protected void _endElement(String uri, String localName, String qName) throws Exception {
String qNameUC = qName.toUpperCase();
if( getNameToEnumMap().containsKey( qNameUC )){
modeBreadcrumbs.pop();
}
}
public List<?> showModeBreadcrumbs(){
return org.apache.commons.collections4.ListUtils.unmodifiableList( modeBreadcrumbs );
}
}
interface SAXHandlerParseMode {
}
private enum PARSE_MODE implements SAXHandlerParseMode {
ORDER, DATE, CUSTOMERID, ITEM
};
private Collection<Enum<?>> possibleModes;
@Override
protected Collection<Enum<?>> getPossibleModes() {
// lazy initiation
if (possibleModes == null) {
List<SAXHandlerParseMode> parseModes = new ArrayList<SAXHandlerParseMode>( Arrays.asList(PARSE_MODE.values()) );
possibleModes = new ArrayList<Enum<?>>();
for( SAXHandlerParseMode parseMode : parseModes ){
possibleModes.add( PARSE_MODE.valueOf( parseMode.toString() ));
}
// __UNDEFINED__ mode (from abstract superclass) must be added afterwards
possibleModes.add( AbstractSAXHandler.PARSE_MODE.__UNDEFINED__ );
}
return possibleModes;
}