Java: parsing HTML content from an XML file
<xbrli:xbrl xmlns:aoi="http://www.aointl.com/20160331" xmlns:country="http://xbrl.sec.gov/country/2016-01-31" xmlns:currency="http://xbrl.sec.gov/currency/2016-01-31" xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31" xmlns:exch="http://xbrl.sec.gov/exch/2016-01-31" xmlns:invest="http://xbrl.sec.gov/invest/2013-01-31" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:naics="http://xbrl.sec.gov/naics/2011-01-31" xmlns:nonnum="http://www.xbrl.org/dtr/type/non-numeric" xmlns:num="http://www.xbrl.org/dtr/type/numeric" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:sic="http://xbrl.sec.gov/sic/2011-01-31" xmlns:stpr="http://xbrl.sec.gov/stpr/2011-01-31" xmlns:us-gaap="http://fasb.org/us-gaap/2016-01-31" xmlns:us-roles="http://fasb.org/us-roles/2016-01-31" xmlns:us-types="http://fasb.org/us-types/2016-01-31" xmlns:utreg="http://www.xbrl.org/2009/utr" xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:xbrldt="http://xbrl.org/2005/xbrldt" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<link:schemaRef xlink:href="aoi-20160331.xsd" xlink:type="simple"/>
<xbrli:context id="FD2016Q4YTD">
<xbrli:entity>
<xbrli:identifier scheme="http://www.sec.gov/CIK">0000939930</xbrli:identifier>
</xbrli:entity>
<xbrli:period>
<xbrli:startDate>2015-04-01</xbrli:startDate>
<xbrli:endDate>2016-03-31</xbrli:endDate>
</xbrli:period>
</xbrli:context>
<aoi:OtherIncomeAndExpensePolicyTextBlock contextRef="FD2016Q4YTD" id="Fact-F51C7616E17E5B8B0B770D410BBF5A3E">
<div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>
</aoi:OtherIncomeAndExpensePolicyTextBlock>
</xbrli:xbrl>
This is my XML (XBRL), and I need to parse it. This XML is my input; I don't know whether it is valid or not, but I need output like this:
<div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>
Could someone please share their knowledge? I have been stuck on this problem for the last two weeks.
This is the code I am using:
File fXmlFile = new File("/home/devteam-user1/Desktop/ky/UnitTesting.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
XPath xPath = XPathFactory.newInstance().newXPath();
final String DIV_UNDER_ROOT = "/*/aoi";
NodeList divList = (NodeList)xPath.compile(DIV_UNDER_ROOT)
.evaluate(doc, XPathConstants.NODESET);
System.out.println(divList.getLength());
for (int i = 0; i < divList.getLength(); i++) { // just in case there is more than one
    Node divNode = divList.item(i);
    System.out.println(nodeToString(divNode));
}
// nodeToString method below
private static String nodeToString(Node node) throws Exception
{
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
transformer.transform(new DOMSource(node), result);
return result.getWriter().toString();
}
This worked for me, using jsoup:
public static void main(String[] args) throws IOException {
FileInputStream fis = new FileInputStream("yourfile.xml");
Document doc = Jsoup.parse(Utils.streamToString(fis));
System.out.println(doc.select("aoi|OtherIncomeAndExpensePolicyTextBlock").html().toString());
}
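The snippet above calls `Utils.streamToString`, which is not shown anywhere in the thread. A minimal sketch of what such a helper might look like, assuming the file is UTF-8 encoded; the class below is a hypothetical stand-in and not the answerer's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the undefined Utils class used in the jsoup snippet.
public class Utils {

    // Reads the whole stream into memory and decodes it as UTF-8 (an assumption
    // about the file's encoding; adjust the charset if the file differs).
    public static String streamToString(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<a>text</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(streamToString(new ByteArrayInputStream(bytes)));
    }
}
```

The resulting String can then be handed to `Jsoup.parse` as in the snippet above.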
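Your main problem is this line:
final String DIV_UNDER_ROOT = "/*/aoi";
It is an XPath expression that matches any node two levels below the root whose local name is aoi and which has no namespace. That is not what you want.
You want to match the contents of a node two levels down whose namespace is aliased by the aoi prefix, meaning it belongs to the namespace http://www.aointl.com/20160331, and whose local name is OtherIncomeAndExpensePolicyTextBlock.
Matching namespaces in XPath in Java is quite cumbersome, but long story short, you can try the following:
final String DIV_UNDER_ROOT = "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock' and namespace-uri()='http://www.aointl.com/20160331']/*";
This works only if the DocumentBuilderFactory is namespace-aware, so make sure to call setNamespaceAware(true) on the factory before creating the DocumentBuilder.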
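As a rough end-to-end sketch of this approach: a namespace-aware parse, the local-name()/namespace-uri() XPath, and serialization of every matched child node. The class name `TextBlockExtractor`, the method `extractHtml`, and the trimmed-down `SAMPLE` string are assumptions made for illustration and do not come from the original post. `OMIT_XML_DECLARATION` is set so that each serialized fragment does not start with an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextBlockExtractor {

    // Shortened stand-in for the XBRL document in the question (illustrative only).
    public static final String SAMPLE =
        "<xbrli:xbrl xmlns:aoi=\"http://www.aointl.com/20160331\""
      + " xmlns:xbrli=\"http://www.xbrl.org/2003/instance\">"
      + "<aoi:OtherIncomeAndExpensePolicyTextBlock contextRef=\"FD2016Q4YTD\">"
      + "<div style=\"font-size:10pt;\"><font style=\"font-weight:bold;\">"
      + "Other Income (Expense)</font></div>"
      + "</aoi:OtherIncomeAndExpensePolicyTextBlock>"
      + "</xbrli:xbrl>";

    // Children of the text block, matched by local name and namespace URI.
    private static final String DIV_UNDER_BLOCK =
        "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock'"
      + " and namespace-uri()='http://www.aointl.com/20160331']/*";

    public static String extractHtml(String xml) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setNamespaceAware(true); // required for namespace-uri() to match
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new InputSource(new StringReader(xml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList divList = (NodeList) xPath.compile(DIV_UNDER_BLOCK)
                .evaluate(doc, XPathConstants.NODESET);

        StringBuilder html = new StringBuilder();
        for (int i = 0; i < divList.getLength(); i++) {
            html.append(nodeToString(divList.item(i)));
        }
        return html.toString();
    }

    // Serializes one DOM node back to markup, without the XML declaration.
    private static String nodeToString(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractHtml(SAMPLE));
    }
}
```

To run it against the real file, the `InputSource` can be swapped for the `File` used in the question's code.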
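I don't quite follow, but if you need to embed HTML in XML, you should escape the characters. For example, Hello World would be output as "Hello World" with its markup escaped, or you could use a block.
@marco I don't need to insert HTML into XML. It is already in the XML. I need to get the HTML content using any Java API. I clearly stated my input and output in my question.
Use an XML parser to extract the information by XML tag. Leave the HTML as it is.
But is your XML document well-formed as a whole? No closing tags missing in the HTML part?
This is part 2 of another question asked by the same person. I gave a complete answer there, so he copied and pasted my answer into a new question. Is that proper behavior on this forum?
@Sharon: I hope we can gain knowledge from knowledgeable giants like you. You told me the XML is not well-formed. Also, I am new to this forum; as I understand it, if my question is asked correctly, I will easily get a solution... nothing more than that. Thanks
@JohnAdam - do not edit the answer into the question! Do not open a new question by copy-pasting the answer to the previous question!! This is not how you treat people who are trying to help you!! - Sharonb
When you copy-paste from someone else's answer, that is really just basic courtesy.
@wutzebaer, Utils? doc.select? Can you explain?
@wutzebaer, thank you very much for your code. It's working fine.
The OP should take the time to learn the XPath tooling and syntax. That's all.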
// Namespace-aware factory configuration that the namespace-uri() XPath depends on
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
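With a namespace-aware factory like this, another option is to register a `NamespaceContext` on the XPath object so that the expression can use the aoi prefix directly. The sketch below is one way this might look; the class name `PrefixedXPathDemo`, the `count` helper, and the shortened `SAMPLE` document are assumptions for illustration. It also shows that the question's original `/*/aoi` expression selects nothing.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrefixedXPathDemo {

    public static final String AOI_NS = "http://www.aointl.com/20160331";

    // Shortened stand-in for the XBRL document in the question (illustrative only).
    public static final String SAMPLE =
        "<xbrli:xbrl xmlns:aoi=\"" + AOI_NS + "\""
      + " xmlns:xbrli=\"http://www.xbrl.org/2003/instance\">"
      + "<aoi:OtherIncomeAndExpensePolicyTextBlock>"
      + "<div><font>Other Income (Expense)</font></div>"
      + "</aoi:OtherIncomeAndExpensePolicyTextBlock>"
      + "</xbrli:xbrl>";

    // Maps the "aoi" prefix used in the XPath expression to its namespace URI.
    private static final NamespaceContext AOI_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "aoi".equals(prefix) ? AOI_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return AOI_NS.equals(namespaceURI) ? "aoi" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return (prefix == null
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(prefix)).iterator();
        }
    };

    // Returns how many nodes the expression selects in the given document.
    public static int count(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setNamespaceAware(true);
        Document doc = dbFactory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(AOI_CONTEXT);
        NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count(SAMPLE, "/*/aoi:OtherIncomeAndExpensePolicyTextBlock/*"));
        System.out.println(count(SAMPLE, "/*/aoi"));
    }
}
```

The prefix in the XPath expression only has to match the `NamespaceContext`, not the prefix used in the document; what must match is the namespace URI.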