How can I parse the given XML with javax.xml.xpath?
I am trying to parse this XML:
<?xml version="1.0" encoding="UTF-8"?>
<veranstaltungen>
<veranstaltung id="201611211500#25045271">
<titel>Mal- und Zeichen-Treff</titel>
<start>2016-11-21 15:00:00</start>
<veranstaltungsort id="20011507">
<name>Freizeitclub - ganz unbehindert </name>
<anschrift>Macht los e.V.
Lipezker Straße 48
03048 Cottbus
</anschrift>
<telefon>xxxx xxxx </telefon>
<fax>0355 xxxx</fax>
[...]
</veranstaltungen>
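When I read the text of the anschrift element, I get:
Macht los e.V.Lipezker Straße 4803048 Cottbus
instead of: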
Macht los e.V. Lipezker Straße 48 03048 Cottbus
I know the right way to handle this should be normalize-space(). I tried this:
// Does not work; afaik because xpath 1 normalizes just the first node
xPath.compile("normalize-space(veranstaltungen/veranstaltung[position()=1]/veranstaltungsort/anschrift/text())");
// Does not work
xPath.compile("veranstaltungen/veranstaltung[position()=1]/veranstaltungsort[normalize-space(anschrift/text())]");
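I also tried the solution given here:
What am I doing wrong?
Update
I already have a workaround, but it is not a real solution. The following lines show how I build the string from the HTTPResponse: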
try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), Charset.forName(charset)))) {
    final StringBuilder stringBuilder = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        // stringBuilder.append(line);
        // WORKAROUND: Add a space after each line
        stringBuilder.append(line).append(" ");
    }
    // Work with the read lines
}
I would like to have a reliable solution.
Originally, you seem to have used the following code to read the XML:
try (BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), Charset.forName(charset)))) {
    final StringBuilder stringBuilder = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        stringBuilder.append(line);
    }
}
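This is where your line breaks get eaten: readLine() does not return the trailing line-break characters. If you then parse the contents of the stringBuilder object, you get an incorrect DOM whose text nodes no longer contain the original line breaks from the XML.
Thanks to Markus's help I was able to solve the problem. The cause is that the readLine() method of BufferedReader discards line breaks. The getDocument() method shown at the end works for me (it can probably be improved).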
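To see the effect in isolation, here is a minimal sketch that runs the same read loop over an in-memory string instead of an HTTP response. The class name ReadLineDemo and the helper joinLines are made up for this illustration; only BufferedReader.readLine() comes from the code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineDemo {

    // Hypothetical helper: concatenates lines exactly like the original loop.
    // readLine() returns each line WITHOUT its terminator (\n, \r or \r\n),
    // so the line breaks are lost in the result.
    public static String joinLines(String text) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line);
            }
        }
        return stringBuilder.toString();
    }

    public static void main(String[] args) throws IOException {
        // \u00df is the letter "ß", escaped so the source file encoding does not matter.
        String anschrift = "Macht los e.V.\nLipezker Stra\u00dfe 48\n03048 Cottbus\n";
        // The three address lines come out glued together, with nothing
        // left between them for normalize-space() to collapse.
        System.out.println(joinLines(anschrift));
    }
}
```

Because the separators are already gone before the XML parser sees the text, no XPath expression can bring them back afterwards.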
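Regarding normalize-space(): since your result has no spaces between the lines of the anschrift element's text content, something must have eaten your line breaks before normalize-space() gets to work.
I was not aware of that. Thank you for the information. My solution, then, is to check whether the line ends with ">" and, if it does not, append a line-break character reference.
Please don't do that. You would be modifying the input again. Why are you reading line by line at all? Why not parse the input stream as it is?
I should have been more awake. You are right. I'll do that now.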
public Document getDocument() throws IOException, ParserConfigurationException, SAXException {
    final HttpResponse response = getResponse(); // returns a HttpResponse
    final HttpEntity entity = response.getEntity();
    final Charset charset = ContentType.getOrDefault(entity).getCharset();
    // Not 100% sure if I have to close the InputStreamReader. But I guess so.
    try (InputStreamReader isr = new InputStreamReader(entity.getContent(), charset == null ? Charset.forName("UTF-8") : charset)) {
        return documentBuilderFactory.newDocumentBuilder().parse(new InputSource(isr));
    }
}
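For completeness, here is a small self-contained sketch of the whole path without the HTTP client: the XML is handed to the parser as a stream, and normalize-space() is then applied to the anschrift element. The class name AnschriftDemo, the helper readAnschrift and the shortened sample document are assumptions made for this example; the XPath is written against the element names from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class AnschriftDemo {

    // Shortened sample based on the XML from the question (\u00df is "ß").
    public static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<veranstaltungen>\n"
        + "<veranstaltung id=\"201611211500#25045271\">\n"
        + "<veranstaltungsort id=\"20011507\">\n"
        + "<anschrift>Macht los e.V.\n"
        + "Lipezker Stra\u00dfe 48\n"
        + "03048 Cottbus\n"
        + "</anschrift>\n"
        + "</veranstaltungsort>\n"
        + "</veranstaltung>\n"
        + "</veranstaltungen>\n";

    // Hypothetical helper: parses the stream as it is (no line-by-line reading)
    // and returns the whitespace-normalized address of the first veranstaltung.
    public static String readAnschrift(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        XPath xPath = XPathFactory.newInstance().newXPath();
        // normalize-space() trims the text and collapses each run of
        // whitespace, including the preserved line breaks, into one space.
        return xPath.evaluate(
            "normalize-space(/veranstaltungen/veranstaltung[1]/veranstaltungsort/anschrift)",
            doc);
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
        System.out.println(readAnschrift(in));
    }
}
```

Since the parser receives the original bytes, the line breaks inside anschrift survive into the DOM, and normalize-space() can turn them into the single spaces the question asked for.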