HTML parsing: newline characters are replaced


I wrote a simple piece of HTML parsing code that fetches the text content at a given XPath.

My code:

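import javax.xml.xpath.*;
import org.htmlcleaner.*;
import org.w3c.dom.NodeList;

// rawContent is the HTML input, defined elsewhere
XPathFactory xFactory = XPathFactory.newInstance();
CleanerProperties props = new CleanerProperties();
props.setNamespacesAware(false);
XPath xpathi = xFactory.newXPath();
HtmlCleaner cleaner = new HtmlCleaner(props);
TagNode node = cleaner.clean(rawContent);
org.w3c.dom.Document doc = new DomSerializer(props).createDOM(node);
// evaluate() returns Object; with NODESET the result is a NodeList, so it must be cast
NodeList obj = (NodeList) xpathi.compile("//div[@class='answer']").evaluate(doc, XPathConstants.NODESET);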

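With this, obj is populated with the desired answer, but the \n characters in the answer are replaced with an empty string. If the answer is One, Two, Three on separate lines,

I get OneTwoThree. I want One Two Three.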

Do I need to set some property in CleanerProperties for this?

Any suggestions?

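I want to preserve the line breaks.

To narrow the problem down: the source HTML looks something like One<br>Two<br>Three. Because the line-break tags have no proper closing tag, they seem to be removed by the HTML cleaner, so the text I get back is concatenated with no space in between. Is there any way to solve this?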
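One possible direction, offered only as a sketch and not as a confirmed fix: instead of reading the flattened text of each matched node, walk its children and emit a newline wherever a br element appears. This assumes the br elements are still present in the DOM that DomSerializer produces, which the question does not confirm. The class name LineBreakText and the helpers textWithLineBreaks and parse are hypothetical names made up for this illustration, and the JDK XML parser stands in for HtmlCleaner so the example runs without extra libraries.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LineBreakText {

    /** Collects the text under a node, emitting '\n' for every br element. */
    public static String textWithLineBreaks(Node node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString();
    }

    private static void append(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
        } else if (type == Node.ELEMENT_NODE
                && "br".equalsIgnoreCase(node.getNodeName())) {
            out.append('\n');
        } else {
            // Recurse into documents, elements, and any other container node
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                append(children.item(i), out);
            }
        }
    }

    /** Hypothetical helper: parses well-formed XHTML as a stand-in for the HtmlCleaner DOM. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<html><body><div class='answer'>One<br/>Two<br/>Three</div></body></html>");
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .compile("//div[@class='answer']")
                .evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < hits.getLength(); i++) {
            System.out.println(textWithLineBreaks(hits.item(i)));
        }
    }
}
```

With the code from the question, the same textWithLineBreaks call could be applied to each item of the NodeList returned by the XPath evaluation. If the br elements turn out to be missing from the DOM already, a different approach would be needed, for example replacing them with a placeholder in the raw HTML before cleaning.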