Java WebHarvest cannot convert malformed HTML to XML


I am using the XQuery processor in WebHarvest (called from Java) to parse an HTML page that contains invalid markup inside a div element. The exception is:

SXXP0003: Error reported by XML parser: Element type "div" must be followed by either
attribute specifications, ">" or "/>".

at org.webharvest.runtime.processors.XQueryProcessor.execute(Unknown Source)
Is there a quick way to clean up the div in a preprocessing step, or any other way to work around this problem?
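
For reference, the same kind of parser error can be reproduced outside WebHarvest with the JDK's own XML parser, which suggests the failure comes from strict XML parsing rather than from the XQuery itself. This is only a minimal sketch: the broken input (a div whose two attributes are run together without whitespace) is a hypothetical stand-in, not the actual page, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class StrictParseDemo {

    /** Returns null if the markup is well-formed XML, otherwise the parser's error message. */
    static String wellFormednessError(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical inputs: the first lacks whitespace between its attributes.
        String broken = "<div class=\"a\"id=\"b\">text</div>";
        String fixed = "<div class=\"a\" id=\"b\">text</div>";
        System.out.println("broken: " + wellFormednessError(broken));
        System.out.println("fixed:  " + wellFormednessError(fixed));
    }
}
```

With the JDK's bundled parser, the first call should report an error along the lines of the one quoted above, while the second should parse cleanly.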

Try taking a look at JTidy.
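
A rough sketch of what such a preprocessing step might look like, assuming the JTidy jar (`org.w3c.tidy`) is on the classpath. The class name `TidyPreprocess`, the method `toXml`, and the sample input are hypothetical. Whether JTidy can repair a given page depends on how badly the markup is broken, so treat this as a starting point rather than a guaranteed fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.tidy.Tidy;

class TidyPreprocess {

    /** Runs the raw HTML through JTidy and returns whatever XML it emits. */
    static String toXml(String html) {
        Tidy tidy = new Tidy();
        tidy.setXmlOut(true);          // ask for XML output instead of HTML
        tidy.setQuiet(true);           // suppress the summary banner
        tidy.setShowWarnings(false);   // suppress per-warning diagnostics

        // Character encoding is left at JTidy's defaults here;
        // pages with non-ASCII text may need explicit encoding settings.
        ByteArrayInputStream in =
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        tidy.parse(in, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical malformed input: attributes run together without whitespace.
        String raw = "<html><body><div class=\"a\"id=\"b\">text</div></body></html>";
        System.out.println(toXml(raw));
    }
}
```

The cleaned string could then be handed to the XQuery step in place of the raw page. How that is wired into the WebHarvest configuration is not shown here.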