Html parsing: JSoup.clean() does not preserve relative URLs
I have tried:
Whitelist.relaxed();
Whitelist.relaxed().preserveRelativeLinks(true);
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp");
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp").preserveRelativeLinks(true);
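None of them work: when I try to clean a relative URL such as <a href="/test.xhtml">test</a>, the href attribute is removed (the result is <a>test</a>).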
I'm using JSoup 1.8.2.
Any ideas?
The problem most likely comes from how you call the clean method. If you provide a base URI, all URIs should work as expected:
String html = ""
+ "<a href=\"/test.xhtml\">test</a>"
+ "<invalid>stuff</invalid>"
+ "<h2>header1</h2>";
String cleaned = Jsoup.clean(html, "http://base.uri", Whitelist.relaxed().preserveRelativeLinks(true));
System.out.println(cleaned);
The code above works and keeps the relative link. With String cleaned = Jsoup.clean(html, Whitelist.relaxed().preserveRelativeLinks(true)); (no base URI), however, the link is removed.
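A side-by-side sketch of the calls discussed above, assuming jsoup 1.8.x as in the question (newer releases replace Whitelist with Safelist, so this may not compile there). The class and method names are hypothetical, and the output comments are what I expect from 1.8.x. The last variant parses with the base URI first and then runs a Cleaner over the resulting Document, which is roughly what the three-argument Jsoup.clean does internally.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

// Hypothetical demo class comparing the clean() variants from the answer.
public class RelativeLinksDemo {
    static final String HTML = "<a href=\"/test.xhtml\">test</a>";
    static final String BASE = "http://base.uri";

    // Two-argument clean(): the base URI is empty, so "/test.xhtml" cannot be
    // resolved to an allowed protocol and the href attribute is dropped.
    static String withoutBaseUri() {
        return Jsoup.clean(HTML, Whitelist.relaxed().preserveRelativeLinks(true));
    }

    // Three-argument clean(): the link resolves to http://base.uri/test.xhtml,
    // passes the protocol check, and is kept in its original relative form.
    static String withBaseUri() {
        return Jsoup.clean(HTML, BASE, Whitelist.relaxed().preserveRelativeLinks(true));
    }

    // Same base URI but without preserveRelativeLinks: the href survives,
    // rewritten to its absolute form.
    static String withBaseUriAbsolutized() {
        return Jsoup.clean(HTML, BASE, Whitelist.relaxed());
    }

    // Parse with the base URI first, then clean the Document with a Cleaner.
    static String parseThenClean() {
        Document dirty = Jsoup.parseBodyFragment(HTML, BASE);
        Document clean = new Cleaner(Whitelist.relaxed().preserveRelativeLinks(true)).clean(dirty);
        return clean.body().html();
    }

    public static void main(String[] args) {
        System.out.println(withoutBaseUri());         // <a>test</a>
        System.out.println(withBaseUri());            // <a href="/test.xhtml">test</a>
        System.out.println(withBaseUriAbsolutized()); // <a href="http://base.uri/test.xhtml">test</a>
        System.out.println(parseThenClean());         // <a href="/test.xhtml">test</a>
    }
}
```

The only difference between the first two calls is the base URI argument, which is why the two-argument form loses the link even with preserveRelativeLinks(true).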
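Note:
When handling relative links, the input document must have an appropriate base URI set when parsing, so that the link's protocol can be confirmed. Regardless of the setting of the preserve relative links option, the link must be resolvable against the base URI to an allowed protocol; otherwise the attribute will be removed.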
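The rule in that note can be modelled with the standard library alone. The sketch below is a simplified approximation, not jsoup's actual source: the hypothetical isAllowed method resolves the href against the base URI with java.net.URL, falls back to the raw value if that fails, and then requires the result to start with an allowed "protocol:" prefix. The protocol list is assumed to mirror Whitelist.relaxed() for a/href. Under this model, adding "#" or "/" via addProtocols does not help, because "/test.xhtml" never starts with "/:" or "#:".

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of the "resolvable to an allowed protocol" check.
public class RelativeLinkCheck {
    static boolean isAllowed(String baseUri, String href, List<String> protocols) {
        String value;
        try {
            // Resolve the (possibly relative) href against the base URI.
            value = new URL(new URL(baseUri), href).toExternalForm();
        } catch (MalformedURLException e) {
            // An empty or invalid base URI lands here: test the raw value instead.
            value = href;
        }
        String lower = value.toLowerCase(Locale.ROOT);
        for (String protocol : protocols) {
            // A protocol is matched as a "name:" prefix of the resolved URL.
            if (lower.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed to mirror the relaxed() whitelist's protocols for a/href.
        List<String> relaxed = Arrays.asList("ftp", "http", "https", "mailto");
        List<String> withExtras = Arrays.asList("#", "/", "http", "https", "mailto", "ftp");

        System.out.println(isAllowed("http://base.uri", "/test.xhtml", relaxed)); // true
        System.out.println(isAllowed("", "/test.xhtml", relaxed));                // false
        System.out.println(isAllowed("", "/test.xhtml", withExtras));             // false
    }
}
```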