Java jsoup exception: invalid URL


I recently tried to use jsoup to parse a web page. I have a piece of code that connects to a URL:

page.url = "https://admin.xosn.com/pdf9/3876515.pdf?DB_OEM_ID=31000";
Connection conn = Jsoup.connect(page.url);
Document htmlDocument = conn.get();
this.htmlDocument = htmlDocument;
if(!conn.response().contentType().contains("text/html")) {
    System.out.println("**Failure**\nRetrieved something other than  HTML");
    return false;
}
I got this error:

Exception in thread "main" java.lang.IllegalArgumentException: Must supply a valid URL
at org.jsoup.helper.Validate.notEmpty(Validate.java:102)
at org.jsoup.helper.HttpConnection.url(HttpConnection.java:74)
at org.jsoup.helper.HttpConnection.connect(HttpConnection.java:38)
at org.jsoup.Jsoup.connect(Jsoup.java:73) 
The URL seems to work in a browser. I don't know why it doesn't work with jsoup.

jsoup is an HTML parser; it cannot parse a PDF. You can use HttpURLConnection to check what the URL returns before parsing it with jsoup:

String url4e = "https://admin.xosn.com/pdf9/3876515.pdf?DB_OEM_ID=31000";
URL url1 = new URL(url4e);   
HttpURLConnection conn = (HttpURLConnection) url1.openConnection();
conn.setRequestMethod("GET");
conn.connect();
System.out.println(conn.getContentType());
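
The check above only prints the content type. As a rough sketch of how the result could be used to decide whether a response is worth handing to an HTML parser, something like the following might work. The class name `ContentTypeCheck` and both method names are made up for this illustration, and treating only `text/html` and `application/xhtml+xml` as HTML is this sketch's own choice, not jsoup's rule.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper for illustration; not part of jsoup or the JDK.
final class ContentTypeCheck {

    // True when the Content-Type value names an HTML or XHTML document.
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false; // the server sent no Content-Type header
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/html") || mime.equals("application/xhtml+xml");
    }

    // Opens the address with a GET request and returns its Content-Type header (may be null).
    static String fetchContentType(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Offline demonstration of the header check only; no request is sent here.
        System.out.println(isHtml("text/html; charset=UTF-8"));
        System.out.println(isHtml("application/pdf"));
    }
}
```

A caller could run `ContentTypeCheck.isHtml(ContentTypeCheck.fetchContentType(url))` first and skip the jsoup call when it returns false.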

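What does line 102 of Validate.java check? Or line 74 of HttpConnection.java?

Your URL works fine for me. Please edit your question and provide a short but complete example that lets us reproduce your problem.

Yes, I found it. I can make it work with ignoreContentType. Thanks.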
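
On the question about Validate.java: the stack trace shows the exception coming out of `Validate.notEmpty`, which suggests the string that reached `Jsoup.connect` may have been null or empty at run time. That is only an inference from the trace; nothing in the thread confirms it. A minimal sketch of a guard that could be run before connecting, assuming that reading, is below. `UrlGuard` and `problemWith` are hypothetical names, and the http/https restriction is a choice made for this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical pre-flight check for illustration; not part of jsoup.
final class UrlGuard {

    // Returns an empty Optional when the address looks usable, otherwise a short reason.
    static Optional<String> problemWith(String address) {
        if (address == null) {
            return Optional.of("URL is null");
        }
        String trimmed = address.trim();
        if (trimmed.isEmpty()) {
            return Optional.of("URL is empty");
        }
        try {
            URI uri = new URI(trimmed);
            String scheme = uri.getScheme();
            boolean isHttp = scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
            if (!isHttp) {
                return Optional.of("URL must start with http:// or https://");
            }
            if (uri.getHost() == null) {
                return Optional.of("URL has no host");
            }
            return Optional.empty();
        } catch (URISyntaxException e) {
            return Optional.of("URL is malformed: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Prints the reason for each rejected input, or "ok" when none is found.
        System.out.println(problemWith("").orElse("ok"));
        System.out.println(problemWith("https://admin.xosn.com/pdf9/3876515.pdf?DB_OEM_ID=31000").orElse("ok"));
    }
}
```

Logging the reason just before the `Jsoup.connect(page.url)` call would show whether `page.url` actually holds the expected string at that point.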