Strange encoding behavior with umlauts in Jsoup (Java)


java url


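I am having some problems with the encoding behavior of the Jsoup library.

I want to parse the content of a web page. To do that I have to insert people's names, which may contain German umlauts such as ä, ö and so on.

This is the code I am using: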

doc = Jsoup.parse(new URL(searchURL).openStream(), "UTF-8", searchURL);

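to parse the HTML of the respective web page.

But when I look at the document, the result looks like this:

KÃ¤se

What is wrong with my encoding?

The head of the web page looks like this: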

<!doctype html>
<html>
    <head lang="en"> 
    <title>KÃ¤se - Semantic Scholar</title> 
    <meta charset="utf-8"> 
</html>



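Can anyone help? Thanks :)

Edit: I tried Stephan's answer and it works for the web page www.semanticscholar.org, but I am also parsing another web page, and the same code does not work for that page when the author's name contains a German umlaut.

Does anyone know why this doesn't work? It's quite embarrassing not to know this…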

This is a known issue with Jsoup. Here are two options for loading content for Jsoup:

Option 1: JDK only
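A minimal sketch of what a JDK-only variant could look like, not necessarily the code this answer originally showed. It assumes the real encoding of the page is already known: the raw bytes are read from the stream and decoded with an explicitly chosen charset, so the result does not depend on the JVM default encoding. The names `JdkOnlyLoader` and `readAll` are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class JdkOnlyLoader {
    /** Reads the whole stream and decodes it with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // Decode once at the end so a multi-byte sequence is never split across chunks.
        return new String(buffer.toByteArray(), charset);
    }
}
```

The returned string could then be passed on as in the question, for example `Jsoup.parse(JdkOnlyLoader.readAll(connection.getInputStream(), StandardCharsets.UTF_8), searchURL)`.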

Option 2: with a helper library (`IOUtils`, as used in the code further down)

Final thoughts:

- Never rely on website encoding if you didn't check manually (when possible) the real encoding in use.
- Never rely on Jsoup to find somehow the right encoding.
- You can [automate encoding guessing][2]. See the previous link for details.
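One simple way to automate the guess, shown here only as a sketch and not as the approach behind the link above: try a strict UTF-8 decode first and fall back to a single-byte charset when the bytes are not valid UTF-8. The class name `CharsetGuess` and the choice of fallback are assumptions. The heuristic cannot tell different single-byte encodings apart, so it is no substitute for checking the page manually.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CharsetGuess {
    /** Decodes as UTF-8 if the bytes are valid UTF-8, otherwise uses the fallback charset. */
    static String decode(byte[] bytes, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently inserting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the caller-supplied single-byte charset.
            return new String(bytes, fallback);
        }
    }
}
```

For a page like the second one in the question, the fallback might be `Charset.forName("windows-1252")`.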

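By setting a breakpoint on the line with the Jsoup.parse() call and inspecting the frame. The head then contains this odd symbol instead of the ä in "Käse - Semantic Scholar".

After applying the following line with Jsoup.parse(html), the value is the same as in the description: KÃ¤se - Semantic Scholar

To clarify: I am running this code in IntelliJ, and there it does not work. But when it is packaged into a JAR file and run from the Windows command line, it works…

@tschens Check the encoding of the JVM that IntelliJ launches when you run inside IntelliJ.

I love you, thanks for your answer :D I changed the encoding to windows-1252, but can you explain why this is not possible with UTF-8? If I type the same word into the browser, the URL works and the page is displayed…

@tschens What encoding does IntelliJ use for the JVM it launches?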
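The comments above suspect the default encoding of the JVM as the difference between running in IntelliJ and from the Windows command line. A small sketch for checking it, together with a reproduction of the symptom: UTF-8 bytes that are decoded with a single-byte charset turn "Käse" into "KÃ¤se". The class name `EncodingCheck` and the helper `misdecode` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {
    /** Simulates reading UTF-8 bytes with the wrong charset. */
    static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        // Charset the JVM falls back to whenever none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // The ä (UTF-8 bytes C3 A4) becomes two characters when read as ISO-8859-1.
        System.out.println(misdecode("K\u00e4se", StandardCharsets.ISO_8859_1));
    }
}
```

If the two environments report different default charsets, passing `-Dfile.encoding=UTF-8` as a VM option is a common way to pin it, though whether that is the cause here is an assumption.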

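import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.commons.io.IOUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

InputStream is = null;

try {
    // Connect to the website
    URL tmp = new URL(searchURL);
    HttpURLConnection connection = (HttpURLConnection) tmp.openConnection();
    connection.setReadTimeout(10000);
    connection.setConnectTimeout(10000);
    connection.setRequestMethod("GET");
    connection.connect();

    // Load the content for Jsoup, decoding it with an explicit charset
    is = connection.getInputStream(); // Assumes connection.getResponseCode() == 200
    String html = IOUtils.toString(is, "UTF-8");

    // Parse the HTML; searchURL serves as the base URI for resolving relative links
    Document doc = Jsoup.parse(html, searchURL);
} catch (IOException e) {
    // Handle exception ...
} finally {
    IOUtils.closeQuietly(is);
}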
