Java: How do I correctly read URL content with UTF-8 characters?


java url


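When I run my code (the servlet is shown further down), I get:

{"sentences":[{"trans":"end","orig":"koďż˝","translit":"","src_translit":""}],"src":"pl","server_time":30}

So UTF-8 is not handled correctly. But if I return the encoded URL instead:

http://translate.google.com/translate_a/t?client=o&text=ko%C5%84&hl=en&sl=pl&tl=en

and paste it into the browser's address bar, I get the correct result:

{"sentences":[{"trans":"horse","orig":"koń","translit":"","src_translit":""}],"dict":[{"pos":"noun","terms":["horse"]}],"src":"pl","server_time":76}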

public class AbcServlet extends HttpServlet {
 public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
  resp.setContentType("text/plain;charset=UTF-8");
  resp.getWriter().println(new String(URLReader.read("pl", "en", "koń")));
 }
}

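The line byte[] bytes = json.getBytes("UTF-8") gives you a UTF-8 byte sequence, so URLReader.read also returns a UTF-8 byte sequence.

But you then decode those bytes without specifying a charset, i.e.

new String(URLReader.read("pl", "en", "koń"))

so Java decodes them with your platform default charset, which is not UTF-8 here.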
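
The mismatch can be reproduced without any network call. The following is a minimal sketch, not the asker's code: the class name CharsetMismatchDemo and the helper decode are hypothetical, and ISO-8859-1 merely stands in for whatever non-UTF-8 default the platform happens to have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question or answer.
final class CharsetMismatchDemo {

    // Decodes bytes with an explicitly named charset, never the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u0144 is the last letter of the Polish word from the question;
        // in UTF-8 it takes two bytes (0xC5 0x84).
        byte[] utf8 = "ko\u0144".getBytes(StandardCharsets.UTF_8);

        // Assumption: ISO-8859-1 plays the role of a non-UTF-8 default charset.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);
        String right = decode(utf8, StandardCharsets.UTF_8);

        System.out.println(utf8.length);               // 4 bytes for 3 characters
        System.out.println(right.equals("ko\u0144"));  // true
        System.out.println(wrong.equals("ko\u0144"));  // false: the last letter became two wrong characters
    }
}
```

Calling new String(bytes) with no charset argument behaves like the first decode call, with Charset.defaultCharset() in place of ISO-8859-1.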

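Try:

new String(URLReader.read("pl", "en", "koń"), "UTF-8")

Update

Here is the complete working code on my machine. Don't forget to escape ń as \u0144: the Java compiler may not read non-ASCII literals correctly (for example, when the source file encoding differs from the one the compiler assumes), so it is safer to write the source in plain ASCII.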

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class URLReader {

    public static byte[] read(String from, String to, String string) {
        try {
            // Percent-encode the query text as UTF-8 before putting it in the URL
            String text = "http://translate.google.com/translate_a/t?"
                    + "client=o&text=" + URLEncoder.encode(string, "UTF-8")
                    + "&hl=en&sl=" + from + "&tl=" + to;
            URL url = new URL(text);
            URLConnection conn = url.openConnection();
            // Sending a browser-like User-Agent appears to avoid the 403 error
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)");
            // Decode the response explicitly as UTF-8 rather than with the platform default
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
            // Only the first line of the response is read
            String json = in.readLine();
            byte[] bytes = json.getBytes("UTF-8");
            in.close();
            return bytes;
            //return text.getBytes();
        } catch (Exception e) {
            System.out.println(e);
            // Be careful with returning null: a caller that uses the result
            // without a null check will throw a NullPointerException.
            return null;
        }
    }
}
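
Two steps in URLReader can be checked in isolation: the percent-encoding of the query text and the explicit UTF-8 decoding of the response stream. The sketch below is an illustration under stated assumptions: Utf8IoSketch, encodeQuery and readAll are hypothetical names, a ByteArrayInputStream stands in for conn.getInputStream(), and, unlike the code above, readAll reads the whole body rather than only its first line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the original answer.
final class Utf8IoSketch {

    // Percent-encodes a query parameter value using UTF-8 bytes.
    static String encodeQuery(String text) throws UnsupportedEncodingException {
        return URLEncoder.encode(text, "UTF-8");
    }

    // Reads the whole stream and decodes it as UTF-8, independent of the platform default.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Matches the text parameter in the URL from the question.
        System.out.println(encodeQuery("ko\u0144")); // ko%C5%84

        // Assumption: a made-up one-field JSON body stands in for the server response.
        byte[] body = "{\"orig\":\"ko\u0144\"}".getBytes(StandardCharsets.UTF_8);
        String json = readAll(new ByteArrayInputStream(body));
        System.out.println(json.equals("{\"orig\":\"ko\u0144\"}")); // true
    }
}
```

Passing a Charset constant to InputStreamReader avoids the checked UnsupportedEncodingException that the string-named variant declares; the behavior is otherwise the same.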

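Comments:

Hmm, now it returns {"sentences":[{"trans":"end","orig":"ko�","translit":"","src_translit":""}],"src":"pl","server_time":20}

Is that what you get in your web browser? Don't use PrintWriter when handling encoded bytes. PrintWriter will use the JVM default encoder, which is not UTF-8. Try resp.getOutputStream().write(new String(URLReader.read("pl", "en", "koń"), "UTF-8").getBytes("UTF-8"))

Note that setting resp.setContentType("text/plain;charset=UTF-8"); will not really tell the servlet to encode the output in UTF-8. It only informs the target web browser/client that you will send a byte stream encoded in UTF-8. The actual content encoding does not have to match the content type header. (Of course, you don't want them to differ.)

I don't need to write this out; I need to save the data correctly to a database, but I can't see a good way to be sure.

I tried your code, but I get a 403 error from the Google server. It does not allow me to use its translation service.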
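
The difference between writing through a charset-bound writer and writing raw bytes can be illustrated with plain java.io, without a servlet container. This is a sketch under stated assumptions: WriterCharsetSketch and writeWith are hypothetical names, a ByteArrayOutputStream stands in for the response stream, and ISO-8859-1 stands in for a non-UTF-8 encoding. It only shows what a writer bound to a given charset emits; it makes no claim about which charset a particular servlet container chooses for resp.getWriter().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the original thread.
final class WriterCharsetSketch {

    // Writes text through a PrintWriter bound to the given charset and returns the bytes produced.
    static byte[] writeWith(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, charset))) {
            writer.print(text);
        } // closing the writer flushes it into the byte array stream
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "ko\u0144";

        // ISO-8859-1 cannot represent \u0144, so the encoder substitutes '?'.
        byte[] latin1 = writeWith(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = writeWith(text, StandardCharsets.UTF_8);
        System.out.println(latin1.length);     // 3
        System.out.println(latin1[2] == '?');  // true
        System.out.println(utf8.length);       // 4

        // Writing already-encoded bytes to the stream bypasses any writer charset.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.write(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.equals(raw.toByteArray(), utf8)); // true
    }
}
```

For the database case raised in the comments, the same principle applies: keep the value as a String decoded with an explicit charset, and name the charset explicitly at every point where it is turned back into bytes.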

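The servlet that goes with the working URLReader writes the bytes straight to the response output stream:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AbcServlet extends HttpServlet {

    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Declare the charset so the client decodes the body as UTF-8
        resp.setContentType("text/plain;charset=UTF-8");
        // \u0144 is the ASCII-only Unicode escape for the last letter of the Polish word
        byte[] read = URLReader.read("pl", "en", "ko\u0144");
        // Write the UTF-8 bytes as-is: no String round trip, no default charset involved
        resp.getOutputStream().write(read);
    }
}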