HtmlUnit: encoding of Chinese websites
Tags: utf-8, htmlunit, chinese-locale
I think this is very basic: when I download pages from a Chinese website, every Chinese character shows up as "?" in the saved file (written via Java NIO Files.write). I know the page is retrieved as UTF-8 (page.getPageEncoding() returns "UTF-8"), but something goes wrong when I save it. My code is as follows:
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setTimeout(15000);
final HtmlPage page = webClient.getPage(urlNow);
pageAsXml = page.asXml();
NioLog.getLogger().debug(page.getPageEncoding());
// getBytes() with no argument encodes with the platform default charset
Files.write(Paths.get(outputPath + File.separator + fileNameTruncated + TXT), pageAsXml.getBytes());
// Same write, but encoding explicitly as UTF-8
barrayXml = page.asXml().getBytes(Charset.forName("UTF-8"));
Files.write(Paths.get(outputPath + File.separator + fileNameTruncated + TXT), barrayXml);
Answer:
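A "?" in place of every Chinese character is what Java produces when a String is encoded with a charset that cannot represent those characters: String.getBytes() with no argument uses the platform default charset, and unmappable characters are replaced with the charset's replacement byte. The following is a minimal sketch of the write step only, assuming the page text is already a correct Java String (for example the result of page.asXml()). The class name Utf8SaveSketch and the helpers saveUtf8 and loadUtf8 are hypothetical names for illustration, not part of HtmlUnit, and US-ASCII stands in here for a default charset that lacks Chinese.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8SaveSketch {

    // Hypothetical helper: encode the text explicitly as UTF-8 before writing.
    public static void saveUtf8(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: decode the file with the same charset it was written in.
    public static String loadUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    // Simulates encoding with a charset that has no Chinese characters
    // (US-ASCII is an assumption standing in for such a platform default).
    public static String roundTripAscii(String text) {
        byte[] lossy = text.getBytes(StandardCharsets.US_ASCII);
        return new String(lossy, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // "\u4e2d\u6587" is the two-character word for "Chinese", written with
        // Unicode escapes so the source file's own encoding does not matter.
        String sample = "<p>\u4e2d\u6587</p>";

        Path tmp = Files.createTempFile("page", ".txt");
        try {
            saveUtf8(tmp, sample);
            // The UTF-8 round trip preserves the text.
            System.out.println(sample.equals(loadUtf8(tmp)));
            // The ASCII round trip replaces each Chinese character with '?'.
            System.out.println(roundTripAscii(sample));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If the file still shows "?" after an explicit UTF-8 write, the bytes on disk may be correct and the program used to view the file may be decoding them with a different charset, so it is worth reading the file back as UTF-8 (as loadUtf8 does) before concluding the write failed.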