Best way to convert text files between character sets?

text unicode utf-8

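What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

Everything goes: one-liners in your favorite scripting language, command-line tools, or other utilities for an OS, web sites, etc.

Best solutions so far, on Linux/UNIX/OS X/cygwin:

GNU iconv, as suggested, is best used as a filter. It seems to be universally available. Example: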
```
$ iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt
```
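For anyone who would rather script the same filter in Python, here is a minimal sketch of the UTF-8 to ISO-8859-15 conversion above. The function name `recode_stream` and its parameters are made up for this example; it is not part of iconv or recode, just the standard `codecs` module doing equivalent work on binary streams.

```python
import codecs


def recode_stream(src, dst, from_enc="utf-8", to_enc="iso-8859-15", chunk_size=65536):
    """Copy bytes from src to dst, re-encoding from from_enc to to_enc."""
    # Incremental codecs cope with multi-byte characters split across chunks
    decoder = codecs.getincrementaldecoder(from_enc)()
    encoder = codecs.getincrementalencoder(to_enc)()
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush both codecs; this raises if the input ended in the middle of a character
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))
```

Used as a filter it would be called as `recode_stream(sys.stdin.buffer, sys.stdout.buffer)`. A character that does not exist in the target charset raises `UnicodeEncodeError` rather than being silently dropped.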
As has been pointed out, there is also an online converter that uses iconv.
GNU recode, also suggested, converts one or several files in place. Example:
```
$ recode UTF8..ISO-8859-15 in.txt
```
This one uses shorter aliases:
```
$ recode utf8..l9 in.txt
```
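Recode also supports surfaces, which can be used to convert between different line-ending types and encodings.
Convert newlines from LF (Unix) to CR-LF (DOS) by applying the CR-LF surface (recode ../CR-LF in.txt).
Base64 encode a file: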
```
$ recode ../Base64 in.txt
```
You can also combine them.
Convert a Base64-encoded UTF-8 file with Unix line endings to a Base64-encoded Latin-1 file with DOS line endings:
```
$ recode utf8/Base64..l1/CR-LF/Base64 file.txt
```
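The combined example above can be mirrored in Python as a rough sketch. The function name is hypothetical, and the Base64 output here is a single unwrapped line, so it may not be byte-identical to what recode writes.

```python
import base64


def utf8_b64_to_latin1_crlf_b64(data: bytes) -> bytes:
    """Base64/UTF-8/LF input to Base64/Latin-1/CR-LF output."""
    # Strip the Base64 surface, then decode the UTF-8 charset
    text = base64.b64decode(data).decode("utf-8")
    # Normalize line endings to CR-LF (existing CR-LF pairs are not doubled)
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    # Encode as Latin-1 and put the Base64 surface back on
    return base64.b64encode(text.encode("latin-1"))
```

The order matters: surfaces are peeled off before the charset is decoded, and applied again only after the text has been re-encoded.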

On Windows with PowerShell:

```
PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt
```
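(No ISO-8859-15 support though; it says that the supported charsets are unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)

Edit: Do you mean ISO-8859-1 support? Using "String" does this, e.g. for the reverse direction: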

gc -en string in.txt | Out-File -en utf8 out.txt

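Note: the possible enumeration values are "Unknown, String, Unicode, Byte, BigEndianUnicode, UTF8, UTF7, Ascii".

CsCvt is another great command-line conversion tool for Windows.

There are also iconv-based tools in many languages.

Under Linux you can use the very powerful recode command to try to convert between different charsets and to sort out any line-ending issues. recode -l shows all of the formats and encodings the tool can convert between. It is likely to be a very long list.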

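Stand-alone utility approach

iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt

-f ENCODING  the encoding of the input
-t ENCODING  the encoding of the output

You don't have to specify either of these arguments. They default to your current locale, which is usually UTF-8.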

Get-Content -Encoding UTF8 FILE-UTF8.TXT | Out-File -Encoding UTF7 FILE-UTF7.TXT

The shortest version, if you can assume that the input BOM is correct:

gc FILE.TXT | Out-File -en utf7 file-utf7.txt
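What "assuming the BOM is correct" amounts to can be sketched in Python: look at the first bytes of the file and pick a codec from them. The `sniff_bom` helper and the UTF-8 fallback are assumptions of this example, not something PowerShell exposes.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name chosen from the BOM, or default if there is none."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return default
```

The codec names returned here ("utf-8-sig", "utf-16", "utf-32") consume the BOM while decoding, so `data.decode(sniff_bom(data))` yields text without a leading U+FEFF.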

PHP iconv()

iconv("UTF-8", "ISO-8859-15", $input);
Try the iconv Bash function
I've put this into .bashrc:
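utf8()
{
    # Convert via a temp file; the original is only replaced if iconv succeeds
    if iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.tmp"; then
        mv "$1.tmp" "$1"
    else
        rm -f "$1.tmp"
        return 1
    fi
}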

…to be able to convert files like so:
utf8 MyClass.java
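The same convert-then-swap idea as the Bash function above, sketched in Python for comparison. `convert_in_place` is a name invented for this example; the point it illustrates is that the original file is only replaced once the conversion has fully succeeded.

```python
import os
import shutil
import tempfile


def convert_in_place(path, from_enc="iso-8859-1", to_enc="utf-8"):
    """Re-encode a file in place without risking the original on failure."""
    with open(path, "rb") as f:
        # Decoding and encoding happen in memory, so errors surface before any write
        data = f.read().decode(from_enc).encode(to_enc)
    # Create the temp file next to the original so the final rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    shutil.copymode(path, tmp_path)  # mkstemp files are private by default
    os.replace(tmp_path, path)
```

Reading the whole file into memory keeps the sketch short; for very large files a streaming conversion would be the better fit.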

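Try Notepad++
On Windows I was able to use Notepad++ to do the conversion from ISO-8859-1 to UTF-8. Click "Encoding" and then "Convert to UTF-8".
As described elsewhere, there is a tool on OS X that lets you easily convert between all the encodings it supports.
It can also display some bytes of a file decoded to Unicode from each of the encodings, so you can quickly see which one is right for your file.
Try VIM
If you have vim you can use the command below.
Not tested for every encoding.
The cool part is that you don't have to know the source encoding: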
vim +"set nobomb | set fenc=utf8 | x" filename.txt

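Be aware that this command modifies the file directly.

Explanation part!
+ : used by vim to run a command directly when opening a file. Usually used to open a file at a specific line: vim +14 file.txt
| : separator for multiple commands (like ; in bash)
set nobomb : no UTF-8 BOM
set fenc=utf8 : set the new encoding to UTF-8
x : save and close the file
filename.txt : path to the file
" : the quotes are there because of the pipes (otherwise bash would treat them as bash pipes)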
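One-liner using find, with automatic character set detection
The character encoding of every matching text file is detected automatically, and every matching text file is converted to utf-8 encoding: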
$ find . -type f -iname '*.txt' -exec sh -c 'iconv -f $(file -bi "$1" |sed -e "s/.*[ ]charset=//") -t utf-8 -o converted "$1" && mv converted "$1"' -- {} \;

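To perform these steps, a subshell sh is used with -exec, running a one-liner with the -c flag and passing the filename as the positional argument "$1" via -- {}. In between, the utf-8 output file is temporarily named converted.
Here file -bi means:

-b, --brief
Do not prepend filenames to output lines (brief mode).

-i, --mime
Causes the file command to output mime type strings rather than the more traditional human-readable ones. Thus it may say, for example, text/plain; charset=us-ascii rather than ASCII text.
The sed command cuts this down to just us-ascii, as required by iconv.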
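The sed step is easy to misread, so here is the same substitution as a small Python sketch. `charset_from_mime` is a hypothetical helper; the extra check for "binary" is an addition of this example, covering the charset that file typically reports for non-text data and that iconv would reject.

```python
import re


def charset_from_mime(line):
    """Return the charset from `file -bi` style output, or None for binary data."""
    # Same substitution as the sed command: drop everything up to " charset="
    charset = re.sub(r".*[ ]charset=", "", line.strip())
    return None if charset == "binary" else charset
```

Like the sed expression, this leaves the input unchanged when no charset field is present, so the caller still has to decide what to do with unexpected output.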



The find command is very useful for this kind of file management automation.
To write properties files (Java) I normally use native2ascii on Linux (Mint and Ubuntu distributions).
For example:
$ cat test.properties 
first=Execução número um
second=Execução número dois

$ native2ascii test.properties 
first=Execu\u00e7\u00e3o n\u00famero um
second=Execu\u00e7\u00e3o n\u00famero dois
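If no JDK is at hand, the escaping shown above is simple enough to approximate in Python. This is a sketch that reproduces the output format seen here, not a reimplementation of native2ascii; the function name is invented for the example.

```python
def to_ascii_escapes(text: str) -> str:
    """Replace every non-ASCII character with Java-style \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Go through UTF-16 so characters outside the BMP become surrogate pairs
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)
```

Calling `to_ascii_escapes("first=Execução número um")` gives the same first line as the native2ascii output above.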

PS: I wrote "Execution number one/two" in Portuguese to force special characters.
In my case, the first time I ran it I got this message:
$ native2ascii teste.txt 
The program 'native2ascii' can be found in the following packages:
 * gcj-5-jdk
 * openjdk-8-jdk-headless
 * gcj-4.8-jdk
 * gcj-4.9-jdk
Try: sudo apt install <selected package>

When I installed the first option (gcj-5-jdk), the problem was solved.
I hope this helps someone.
DOS/Windows: use the code page

chcp 65001>NUL
type ascii.txt > unicode.txt

The chcp command can be used to change the code page. Code page 65001 is Microsoft's name for UTF-8. After the code page is set, the output generated by the commands that follow is encoded in that code page.

With ruby:

ruby -e "File.write('output.txt', File.read('input.txt').encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ''))"

A Python script also works.
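The Python script mentioned in this thread is not reproduced here, so the following is only a sketch of what such a script might look like. The option names mimic iconv's -f and -t, and the defaults are arbitrary choices for the example.

```python
import argparse


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Convert a text file between character sets.")
    parser.add_argument("-f", "--from-code", default="iso-8859-15",
                        help="encoding of the input file")
    parser.add_argument("-t", "--to-code", default="utf-8",
                        help="encoding of the output file")
    parser.add_argument("infile")
    parser.add_argument("outfile")
    args = parser.parse_args(argv)

    # newline="" keeps the original line endings untouched on both sides
    with open(args.infile, "r", encoding=args.from_code, newline="") as src, \
            open(args.outfile, "w", encoding=args.to_code, newline="") as dst:
        for line in src:
            dst.write(line)
    return 0
```

Calling `main()` under an `if __name__ == "__main__":` guard turns this into a command-line tool that runs anywhere Python 3 does.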

function Recode($InCharset, $InFile, $OutCharset, $OutFile)  {
    # Read input file in the source encoding
    $Encoding = [System.Text.Encoding]::GetEncoding($InCharset)
    $Text = [System.IO.File]::ReadAllText($InFile, $Encoding)
    
    # Write output file in the destination encoding
    $Encoding = [System.Text.Encoding]::GetEncoding($OutCharset)    
    [System.IO.File]::WriteAllText($OutFile, $Text, $Encoding)
}

Recode Windows-1252 "$pwd\in.txt" utf8 "$pwd\out.txt" 

iconv -f $(chardetect input.text | awk '{print $2}') -t utf-8 -o output.text input.text