PHP: How to write a file in UTF-8 format?
I have a lot of files that are not UTF-8 encoded, and I am converting a site to UTF-8. I am using a simple script to save the files I want in UTF-8, but the files are still saved in the old encoding:
header('Content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
$fpath = "folder";
$d = dir($fpath);
while (false !== ($a = $d->read()))
{
    if ($a != '.' and $a != '..')
    {
        $npath = $fpath.'/'.$a;
        $data = file_get_contents($npath);
        file_put_contents('tempfolder/'.$a, $data);
    }
}
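How can I save the files in UTF-8 encoding?
file_get_contents() / file_put_contents() will not magically convert the encoding. You have to convert the string explicitly, for example with iconv() or mb_convert_encoding(). Try this: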
$data = file_get_contents($npath);
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents('tempfolder/'.$a, $data);
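Applied to the whole directory loop from the question, the explicit conversion might look like the sketch below. The function name `convertDirToUtf8`, the temporary directory names, and the choice of ISO-8859-1 as the source encoding are assumptions made for the demonstration. The mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: convert every regular file in $srcDir from
// $fromEncoding to UTF-8 and write the result to $dstDir.
function convertDirToUtf8(string $srcDir, string $dstDir, string $fromEncoding): int
{
    if (!is_dir($dstDir) && !mkdir($dstDir, 0777, true)) {
        throw new RuntimeException("Cannot create $dstDir");
    }
    $count = 0;
    foreach (scandir($srcDir) as $name) {
        $src = $srcDir . '/' . $name;
        if (!is_file($src)) {
            continue; // skips '.', '..' and subdirectories
        }
        $data = file_get_contents($src);
        // Explicit conversion: the source encoding has to be stated.
        $utf8 = mb_convert_encoding($data, 'UTF-8', $fromEncoding);
        file_put_contents($dstDir . '/' . $name, $utf8);
        $count++;
    }
    return $count;
}

// Demo with a throwaway directory and one Latin-1 sample file.
$src = sys_get_temp_dir() . '/utf8demo_src_' . uniqid();
$dst = $src . '_out';
mkdir($src);
file_put_contents($src . '/sample.txt', "caf\xE9"); // "café" in ISO-8859-1
$converted = convertDirToUtf8($src, $dst, 'ISO-8859-1');
$result = file_get_contents($dst . '/sample.txt');
echo $converted, ' file(s) converted, valid UTF-8: ',
    var_export(mb_check_encoding($result, 'UTF-8'), true), PHP_EOL;
```

Writing to a separate output directory keeps the originals intact in case the assumed source encoding turns out to be wrong.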
Or alternatively, with PHP's stream filters:
$fd = fopen($file, 'r');
stream_filter_append($fd, 'convert.iconv.OLD-ENCODING/UTF-8');
stream_copy_to_stream($fd, fopen($output, 'w'));
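A self-contained variant of the stream-filter approach is sketched below. It assumes the iconv extension is available and uses ISO-8859-1 as a stand-in for the real source encoding. The temporary file names are created only for the demonstration. The filter name follows the pattern convert.iconv.<from>/<to>, so on a read filter the legacy encoding comes first.

```php
<?php
// Sketch: convert a file to UTF-8 while copying it, using a read filter.
// 'ISO-8859-1' stands in for the real source encoding (an assumption).
$input  = tempnam(sys_get_temp_dir(), 'in_');
$output = tempnam(sys_get_temp_dir(), 'out_');
file_put_contents($input, "na\xEFve"); // "naïve" in ISO-8859-1

$in  = fopen($input, 'rb');
$out = fopen($output, 'wb');
// Filter name format: convert.iconv.<from-encoding>/<to-encoding>
stream_filter_append($in, 'convert.iconv.ISO-8859-1/UTF-8', STREAM_FILTER_READ);
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);

$converted = file_get_contents($output);
echo bin2hex($converted), PHP_EOL; // hex dump of the converted bytes
```

Because the data is converted as it streams through, this form avoids loading a large file into memory in one piece.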
Iconv to the rescue.
On Unix/Linux, a simple shell command can alternatively be used to convert all files in a given directory:
recode L1..UTF8 dir/*
It can also be launched via PHP's exec().
Add the BOM for UTF-8:
file_put_contents($myFile, "\xEF\xBB\xBF". $content);
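Since a BOM helps some consumers and breaks others, it can be handy to add or strip it deliberately. The helpers below are a minimal sketch. The names `addUtf8Bom` and `stripUtf8Bom` are hypothetical, and the sample string assumes this script is itself saved as UTF-8.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: prepend the BOM unless it is already present.
function addUtf8Bom(string $content): string
{
    return strncmp($content, UTF8_BOM, 3) === 0 ? $content : UTF8_BOM . $content;
}

// Hypothetical helper: remove a leading BOM if there is one.
function stripUtf8Bom(string $content): string
{
    return strncmp($content, UTF8_BOM, 3) === 0 ? (string) substr($content, 3) : $content;
}

$text = "id;name\n1;Søren\n"; // assumed to be UTF-8 already
$withBom = addUtf8Bom($text);
echo bin2hex(substr($withBom, 0, 3)), PHP_EOL; // efbbbf
```

Note that the BOM only marks the file as UTF-8; it does not convert content that is still in a legacy encoding.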
If you want to use recode recursively and filter by file type, try this:
find . -name "*.html" -exec recode L1..UTF8 {} \;
This works for me. :)
I got this line from elsewhere.
I put all of this together and got a simple way to convert ANSI text files to "UTF-8 without BOM":
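Usage: filesToUTF8('C:/Temp/','C:/Temp/conv_files/','php,txt')
This is a very useful question. I think my solution on Windows 10 with PHP 7 will help people who still have UTF-8 conversion problems.
Here are my steps. The PHP script that calls the following function, here named utfsave.php, must itself be saved in UTF-8 encoding; this can easily be done with the conversion feature in UltraEdit.
In utfsave.php, we define a function that calls PHP's fopen($filename, "wb"), that is, the file is opened in w (write) mode and, in particular, in b (binary) mode.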
<?php
//
// UTF-8 encoding:
//
// fnc001: save a string as a file in UTF-8.
// The resulting file is UTF-8 only if $strContent is,
// with French accents, Chinese ideograms, etc.
//
function entSaveAsUtf8($strContent, $filename) {
    // "wb": write mode, binary, so PHP writes the bytes unchanged.
    $fp = fopen($filename, "wb");
    if ($fp === false) {
        return false;
    }
    fwrite($fp, $strContent);
    fclose($fp);
    return true;
}
//
// 1. write a UTF-8 string on the fly into a UTF-8 file:
//
$strContent = "My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France";
$filename = "utf8text.txt";
entSaveAsUtf8($strContent, $filename);
//
// 2. convert a CP936 (ANSI/OEM, Chinese Simplified GBK) file into a UTF-8 file:
//
$strContent = file_get_contents("cp936gbktext.txt");
$strContent = mb_convert_encoding($strContent, "UTF-8", "CP936");
$filename = "utf8text2.txt";
entSaveAsUtf8($strContent, $filename);
?>
Run utfsave.php with PHP on Windows 10. This creates utf8text.txt and utf8text2.txt, and both files are automatically saved in UTF-8 format.
With this method, no BOM character is needed. The BOM solution is bad because it causes trouble when, for example, we source an SQL file for MySQL.
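To confirm that a written file really is UTF-8 without a BOM, a check along the following lines may help. This is a sketch: the function name `describeUtf8File` and the temporary file are hypothetical, and mbstring is assumed to be loaded.

```php
<?php
// Hypothetical checker: reports whether a file is valid UTF-8 and
// whether it starts with the UTF-8 byte order mark.
function describeUtf8File(string $path): array
{
    $bytes = file_get_contents($path);
    return [
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
    ];
}

$path = tempnam(sys_get_temp_dir(), 'chk_');
file_put_contents($path, "un \xC3\xA9t\xC3\xA9 en France"); // UTF-8 bytes, no BOM
$report = describeUtf8File($path);
var_dump($report);
```

A file that contains only ASCII also passes this check, because ASCII is a subset of UTF-8.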
It is worth noting that I could not get file_put_contents($filename, utf8_encode($mystring)); to work for this purpose.
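One possible explanation, offered as an assumption rather than a diagnosis of the case above: utf8_encode() always treats its input as ISO-8859-1, so it gives wrong results for other source encodings such as CP936 and double-encodes text that is already UTF-8. It is also deprecated as of PHP 8.2. The sketch below reproduces the double-encoding effect with mb_convert_encoding() and adds a guard. The helper name `toUtf8` is hypothetical.

```php
<?php
$utf8 = "\xC3\xA9"; // "é" already encoded as UTF-8

// Same effect as utf8_encode(): the bytes are treated as ISO-8859-1.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), PHP_EOL; // c383c2a9, which displays as "Ã©"

// Hypothetical guard: convert only when the input is not valid UTF-8 already.
function toUtf8(string $s, string $fromEncoding): string
{
    return mb_check_encoding($s, 'UTF-8')
        ? $s
        : mb_convert_encoding($s, 'UTF-8', $fromEncoding);
}
echo bin2hex(toUtf8($utf8, 'ISO-8859-1')), PHP_EOL; // c3a9
```

The guard is a heuristic: a legacy-encoded string can occasionally be valid UTF-8 by coincidence, so it suits mixed inputs better than files whose encoding is known.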
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If you do not know the encoding of the source file, you can list the supported encodings with PHP:
print_r(mb_list_encodings());
This gives a list like the following:
Array
(
[0] => pass
[1] => wchar
[2] => byte2be
[3] => byte2le
[4] => byte4be
[5] => byte4le
[6] => BASE64
[7] => UUENCODE
[8] => HTML-ENTITIES
[9] => Quoted-Printable
[10] => 7bit
[11] => 8bit
[12] => UCS-4
[13] => UCS-4BE
[14] => UCS-4LE
[15] => UCS-2
[16] => UCS-2BE
[17] => UCS-2LE
[18] => UTF-32
[19] => UTF-32BE
[20] => UTF-32LE
[21] => UTF-16
[22] => UTF-16BE
[23] => UTF-16LE
[24] => UTF-8
[25] => UTF-7
[26] => UTF7-IMAP
[27] => ASCII
[28] => EUC-JP
[29] => SJIS
[30] => eucJP-win
[31] => EUC-JP-2004
[32] => SJIS-win
[33] => SJIS-Mobile#DOCOMO
[34] => SJIS-Mobile#KDDI
[35] => SJIS-Mobile#SOFTBANK
[36] => SJIS-mac
[37] => SJIS-2004
[38] => UTF-8-Mobile#DOCOMO
[39] => UTF-8-Mobile#KDDI-A
[40] => UTF-8-Mobile#KDDI-B
[41] => UTF-8-Mobile#SOFTBANK
[42] => CP932
[43] => CP51932
[44] => JIS
[45] => ISO-2022-JP
[46] => ISO-2022-JP-MS
[47] => GB18030
[48] => Windows-1252
[49] => Windows-1254
[50] => ISO-8859-1
[51] => ISO-8859-2
[52] => ISO-8859-3
[53] => ISO-8859-4
[54] => ISO-8859-5
[55] => ISO-8859-6
[56] => ISO-8859-7
[57] => ISO-8859-8
[58] => ISO-8859-9
[59] => ISO-8859-10
[60] => ISO-8859-13
[61] => ISO-8859-14
[62] => ISO-8859-15
[63] => ISO-8859-16
[64] => EUC-CN
[65] => CP936
[66] => HZ
[67] => EUC-TW
[68] => BIG-5
[69] => CP950
[70] => EUC-KR
[71] => UHC
[72] => ISO-2022-KR
[73] => Windows-1251
[74] => CP866
[75] => KOI8-R
[76] => KOI8-U
[77] => ArmSCII-8
[78] => CP850
[79] => JIS-ms
[80] => ISO-2022-JP-2004
[81] => ISO-2022-JP-MOBILE#KDDI
[82] => CP50220
[83] => CP50220raw
[84] => CP50221
[85] => CP50222
)
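If you cannot guess, try them one by one, because mb_detect_encoding() cannot do this job easily.
Didn't know this command. Thanks. I use Linux as my workstation, and all my local servers are on Linux.
What does L1.. mean in the command?
@Starmaster: L1 is shorthand for the source character set Latin-1.
I was trying to create a PHP download script that uses UTF-8 for Danish characters, and this was what it was missing, ty.
It also works for UTF-16, but with these bytes: fwrite($f, pack("CC", 0xff, 0xfe));
This worked for me when downloading UTF-encoded aspx pages to a Windows platform.
This should be the accepted answer... short and sweet, and it works!
There is a difference between creating a file that is recognized as UTF-8 and converting the contents of that file. A plain text file without special characters has the same content as UTF-8 without a BOM, and a parser that processes the text probably has an encoding option. PHP itself works with UTF-8, so if the text looks fine but the file does not appear to be UTF-8, the text is most likely UTF-8 already and adding the BOM is all you need. That is not a conversion, however. The question comes up often because PHP is lazy about adding the BOM but itself expects one on input.
My problem was slightly different from the OP's, but this solved it. Instead of using file_put_contents, I was using header to download the file immediately. The data was already UTF-8 in the database but did not display correctly in the CSV download. This works. Thanks.
What is the $a variable on line 3 of the first example?
If you use stream_filter_append: OLD-ENCODING/UTF-8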
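The advice to try encodings one by one when the source encoding is unknown can be sketched as follows. The helper name `previewDecodings` and the candidate list are hypothetical. Single-byte encodings accept almost any byte sequence, so the sketch only produces previews for a person to inspect rather than picking a winner automatically.

```php
<?php
// Hypothetical helper: decode the same bytes with several candidate
// encodings so a human can pick the one that reads correctly.
function previewDecodings(string $bytes, array $candidates): array
{
    $supported = mb_list_encodings();
    $previews = [];
    foreach ($candidates as $encoding) {
        if (!in_array($encoding, $supported, true)) {
            continue; // not supported by this mbstring build
        }
        $previews[$encoding] = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    }
    return $previews;
}

$bytes = "\xE9t\xE9"; // unknown legacy bytes; "été" if they are ISO-8859-1
foreach (previewDecodings($bytes, ['ISO-8859-1', 'ISO-8859-5', 'KOI8-R']) as $enc => $text) {
    echo str_pad($enc, 12), $text, PHP_EOL;
}
```

Running it on a short excerpt of the real file, in a terminal that displays UTF-8, usually makes the right candidate obvious.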
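// filesToUTF8: convert every matching file in $searchdir to UTF-8 and
// write the result under the same file name into $convdir.
function filesToUTF8($searchdir, $convdir, $filetypes) {
    // $filetypes is a comma-separated list of extensions, e.g. 'php,txt'.
    $get_files = glob($searchdir.'*{'.$filetypes.'}', GLOB_BRACE) ?: [];
    foreach ($get_files as $file) {
        $filename = basename($file);
        $get_file_content = file_get_contents($file);
        // Strict detection; returns false when no encoding in the detect order matches.
        $encoding = mb_detect_encoding($get_file_content, mb_detect_order(), true);
        if ($encoding === false) {
            continue; // skip files whose encoding cannot be detected
        }
        $new_file_content = iconv($encoding, "UTF-8", $get_file_content);
        file_put_contents($convdir.$filename, $new_file_content);
    }
}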
>>Get-Content cp936gbktext.txt
My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France 936 (ANSI/OEM - chinois simplifié GBK)