PHP: How do I write a file in UTF-8 format?


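I have a lot of files that are not UTF-8 encoded, and I'm converting a site to UTF-8 encoding.

I'm using a simple script to save the files that I want in UTF-8, but the files end up saved in the old encoding: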

header('Content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
$fpath = "folder";
$d = dir($fpath);
while (false !== ($a = $d->read())) {
    if ($a != '.' and $a != '..') {
        $npath = $fpath . '/' . $a;
        $data = file_get_contents($npath);
        file_put_contents('tempfolder/' . $a, $data);
    }
}

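How can I save the files in UTF-8 encoding?

file_get_contents() / file_put_contents() will not magically convert the encoding.

You have to convert the string explicitly, for example with iconv() or mb_convert_encoding().

Try this: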

$data = file_get_contents($npath);
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents('tempfolder/'.$a, $data);
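
Applied to the whole loop from the question, the conversion might look like the sketch below. The function name convertDirToUtf8() and its $fromEncoding parameter are made up for illustration, and the source encoding has to be known in advance (ISO-8859-1 in the usage line is only an example).

```php
<?php
// Hypothetical helper: copy every regular file from $srcDir to $dstDir,
// converting its contents from $fromEncoding to UTF-8 on the way.
function convertDirToUtf8(string $srcDir, string $dstDir, string $fromEncoding): int
{
    $converted = 0;
    foreach (scandir($srcDir) as $name) {
        $path = $srcDir . '/' . $name;
        if (!is_file($path)) {
            continue; // skips '.', '..' and subdirectories
        }
        $data = file_get_contents($path);
        // The explicit conversion step that the loop in the question lacks.
        $data = mb_convert_encoding($data, 'UTF-8', $fromEncoding);
        file_put_contents($dstDir . '/' . $name, $data);
        $converted++;
    }
    return $converted; // number of files written
}
```

A call such as convertDirToUtf8('folder', 'tempfolder', 'ISO-8859-1') would then mirror the question's setup, assuming both directories already exist.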
Or alternatively, with PHP's stream filters:

$fd = fopen($file, 'r');
// The source encoding comes first, the target encoding second.
stream_filter_append($fd, 'convert.iconv.OLD-ENCODING/UTF-8');
stream_copy_to_stream($fd, fopen($output, 'w'));
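
A slightly fuller sketch of the stream-filter approach follows. It assumes the iconv extension is available; streamConvertToUtf8() is a hypothetical name, and the filter string puts the source encoding before the slash and the target encoding after it.

```php
<?php
// Hypothetical helper: stream-convert $inputPath to UTF-8 without loading
// the whole file into memory. Assumes the iconv extension is loaded.
function streamConvertToUtf8(string $inputPath, string $outputPath, string $fromEncoding): void
{
    $in  = fopen($inputPath, 'rb');
    $out = fopen($outputPath, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open input or output file');
    }
    // Filter name format: convert.iconv.<from>/<to>, applied while reading.
    stream_filter_append($in, "convert.iconv.$fromEncoding/UTF-8", STREAM_FILTER_READ);
    stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
}
```

Compared with file_get_contents(), this keeps memory use flat for large files, at the cost of slightly more setup.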

iconv to the rescue.

On Unix/Linux, a simple shell command can alternatively be used to convert all files in a given directory:

 recode L1..UTF8 dir/*
It can also be started via PHP's exec().
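
If the command is launched from PHP, the file names should be escaped before they reach the shell. The sketch below only builds the command string; buildRecodeCommand() is a hypothetical helper, and it assumes the recode binary is installed and on the PATH. Note that recode rewrites the files in place.

```php
<?php
// Hypothetical helper: build the recode command line for a list of files,
// escaping each path so that spaces or shell metacharacters are safe.
function buildRecodeCommand(array $files, string $request = 'L1..UTF8'): string
{
    $args = array_map('escapeshellarg', $files);
    return 'recode ' . escapeshellarg($request) . ' ' . implode(' ', $args);
}

// Possible usage (assumes recode is installed; it converts files in place):
// exec(buildRecodeCommand(glob('dir/*')), $output, $exitCode);
```

Checking $exitCode after exec() is the simplest way to notice that recode is missing or that a file could not be converted.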

Add the UTF-8 BOM:

file_put_contents($myFile, "\xEF\xBB\xBF".  $content); 
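
Since a BOM helps some consumers and breaks others, it can be handy to add or strip it explicitly. The two helpers below are a minimal sketch; their names and the UTF8_BOM constant are made up for this example. Neither converts anything, they only touch the three marker bytes.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: prepend the UTF-8 BOM unless the content already has one.
function withUtf8Bom(string $content): string
{
    return strncmp($content, UTF8_BOM, 3) === 0 ? $content : UTF8_BOM . $content;
}

// Hypothetical helper: drop a leading UTF-8 BOM if present.
function withoutUtf8Bom(string $content): string
{
    return strncmp($content, UTF8_BOM, 3) === 0 ? substr($content, 3) : $content;
}
```

For instance, file_put_contents($myFile, withUtf8Bom($content)) avoids writing the marker twice when $content was read from a file that already had one.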

If you want to use recode recursively and filter by file type, try this:

find . -name "*.html" -exec recode L1..UTF8 {} \;
This works for me. :)

I got this line from you

  • Open the file in Windows Notepad
  • Change the encoding to UTF-8
  • Save your file
  • Try again! :O)

I put all of this together and got an easy way to convert ANSI text files to "UTF-8 without BOM", the filesToUTF8() function:

Usage: filesToUTF8('C:/Temp/','C:/Temp/conv_files/','php,txt')

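This is a very useful question. I think my solution on Windows 10 with PHP 7 is useful for people who still have UTF-8 conversion problems.

Here are my steps. The PHP script that calls the following function, here named utfsave.php, must itself be UTF-8 encoded, which can easily be done by converting it in UltraEdit.

In utfsave.php we define a function that calls PHP's fopen($filename, "wb"), that is, the file is opened in w (write) mode and, importantly, in b (binary) mode.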

    <?php
    //
    //  UTF-8 encoding:
    //
    // fnc001: save string as a file in UTF-8:
    // The resulting file is UTF-8 only if $strContent is,
    // with French accents, Chinese ideograms, etc.
    //
    function entSaveAsUtf8($strContent, $filename) {
      // "wb": write mode, binary, so the bytes are written exactly as given
      $fp = fopen($filename, "wb");
      if ($fp === false) {
        return false;
      }
      fwrite($fp, $strContent);
      fclose($fp);
      return true;
    }

    //
    // 1. write a UTF-8 string on the fly into a UTF-8 file:
    //
    $strContent = "My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France";

    $filename = "utf8text.txt";

    entSaveAsUtf8($strContent, $filename);


    //
    // 2. convert a CP936 (ANSI/OEM, Chinese simplified GBK) file into a UTF-8 file:
    //
    $strContent = file_get_contents("cp936gbktext.txt");
    $strContent = mb_convert_encoding($strContent, "UTF-8", "CP936");


    $filename = "utf8text2.txt";

    entSaveAsUtf8($strContent, $filename);

    ?>
    
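Running utfsave.php on Windows 10 PHP creates the files utf8text.txt and utf8text2.txt, which are automatically saved in UTF-8 format.

With this method no BOM character is needed. The BOM solution is bad because it causes trouble when, for example, we source an SQL file for MySQL.

It is worth noting that I could not get file_put_contents($filename, utf8_encode($mystring)); to work for this purpose.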
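
One likely reason, offered here as an assumption rather than something the answer states: utf8_encode() always treats its input as ISO-8859-1, so it cannot convert GBK or any other source encoding (it is also deprecated as of PHP 8.2). The sketch below mimics that behaviour with mb_convert_encoding() and an explicit ISO-8859-1 source, then compares it with the CP936 conversion. The sample character 中 (U+4E2D) is chosen only for illustration.

```php
<?php
// Build sample GBK bytes from a known code point (U+4E2D) so the example
// does not depend on the encoding of this source file.
$gbk = mb_convert_encoding("\u{4E2D}", 'CP936', 'UTF-8');

// What utf8_encode() effectively does: read every byte as ISO-8859-1.
$wrong = mb_convert_encoding($gbk, 'UTF-8', 'ISO-8859-1');

// Naming the real source encoding recovers the original character.
$right = mb_convert_encoding($gbk, 'UTF-8', 'CP936');

var_dump($right === "\u{4E2D}"); // does the CP936 conversion round-trip?
var_dump($wrong === "\u{4E2D}"); // does the ISO-8859-1 reading?
```

The ISO-8859-1 reading turns each GBK byte into its own character, which is the typical source of mojibake in converted files.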

    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If you don't know the encoding of the source file, you can list the supported encodings with PHP:

    print_r(mb_list_encodings());
    
This gives a list like this:

    Array
    (
      [0] => pass
      [1] => wchar
      [2] => byte2be
      [3] => byte2le
      [4] => byte4be
      [5] => byte4le
      [6] => BASE64
      [7] => UUENCODE
      [8] => HTML-ENTITIES
      [9] => Quoted-Printable
      [10] => 7bit
      [11] => 8bit
      [12] => UCS-4
      [13] => UCS-4BE
      [14] => UCS-4LE
      [15] => UCS-2
      [16] => UCS-2BE
      [17] => UCS-2LE
      [18] => UTF-32
      [19] => UTF-32BE
      [20] => UTF-32LE
      [21] => UTF-16
      [22] => UTF-16BE
      [23] => UTF-16LE
      [24] => UTF-8
      [25] => UTF-7
      [26] => UTF7-IMAP
      [27] => ASCII
      [28] => EUC-JP
      [29] => SJIS
      [30] => eucJP-win
      [31] => EUC-JP-2004
      [32] => SJIS-win
      [33] => SJIS-Mobile#DOCOMO
      [34] => SJIS-Mobile#KDDI
      [35] => SJIS-Mobile#SOFTBANK
      [36] => SJIS-mac
      [37] => SJIS-2004
      [38] => UTF-8-Mobile#DOCOMO
      [39] => UTF-8-Mobile#KDDI-A
      [40] => UTF-8-Mobile#KDDI-B
      [41] => UTF-8-Mobile#SOFTBANK
      [42] => CP932
      [43] => CP51932
      [44] => JIS
      [45] => ISO-2022-JP
      [46] => ISO-2022-JP-MS
      [47] => GB18030
      [48] => Windows-1252
      [49] => Windows-1254
      [50] => ISO-8859-1
      [51] => ISO-8859-2
      [52] => ISO-8859-3
      [53] => ISO-8859-4
      [54] => ISO-8859-5
      [55] => ISO-8859-6
      [56] => ISO-8859-7
      [57] => ISO-8859-8
      [58] => ISO-8859-9
      [59] => ISO-8859-10
      [60] => ISO-8859-13
      [61] => ISO-8859-14
      [62] => ISO-8859-15
      [63] => ISO-8859-16
      [64] => EUC-CN
      [65] => CP936
      [66] => HZ
      [67] => EUC-TW
      [68] => BIG-5
      [69] => CP950
      [70] => EUC-KR
      [71] => UHC
      [72] => ISO-2022-KR
      [73] => Windows-1251
      [74] => CP866
      [75] => KOI8-R
      [76] => KOI8-U
      [77] => ArmSCII-8
      [78] => CP850
      [79] => JIS-ms
      [80] => ISO-2022-JP-2004
      [81] => ISO-2022-JP-MOBILE#KDDI
      [82] => CP50220
      [83] => CP50220raw
      [84] => CP50221
      [85] => CP50222
    )
    

If you cannot guess, try them one by one, since mb_detect_encoding() cannot do the job easily.
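
Trying encodings one by one can be semi-automated. The sketch below is one possible approach, not part of the answer above: previewEncodings() is a hypothetical helper that skips candidates in which the bytes are not valid and returns a UTF-8 preview for the rest, so a human can pick the one that reads correctly. The candidate list in the usage note is an assumption.

```php
<?php
// Hypothetical helper: for each candidate encoding, check whether $bytes is
// valid in that encoding and, if so, keep its conversion to UTF-8 as a preview.
function previewEncodings(string $bytes, array $candidates): array
{
    $previews = [];
    foreach ($candidates as $encoding) {
        if (!mb_check_encoding($bytes, $encoding)) {
            continue; // byte sequence is not valid in this encoding
        }
        $previews[$encoding] = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    }
    return $previews;
}
```

For example, print_r(previewEncodings(file_get_contents('cp936gbktext.txt'), ['UTF-8', 'CP936', 'BIG-5', 'Windows-1252'])) would print one preview per plausible encoding. Single-byte encodings accept almost any input, so a readable preview still has to be judged by eye.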

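  • Didn't know this command. Thanks. I use Linux as my workstation and all my local servers are on Linux. What does L1.. mean in the command?
  • @Starmaster: L1 is shorthand for the source character set Latin-1.
  • I was trying to create a PHP download script that uses UTF-8 for Danish characters, and this is what it was missing, ty.
  • It also works for UTF-16, but with these bytes: fwrite($f, pack("CC", 0xff, 0xfe));
  • This worked for me when downloading UTF-encoded aspx pages onto a Windows platform.
  • This should be the accepted answer... short and sweet, and it works!
  • There is a difference between creating a file that is recognized as UTF-8 and converting the content of that file. A plain-text file without special characters has the same content as UTF-8 without a BOM, and the parser that processes the text may have an encoding option. PHP itself works with UTF-8, so if the text looks OK but the file does not seem to be UTF-8, then most likely the text is UTF-8 and adding a BOM is all you need. However, that is not a conversion. The question comes up often because PHP is lazy about adding a BOM but expects one itself on input.
  • My problem was slightly different from the OP's, but this solved it. I was not using file_put_contents but header() to download the file immediately. The data was already UTF-8 in the database but did not work in the CSV download. This works. Thanks.
  • What is the $a variable on line 3 of the first example?
  • If you use stream_filter_append, the order should be OLD-ENCODING/UTF-8.
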
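    // Convert every file in $searchdir whose extension is listed in $filetypes
    // (comma-separated) to UTF-8 without BOM and write the result to $convdir.
    // Assumption: a file that is neither ASCII nor valid UTF-8 is treated as
    // $fallback ("ANSI" usually means Windows-1252 on Western Windows systems).
    function filesToUTF8($searchdir, $convdir, $filetypes, $fallback = 'Windows-1252') {
      $get_files = glob($searchdir.'*.{'.$filetypes.'}', GLOB_BRACE) ?: [];
      foreach ($get_files as $file) {
        $filename = basename($file);
        $get_file_content = file_get_contents($file);
        // mb_detect_encoding() returns false when nothing in the detect order matches
        $encoding = mb_detect_encoding($get_file_content, mb_detect_order(), true) ?: $fallback;
        $new_file_content = iconv($encoding, "UTF-8", $get_file_content);
        file_put_contents($convdir.$filename, $new_file_content);
      }
    }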
    
    >>Get-Content cp936gbktext.txt
    My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France 936 (ANSI/OEM - chinois simplifié GBK)
    