PHP: testing a non-UTF-8 string

I have read other posts about this problem, but I can't work out what I am doing wrong.

I have a function:

public function reEncode($item)
{
    if (! mb_detect_encoding($item, 'utf-8', true)) {
        $item = utf8_encode($item);
    }

    return $item;
}

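I am writing a test for it. I want to pass in a string that is not UTF-8 and check that the if branch is hit. I am having trouble creating the test string: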

$contents = file_get_contents('CyrillicKOI8REncoded.txt');
var_dump(mb_detect_encoding($contents));

$sanitized = $this->reEncode($contents);
var_dump(mb_detect_encoding($sanitized));

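Initially I used file_get_contents() on a file that I had saved from Sublime in several different encodings: Cyrillic (KOI8-R), HEX and DOS (CP 437). As pointed out elsewhere, file_get_contents() ignores the file's encoding and simply returns the raw bytes. That seems to be true, because the characters that come back are garbled.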

That said, every time I call mb_detect_encoding() on these variables I get either ASCII or UTF-8. Since ASCII is a subset of UTF-8, the if branch is never triggered.

So I tried mb_convert_encoding() and iconv() to convert a basic string to UTF-16, UTF-32, base64, hex and so on, but every time mb_detect_encoding() returns ASCII or UTF-8.

In my test I want to assert the encoding before and after calling this function:

$sanitized = $this->reEncode($contents);

$this->assertEquals('UTF-32', mb_detect_encoding($contents));
$this->assertEquals('UTF-8', mb_detect_encoding($sanitized));

I can't work out what basic mistake I am making that keeps giving me ASCII or UTF-8 from mb_detect_encoding().

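OK, so the check has to be done in strict mode, with the expected encoding passed as the second argument and true as the third. Otherwise mb_detect_encoding() is close to useless here: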

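// Build a KOI8-R fixture from a UTF-8 literal (this source file must be saved as UTF-8).
$item = mb_convert_encoding('Котёнок', 'KOI8-R', 'UTF-8');

$sanitized = $this->reEncode($item);

// Strict mode with an explicit candidate encoding.
$this->assertEquals('KOI8-R', mb_detect_encoding($item, 'KOI8-R', true));
$this->assertEquals('UTF-8', mb_detect_encoding($sanitized, 'UTF-8', true));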
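The same check can be tried outside PHPUnit. This is only a minimal sketch under a few assumptions: reEncodeSketch() is a hypothetical standalone stand-in for the reEncode() method, mb_check_encoding() is swapped in for the strict mb_detect_encoding() call, and mb_convert_encoding() from ISO-8859-1 replaces utf8_encode(), which is deprecated as of PHP 8.2. The fixture is written as raw KOI8-R bytes.

```php
<?php
// Hypothetical standalone variant of reEncode(), for illustration only.
function reEncodeSketch(string $item): string
{
    // mb_check_encoding() validates the bytes against one named encoding.
    if (!mb_check_encoding($item, 'UTF-8')) {
        // Same assumption utf8_encode() makes: the input is ISO-8859-1.
        $item = mb_convert_encoding($item, 'UTF-8', 'ISO-8859-1');
    }

    return $item;
}

// "Котёнок" in KOI8-R, written as raw bytes so the file's own encoding is irrelevant.
$koi8 = "\xEB\xCF\xD4\xA3\xCE\xCF\xCB";

$sanitized = reEncodeSketch($koi8);

var_dump(mb_check_encoding($koi8, 'UTF-8'));      // bool(false)
var_dump(mb_check_encoding($sanitized, 'UTF-8')); // bool(true)
```

The result is valid UTF-8 but not readable Russian, because the KOI8-R bytes are reinterpreted as ISO-8859-1. That is enough to prove the branch was hit, which is all the test needs.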

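Please note that I did not read your whole post. Keep in mind that the encoding of the PHP file itself, the text file, PHP's internal settings, the database connection settings (if you use one) and the way you run the function can all change the behaviour. Be careful with UTF-X ;-) I'm out.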
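One way to take the PHP file's own encoding out of the picture, sketched here as an assumption rather than anything the posters used, is to build the fixture from Unicode code point escapes (PHP 7 and later) and inspect the resulting bytes with bin2hex(). The variable names are illustrative only.

```php
<?php
// Code point escapes always yield UTF-8 bytes, whatever encoding this file is saved in.
$utf8 = "\u{041A}\u{043E}\u{0442}\u{0451}\u{043D}\u{043E}\u{043A}"; // "Котёнок"

// Convert with an explicit source encoding instead of relying on PHP's internal setting.
$koi8 = mb_convert_encoding($utf8, 'KOI8-R', 'UTF-8');

// Dump the raw bytes; KOI8-R uses one byte per Cyrillic letter, UTF-8 uses two.
echo bin2hex($utf8), PHP_EOL;
echo bin2hex($koi8), PHP_EOL;
```

Comparing the two hex dumps shows whether the conversion really happened, without relying on how a terminal or editor renders the string.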