PHP: split an array by first letter when the names contain HTML entities

I have an array of countries like this:

array(249) {
  [0]=>
  array(4) {
    ["country_id"]=>
    string(1) "2"
    ["country_name_en"]=>
    string(19) "&Aring;land Islands"
    ["country_alpha2"]=>
    string(2) "AX"
    ["country_alpha3"]=>
    string(3) "ALA"
  }
  etc.
}

I want to split it by first letter, so that I get an array like this:

array(26) {
 'A' => array(10) {
    array(4) {
      ["country_id"]=>
      string(1) "2"
      ["country_name_en"]=>
      string(19) "&Aring;land Islands"
      ["country_alpha2"]=>
      string(2) "AX"
      ["country_alpha3"]=>
      string(3) "ALA"
    }
    etc.
  }
  etc.
}

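The problem is that some of the country names start with an HTML entity.

Is there any way to do this?

Thanks in advance,
Peter

Loop through the array, decode the HTML entities with html_entity_decode, then split on the first character.

Or you can use the function suggested by jlcd: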

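function substr_unicode($str, $s, $l = null) {
    // Split into UTF-8 characters (the /u modifier), then slice by character.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$new_array = array();
foreach ($array as $values) {
    // Decode entities such as &Aring; into UTF-8 before taking the first character.
    $values['country_name_en'] = html_entity_decode($values['country_name_en'], ENT_QUOTES, 'UTF-8');
    $index = substr_unicode($values['country_name_en'], 0, 1);

    // Append to the group instead of overwriting the previous country.
    $new_array[$index][] = $values;
}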

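If you want to file Åland Islands under A, you need to do a little more work than has been suggested so far.

The intl extension includes Normalizer, which can convert Å (one precomposed character) into Å (a plain A followed by a combining ring). Confused? That Unicode character (U+00C5) can be represented in UTF-8 as 0xC385 (composed) or as 0x41CC8A (decomposed). 0x41 is A, and 0xCC8A is the combining ring above (U+030A).

So, to file your islands properly, you need to do the following: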

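$string = "&Aring;land Islands";
// Decode the entity into the UTF-8 character "Å" (U+00C5).
$s = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
// Decompose "Å" into "A" plus a combining ring (requires the intl extension).
$s = Normalizer::normalize($s, Normalizer::FORM_KD);
// Pass the encoding explicitly so the result does not depend on mb_internal_encoding().
$s = mb_substr($s, 0, 1, 'UTF-8');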
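
To make the two byte sequences concrete, here is a small sketch that writes both forms of U+00C5 out byte by byte and prints them as hex. It needs no extension; the variable names are only illustrative.

```php
<?php
// U+00C5 spelled out as raw bytes, so nothing depends on the file encoding.
$composed   = "\xC3\x85";   // precomposed: A WITH RING ABOVE as one character
$decomposed = "A\xCC\x8A";  // decomposed: "A" followed by U+030A COMBINING RING ABOVE

// Both render as the same glyph, but the bytes differ.
echo bin2hex($composed), "\n";    // c385
echo bin2hex($decomposed), "\n";  // 41cc8a

// html_entity_decode() yields the precomposed form.
$decoded = html_entity_decode('&Aring;', ENT_QUOTES, 'UTF-8');
var_dump($decoded === $composed);   // bool(true)

// In the decomposed form the first byte is already a plain ASCII "A".
var_dump($decomposed[0] === 'A');   // bool(true)
```

This is why normalizing to a decomposed form before taking the first character yields A rather than Å.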

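Chances are that your environment does not have the intl extension installed. If that is the case, you could look into a function that reduces a string to its alphanumeric parts.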
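
One possible fallback without intl is sketched below. It is not the function the answer had in mind: it combines a hand-written lookup table with a regular expression that keeps only ASCII letters and digits. The name first_letter_fallback and the contents of the map are hypothetical, and the map covers only a handful of characters. The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical fallback for environments without the intl extension.
// The map is illustrative and deliberately incomplete; extend it for your data.
function first_letter_fallback($name)
{
    $map = array(
        'Å' => 'A', 'Ä' => 'A', 'Á' => 'A',
        'Ç' => 'C', 'É' => 'E', 'Ö' => 'O',
    );

    // Decode entities such as &Aring; into UTF-8 characters.
    $s = html_entity_decode($name, ENT_QUOTES, 'UTF-8');
    // Replace the accented letters we know about with their base letter.
    $s = strtr($s, $map);
    // Reduce the string to its ASCII alphanumeric part.
    $s = preg_replace('/[^A-Za-z0-9]/', '', $s);

    // The string is pure ASCII at this point, so substr() is safe.
    return $s === '' ? '' : strtoupper(substr($s, 0, 1));
}

echo first_letter_fallback('&Aring;land Islands'), "\n";  // A
echo first_letter_fallback('Armenia'), "\n";              // A
```

Characters that are missing from the map are dropped rather than transliterated, so a name that starts with an unmapped accented letter would be filed under its second letter. Treat the table as something to extend for your own data.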

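With the above, you should be able to:

1. loop through the original array
2. extract the country name
3. clean up the country name and extract its first character
4. build the new array based on the character from (3)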
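
The four steps above can be put together roughly as follows. This is a sketch under assumptions: the sample rows are abbreviated stand-ins for the real data, first_letter is a hypothetical helper name, and the normalization step only runs when the intl extension is loaded (without it, Å keeps its own group).

```php
<?php
// Abbreviated sample rows in the shape shown in the question.
$countries = array(
    array('country_name_en' => '&Aring;land Islands', 'country_alpha2' => 'AX'),
    array('country_name_en' => 'Armenia',             'country_alpha2' => 'AM'),
    array('country_name_en' => 'Austria',             'country_alpha2' => 'AT'),
    array('country_name_en' => 'Belgium',             'country_alpha2' => 'BE'),
);

// Step 3: clean one name and return its first character (hypothetical helper).
function first_letter($name)
{
    $s = html_entity_decode($name, ENT_QUOTES, 'UTF-8');
    if (class_exists('Normalizer')) {
        // Decompose so that "Å" becomes "A" plus a combining ring.
        $s = Normalizer::normalize($s, Normalizer::FORM_KD);
    }
    return mb_substr($s, 0, 1, 'UTF-8');
}

$grouped = array();
foreach ($countries as $country) {              // step 1: loop
    $name   = $country['country_name_en'];      // step 2: extract the name
    $letter = first_letter($name);              // step 3: first character
    $grouped[$letter][] = $country;             // step 4: append to the group
}

ksort($grouped);
print_r(array_keys($grouped));
```

Appending with [] matters here: assigning with $grouped[$letter] = $country would keep only the last country for each letter.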

Note: keep in mind that countries such as Armenia, Austria and Australia will all be filed under A as well.

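Possible duplicate

Since this question requires transliterating Å to A, I don't think @Gordon's duplicate suggestion is appropriate.

Possible duplicate

Note that when using mb_substr or substr, it may not return the correct result, depending on the encoding of the string.

Good suggestion, added to my answer!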
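
To illustrate the encoding caveat from the comment above, the following sketch compares substr with mb_substr on the decoded country name. It assumes the mbstring extension is available; the variable names are only illustrative.

```php
<?php
$name = html_entity_decode('&Aring;land Islands', ENT_QUOTES, 'UTF-8');

// substr() counts bytes, so it returns only the first byte of the two-byte "Å".
$byte = substr($name, 0, 1);
echo bin2hex($byte), "\n";   // c3 (not a valid UTF-8 string on its own)

// mb_substr() counts characters when it is told the right encoding.
$char = mb_substr($name, 0, 1, 'UTF-8');
echo bin2hex($char), "\n";   // c385

// With a single-byte encoding, mb_substr() behaves like substr() here.
$wrong = mb_substr($name, 0, 1, 'ISO-8859-1');
echo bin2hex($wrong), "\n";  // c3
```

Passing the encoding as the fourth argument avoids relying on whatever mb_internal_encoding happens to be set to.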