C: Iterating over a char array that contains non-ASCII characters


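Edit:

I may only use stdio.h and stdlib.h.

I want to iterate over a char array filled with characters.

However, characters such as ä and ö take up twice the space and occupy two elements. That is where my problem lies: I don't know how to access these special characters.

In my example, the character "ä" occupies hmm[0] and hmm[1].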

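#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
  const char* hmm = "äö";

  printf("%c\n", hmm[0]); // I want to print "ä", but this prints only its first byte

  printf("%zu\n", strlen(hmm)); // strlen returns size_t, so use %zu rather than %i

  return 0;
}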


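Thanks. I tried running my attached code in Eclipse and it works. I assume that is because it uses 64 bits, so there is enough room for "ä". strlen confirms that each "ä" counts as only one element. So I guess I could tell it to allocate more space for each char (so that "ä" fits).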

#include <stdio.h>
#include <stdlib.h>

int main()
{
  char* hmm = "äüö";
  printf("%c\n", hmm[0]);
  printf("%c\n", hmm[1]);
  printf("%c\n", hmm[2]);
  return 0;
}

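A char always occupies exactly one byte.

In your example you assume that "ä" is a single char: wrong. Open your .c source file in a hex viewer and you will see that ä takes up 2 chars, because the file is encoded in UTF-8.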

The question now is whether you want to use wide characters.

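#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    const wchar_t hmm[] = L"äö";

    // Pick up the locale from the environment so wide output can be converted
    setlocale(LC_ALL, "");
    wprintf(L"%ls\n", hmm);          // the whole wide string
    wprintf(L"%lc\n", hmm[0]);       // a single wide character
    wprintf(L"%zu\n", wcslen(hmm));  // wcslen returns size_t, so use %zu rather than %i

    return 0;
}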


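I would check your command prompt font and code page to make sure it can display your OS's single-byte encoding. Note: the command prompt has its own code page, which is distinct from the one your text editor uses.

Your data is in a multibyte encoding. You therefore need to use multibyte character handling techniques to split the string. For example: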

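#include <stdio.h>
#include <stdlib.h>   // mblen
#include <string.h>
#include <locale.h>

int main(void)
{
    const char* hmm = "äö";
    size_t off = 0;
    int len;
    size_t max = strlen(hmm);

    // Use the environment's locale; mblen depends on it to recognise multibyte characters
    setlocale(LC_ALL, "");

    printf("<<%s>>\n", hmm);
    printf("%zu\n", strlen(hmm));

    // mblen returns the number of bytes in the next character (-1 if invalid)
    while (hmm[off] != '\0' && (len = mblen(&hmm[off], max - off)) > 0)
    {
        printf("<<%.*s>>\n", len, &hmm[off]);
        off += len;
    }

    return 0;
}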

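Question marks appear when the program runs without a suitable locale (for example, if the setlocale call is left out): each byte is then printed on its own, and such a lone byte is an invalid single byte as far as a UTF-8 terminal is concerned.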

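You can also use wide characters and wide-character printing, as shown in the wide-character answer.

Sorry to drag this on, but I think it is important to highlight a few issues. As I understand it, OS X can have UTF-8 as the default OS code page, so this answer is mostly about Windows, which uses UTF-16 under the hood and whose default ANSI code page (ACP) depends on the configured OS region.

First, you can open Character Map and look up
äö

Both are present in code page 1252 (Western), so this is not an MBCS problem. The only way it could be an MBCS problem is if the file had been saved in an MBCS encoding (Shift-JIS, Big5, Korean, GBK).

The suggestion to use
setlocale(LC_ALL, "")

gives little insight into why äö is rendered incorrectly in the command prompt window.

The command prompt uses its own code page, the OEM code page. A reference of the available OEM code pages, together with their character maps, can be consulted for the exact byte values.

Going into the command prompt and typing the following command displays the OEM code page the command prompt is currently using:

chcp

The Microsoft documentation for setlocale(LC_ALL, "") describes the following behaviour:

setlocale(LC_ALL, "")
Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system.

You can do this manually by running chcp with the required code page and then running your application, which will then output the text perfectly.

If this were a multibyte character set problem, there would be a whole range of other issues:

Under MBCS, characters are encoded in either one or two bytes. In a two-byte character, the first byte, or "lead byte", signals that both it and the following byte are to be interpreted as one character. The first byte comes from a range of codes reserved for use as lead bytes. Which byte ranges can be lead bytes depends on the code page in use. For example, Japanese code page 932 uses the range 0x81 through 0x9F as lead bytes, while Korean code page 949 uses a different range.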

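Looking at the situation, the length is 4 instead of 2. I would say the file has been saved as UTF-8 (it could in fact have been saved as UTF-16, although the compiler would run into trouble sooner or later). UTF-8 encodes every Unicode code point outside the ASCII range 0 to 127 as two to four bytes; ä and ö (U+00E4 and U+00F6) take two bytes each. Your compiler opens the file and assumes it is in the default OS code page, or plain ANSI C. When it parses the string, it interprets it as an ANSI C string in which 1 byte = 1 character.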

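To solve this on Windows, convert the UTF-8 string to UTF-16 and print it with wprintf. At the time of writing, the ASCII/MBCS stdio functions have no native UTF-8 support.

For Mac OS X, where the default OS code page is UTF-8, I would recommend Jonathan Leffler's solution, because it is more elegant. However, if you later port it to Windows, you will find that you need to convert the string from UTF-8 to UTF-16 as in the example below.

With either solution, you still need to change the command prompt code page to the OS code page in order to print characters above the ASCII range correctly.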

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <Windows.h>

// File saved as UTF-8, with characters outside the ASCII range
int main(void)
{
    // Set the C runtime locale to the user-default ANSI code page
    setlocale(LC_ALL, "");

    // ä and ö lie outside the ASCII range, in the Latin-1 Supplement block,
    // so UTF-8 stores each of them as a lead byte plus one continuation byte
    const char* hmm = "äö";

    printf("UTF-8 file string using Windows 1252 code page read as:%s\n", hmm);
    printf("Length:%zu\n", strlen(hmm));

    // Convert the UTF-8 string to a wide (UTF-16) string.
    // The first call only reports the required length, terminator included.
    int nLen = MultiByteToWideChar(CP_UTF8, 0, hmm, -1, NULL, 0);
    if (nLen == 0)
        return 1;
    LPWSTR lpszW = malloc(nLen * sizeof(WCHAR));
    if (lpszW == NULL)
        return 1;
    MultiByteToWideChar(CP_UTF8, 0, hmm, -1, lpszW, nLen);

    // Print it (%ls is the standard conversion for a wide string)
    wprintf(L"wprintf wide character of UTF-8 string: %ls\n", lpszW);

    // Free the memory
    free(lpszW);

    getchar(); // wait for a key press so the console window stays open
    return 0;
}


UTF-8 file string using Windows 1252 code page read as:Ã¤Ã¶
Length:4
wprintf wide character of UTF-8 string: äö

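Sample output of the mblen example, run with the locale taken from a UTF-8 environment:

<<äö>>
4
<<ä>>
<<ö>>

Output when every byte is treated as a separate character (for example, in the "C" locale):

<<äö>>
4
<<?>>
<<?>>
<<?>>
<<?>>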
