C++ Why doesn't Boost.Locale provide character-level rule types?


c++ c++11 boost


Env: Boost 1.53.0, C++11

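I'm new to C++. In Boost.Locale boundary analysis, rule types are defined for words (for example boundary::word_letter and boundary::word_number) and for sentences, but there are no boundary rule types for characters. What I need is something like isUpperCase(), isLowerCase(), isDigit() and isPunctuation().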

I tried the Boost string algorithms, but they don't work:

boost::locale::generator gen;
std::locale loc = gen("ru_RU.UTF-8");
std::string context = "ДВ";
std::cout << boost::algorithm::all(context, boost::algorithm::is_upper(loc));

This works for me under VS 2013:
std::locale::global(std::locale("ru-RU"));  // Windows-style locale name
std::string context = "ДВ";
std::cout << std::any_of(context.begin(), context.end(), boost::algorithm::is_upper());

It prints:
true
true
false

This also works with boost::algorithm::all:
std::wstring context = L"ДВ";
std::wcout << std::boolalpha << boost::algorithm::all(context, boost::algorithm::is_upper());

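Boost.Locale is typically built on ICU, and ICU itself provides character-level classification, which looks quite composable and readable (more Java-style).
Here is a simple example: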
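#include <cstdio>
#include <memory>
#include <string>
#include <unicode/brkiter.h>
#include <unicode/locid.h>
#include <unicode/uchar.h>
#include <unicode/unistr.h>
#include <unicode/utypes.h>

using namespace icu;

int main()
{
    // Decode the UTF-8 literal explicitly instead of relying on ICU's default converter.
    UnicodeString s = UnicodeString::fromUTF8("А аБ Д д2 -");
    UErrorCode status = U_ZERO_ERROR;  // ICU expects the status to start as U_ZERO_ERROR
    Locale ru("ru", "RU");
    std::unique_ptr<BreakIterator> bi(BreakIterator::createCharacterInstance(ru, status));
    if (U_FAILURE(status)) {
        std::printf("createCharacterInstance failed: %s\n", u_errorName(status));
        return 1;
    }
    bi->setText(s);
    // Visit each character boundary; the final boundary is the end of the text, so skip it.
    for (int32_t p = bi->first(); p != BreakIterator::DONE && p < s.length(); p = bi->next()) {
        UChar32 c = s.char32At(p);  // full code point, also correct outside the BMP
        std::string type = "other";
        if (u_isUUppercase(c))
            type = "upper";
        else if (u_isULowercase(c))
            type = "lower";
        else if (u_isUWhiteSpace(c))
            type = "whitespace";
        else if (u_isdigit(c))
            type = "digit";
        else if (u_ispunct(c))
            type = "punc";
        std::printf("Boundary at position %d is %s\n", static_cast<int>(p), type.c_str());
    }
    return 0;
}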

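What do you mean by "the Boost string algorithms don't work"? Does your program crash?
It doesn't work as expected: the results are wrong. It only handles ASCII letters. Thanks again!
Which OS? Which code page is the source file saved in?
Ubuntu 12.04. Everything is encoded in UTF-8.
Have a look at the program in this question. It is very similar to what you are trying and works fine. Just adapt it to your program, changing the locale of course, and see whether it works.
It finally works for me. This is actually part of my named-entity tagging project. The whole pipeline, from sentence segmentation and tokenization to named-entity tagging, works uniformly on Unicode strings, with one UTF-8 character as the basic unit. I've read that wstring is preferred on Windows but not on Linux. Anyway, if there is no other option, so be it. Thanks for everything! I posted another way to do this; take a look if you're interested.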