How to define Unicode ranges by property/identifier in C++ with PEGTL

c++ utf-8

Using PEGTL, a template-based, header-only C++11 PEG library, I can define ranges of Unicode characters like this:

utf8::range< 0x0, 0x10FFFF >  // all UTF-8 characters
utf8::ranges< 0x41, 0x5A, 0x61, 0x7A >  // UTF-8 0x41-0x5A [A-Z] and 0x61-0x7A [a-z]
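
To illustrate why such ranges must be known at compile time, here is a minimal standalone sketch of a variadic template that takes inclusive lower/upper bound pairs as template arguments. It is not PEGTL's implementation; the names `in_ranges` and `ascii_letter` are hypothetical and exist only for this example.

```cpp
// Standalone sketch, NOT PEGTL's implementation: a variadic template that
// takes inclusive [lo, hi] pairs and tests membership at compile time.

// Base case: no bounds left, so nothing matches.
// A single leftover bound means the pairs were unbalanced.
template< char32_t... Bounds >
struct in_ranges
{
   static_assert( sizeof...( Bounds ) == 0, "bounds must come in pairs" );
   static constexpr bool test( char32_t ) { return false; }
};

// Recursive case: check the first [Lo, Hi] pair, then the remaining pairs.
template< char32_t Lo, char32_t Hi, char32_t... Rest >
struct in_ranges< Lo, Hi, Rest... >
{
   static_assert( Lo <= Hi, "each range must satisfy lo <= hi" );
   static constexpr bool test( char32_t c )
   {
      return ( ( Lo <= c ) && ( c <= Hi ) ) || in_ranges< Rest... >::test( c );
   }
};

// Hypothetical alias mirroring the second range above: [A-Z] and [a-z].
using ascii_letter = in_ranges< 0x41, 0x5A, 0x61, 0x7A >;

static_assert( ascii_letter::test( U'Q' ), "uppercase letter is in range" );
static_assert( !ascii_letter::test( U'5' ), "digit is not in range" );
```

Because the bounds are template arguments, a property such as [:Lu:] would have to be expanded into a literal list of bounds before compilation, which is exactly the difficulty described in the question.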

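Unicode also has a property classification, which lets me write things like [:Lu:] or [:ID_Start:] and get a set/range of characters.

Since I am working with C++ templates, I need those ranges at compile time. I think I have the following options:

1. Find out whether PEGTL itself has a way to look up [:ID_Start:] or [:Lu:].

2. Find a C++ preprocessor library that allows such queries at compile time.

3. Find an application or online service where I can run these queries and get the ranges (as shown above), which I can then paste into my code.

This is also the order in which I prefer the solutions.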
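
The third option could also be approximated with a small generator of your own. The following is only a sketch under stated assumptions: `collect_ranges`, `format_ranges` and `is_ascii_letter` are hypothetical helpers, and the predicate is a stand-in. With ICU available, one would presumably pass a predicate that calls `u_hasBinaryProperty` for the desired property instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

using range_list = std::vector< std::pair< char32_t, char32_t > >;

// Scan [first, last] and merge consecutive matching code points
// into inclusive [lo, hi] ranges.
template< typename Predicate >
range_list collect_ranges( char32_t first, char32_t last, Predicate has_property )
{
   assert( first <= last && last <= 0x10FFFF );
   range_list out;
   for( char32_t c = first; c <= last; ++c ) {
      if( !has_property( c ) ) {
         continue;
      }
      if( !out.empty() && out.back().second + 1 == c ) {
         out.back().second = c;  // extend the current range
      }
      else {
         out.emplace_back( c, c );  // start a new range
      }
   }
   return out;
}

// Format the ranges as template arguments that can be pasted into a grammar.
std::string format_ranges( const range_list& ranges )
{
   std::string s = "utf8::ranges< ";
   char buf[ 32 ];
   for( std::size_t i = 0; i < ranges.size(); ++i ) {
      std::snprintf( buf, sizeof( buf ), "%s0x%X, 0x%X", ( i == 0 ) ? "" : ", ",
                     static_cast< unsigned >( ranges[ i ].first ),
                     static_cast< unsigned >( ranges[ i ].second ) );
      s += buf;
   }
   s += " >";
   return s;
}

// Stand-in predicate (an assumption for this sketch); replace it with a
// real Unicode property test, e.g. one backed by ICU.
inline bool is_ascii_letter( char32_t c )
{
   return ( c >= 0x41 && c <= 0x5A ) || ( c >= 0x61 && c <= 0x7A );
}
```

Calling `format_ranges( collect_ranges( 0, 0x7F, is_ascii_letter ) )` yields a string that can be pasted into the source. For real properties the list can become long, which is one reason a rule that queries the property at run time may be preferable.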

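PEGTL uses rules to match characters rather than returning character sets. If you want to match characters that have certain Unicode character properties, you can create a custom rule and implement it with the help of a Unicode library such as ICU. ICU provides functions to test code points for various properties, for example u_hasBinaryProperty, declared in unicode/uchar.h.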

Here is a complete example program:

#include <iomanip>
#include <iostream>

#include <unicode/uchar.h>

#include <tao/pegtl.hpp>

using namespace tao::TAO_PEGTL_NAMESPACE;  // NOLINT

namespace test
{
   template< UProperty P >
   struct icu_has_binary_property
   {
      using analyze_t = analysis::generic< analysis::rule_type::ANY >;

      template< typename Input >
      static bool match( Input& in )
      {
         // this assumes the input is UTF8, adapt as necessary
         const auto r = internal::peek_utf8::peek( in );
         // if a code point is available, the size is >0
         if( r.size != 0 ) {
            // check the property
            if( u_hasBinaryProperty( r.data, P ) ) {
               // if it matches, consume the character
               in.bump( r.size );
               return true;
            }
         }
         return false;
      }
   };

   using icu_lower = icu_has_binary_property< UCHAR_LOWERCASE >;
   using icu_upper = icu_has_binary_property< UCHAR_UPPERCASE >;

   // clang-format off
   struct grammar : seq< icu_upper, plus< icu_lower >, eof > {};
   // clang-format on
}

int main( int argc, char** argv )
{
   for( int i = 1; i < argc; ++i ) {
      argv_input<> in( argv, i );
      std::cout << argv[ i ] << " matches: " << std::boolalpha << parse< test::grammar >( in ) << std::endl;
   }
}
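
The custom rule above follows a peek, test, consume pattern: look at the next code point without consuming it, check the property, and advance the input only on success. As a rough illustration of that pattern without PEGTL or ICU, here is a standalone sketch. `peeked`, `peek_code_point` and `consume_if` are hypothetical names, and the decoder is a simplified stand-in, not PEGTL's internal `peek_utf8`.

```cpp
#include <cassert>
#include <cstddef>

// Result of peeking: size == 0 means "no valid code point available".
struct peeked
{
   char32_t data;
   std::size_t size;
};

// Decode one UTF-8 code point from p[0..n) without consuming it.
inline peeked peek_code_point( const char* p, std::size_t n )
{
   assert( p != nullptr || n == 0 );
   const peeked none = { 0, 0 };
   if( n == 0 ) {
      return none;
   }
   const unsigned char b0 = static_cast< unsigned char >( p[ 0 ] );
   if( b0 < 0x80 ) {
      return { b0, 1 };  // single-byte (ASCII) code point
   }
   std::size_t len;
   char32_t cp;
   char32_t min_cp;  // smallest value allowed for this length (rejects overlong forms)
   if( ( b0 & 0xE0 ) == 0xC0 ) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
   else if( ( b0 & 0xF0 ) == 0xE0 ) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
   else if( ( b0 & 0xF8 ) == 0xF0 ) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
   else { return none; }  // stray continuation byte or invalid lead byte
   if( n < len ) {
      return none;  // truncated sequence
   }
   for( std::size_t i = 1; i < len; ++i ) {
      const unsigned char b = static_cast< unsigned char >( p[ i ] );
      if( ( b & 0xC0 ) != 0x80 ) {
         return none;  // not a continuation byte
      }
      cp = ( cp << 6 ) | ( b & 0x3F );
   }
   if( cp < min_cp || cp > 0x10FFFF || ( cp >= 0xD800 && cp <= 0xDFFF ) ) {
      return none;  // overlong, out of range, or surrogate
   }
   return { cp, len };
}

// Peek, test, consume: advance (p, n) only if the predicate accepts the code point.
template< typename Predicate >
bool consume_if( const char*& p, std::size_t& n, Predicate pred )
{
   const peeked r = peek_code_point( p, n );
   if( r.size != 0 && pred( r.data ) ) {
      p += r.size;
      n -= r.size;
      return true;
   }
   return false;
}
```

In the PEGTL rule, `in.bump( r.size )` plays the role of advancing the pointer here, and `u_hasBinaryProperty` plays the role of the predicate.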

Edit: I have since added ICU rules (a lot of them) to PEGTL. Because they require ICU, an external dependency, I put them in the contrib section.