C++11 Boost Spirit: sub-grammar appending to a string?

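I'm playing around with Boost.Spirit. As part of a larger project, I'm trying to build a grammar that parses C/C++-style string literals. I've run into a problem:

How do I create a sub-grammar that appends a std::string() result to the calling grammar's std::string() attribute, instead of just a char?

Here is my code, which works so far. (I actually have quite a bit more, including '\n' etc., but I've cut it down to the bare essentials.)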

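#define BOOST_SPIRIT_UNICODE

#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

using namespace boost;
using namespace boost::spirit;
using namespace boost::spirit::qi;

template <typename Iterator>
struct EscapedUnicode : grammar<Iterator, char()> // <-- should be std::string
{
    EscapedUnicode() : EscapedUnicode::base_type(escaped_unicode)
    {
        escaped_unicode %= '\\' > ( ( "u" >> uint_parser<char, 16, 4, 4>() )
                                  | ( "U" >> uint_parser<char, 16, 8, 8>() ) );
    }

    rule<Iterator, char()> escaped_unicode; // <-- should be std::string
};

template <typename Iterator>
struct QuotedString : grammar<Iterator, std::string()>
{
    QuotedString() : QuotedString::base_type(quoted_string)
    {
        quoted_string %= '"' > *( escaped_unicode | ( char_ - ( '"' | eol ) ) ) > '"';
    }

    EscapedUnicode<Iterator> escaped_unicode;
    rule<Iterator, std::string()> quoted_string;
};

int main()
{
    std::string input = "\"foo\u0041\"";
    typedef std::string::const_iterator iterator_type;
    QuotedString<iterator_type> qs;
    std::string result;
    bool r = parse(input.cbegin(), input.cend(), qs, result);
    std::cout << std::boolalpha << r << ": '" << result << "'\n";
}

A few points apply: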

Please don't put using namespace directives in highly generic code. Unless you control it, it will ruin your day.

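Operator %= is auto-rule assignment: it forces automatic attribute propagation even in the presence of semantic actions. You don't want that here. If you want to encode into a multi-byte string representation, the attribute exposed by uint_parser will not be (correctly) propagated automatically.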
The input string

std::string input = "\"foo\u0041\"";

needs to be

std::string input = "\"foo\\u0041\"";

Otherwise the compiler performs the escape processing before the parser ever runs :)

Now for the specific tricks that get the job done:

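You need to change the rule's declared attribute to something that Spirit will automatically "flatten" in simple sequences. The full program below declares quoted_string as qi::rule<Iterator, std::vector<char>()>. Even so,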
quoted_string = '"' >> *(escaped_unicode | (qi::char_ - ('"' | qi::eol))) >> '"';

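will not append, because the first branch of the alternative yields a sequence of chars while the second yields a single char. Spelling the equivalent as follows: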
quoted_string = '"' >> *(escaped_unicode | +(qi::char_ - ('"' | qi::eol | "\\u" | "\\U"))) >> '"';

subtly triggers Spirit's append heuristic, so we get what we want.

The rest is straightforward:

Implement the actual encoding with a Phoenix function object:
struct encode_f {
    template <typename...> struct result { using type = void; };

    template <typename V, typename CP> void operator()(V& a, CP codepoint) const {
        // TODO implement desired encoding (e.g. UTF8)
        bio::stream<bio::back_insert_device<V> > os(a);
        os << "[0x" << std::hex << std::setw(std::numeric_limits<CP>::digits/4) << std::setfill('0') << codepoint << "]";
    }
};
boost::phoenix::function<encode_f> encode;

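"But of course I need to produce a sequence of chars (bytes) for anything beyond 0x7f." - What encoding do you want? As a side note: in general, when you have questions like "why doesn't Spirit append/concatenate/do what I want with my sequence", take a look at the cheat sheet: [...]

I never use using namespace, except when trying to condense a long list of explicit boost::spirit::qi::uint_parser etc. into as few LoC as possible ;-) I think I need a little time to wrap my head around the "simple" parts of your answer (the Phoenix stuff is exactly what still gives me headaches). But it works; thanks a lot for your thorough answer.

Honestly, I think the noisy errors were most likely caused by subtle other things (like the %=). If you show what you're stuck on, I can spot the problem.

If you want closure: the result ended up completely different from my original idea (and from what you demonstrated here). I posted it on [...], hoping to get the remaining rough spots sorted out.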
Then wire the encoder up with semantic actions (plain =, not %=):

escaped_unicode = '\\' > ( ("u" >> uint_parser<uint16_t, 16, 4, 4>() [ encode(_val, _1) ])
                         | ("U" >> uint_parser<uint32_t, 16, 8, 8>() [ encode(_val, _1) ]) );

Full program:

//#define BOOST_SPIRIT_UNICODE
//#define BOOST_SPIRIT_DEBUG

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

// for demo re-encoding
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream.hpp>
#include <iomanip>
#include <limits>

namespace qi  = boost::spirit::qi;
namespace bio = boost::iostreams;
namespace phx = boost::phoenix;

template <typename Iterator, typename Attr = std::vector<char> > // or std::string for that matter
struct EscapedUnicode : qi::grammar<Iterator, Attr()>
{
    EscapedUnicode() : EscapedUnicode::base_type(escaped_unicode)
    {
        using namespace qi;

        escaped_unicode = '\\' > ( ("u" >> uint_parser<uint16_t, 16, 4, 4>() [ encode(_val, _1) ])
                                 | ("U" >> uint_parser<uint32_t, 16, 8, 8>() [ encode(_val, _1) ]) );

        BOOST_SPIRIT_DEBUG_NODES((escaped_unicode))
    }

    struct encode_f {
        template <typename...> struct result { using type = void; };

        template <typename V, typename CP> void operator()(V& a, CP codepoint) const {
            // TODO implement desired encoding (e.g. UTF8)
            bio::stream<bio::back_insert_device<V> > os(a);
            os << "[0x" << std::hex << std::setw(std::numeric_limits<CP>::digits/4) << std::setfill('0') << codepoint << "]";
        }
    };
    boost::phoenix::function<encode_f> encode;

    qi::rule<Iterator, Attr()> escaped_unicode;
};

template <typename Iterator>
struct QuotedString : qi::grammar<Iterator, std::string()>
{
    QuotedString() : QuotedString::base_type(start)
    {
        start = quoted_string;
        quoted_string = '"' >> *(escaped_unicode | +(qi::char_ - ('"' | qi::eol | "\\u" | "\\U"))) >> '"';
        BOOST_SPIRIT_DEBUG_NODES((start)(quoted_string))
    }

    EscapedUnicode<Iterator> escaped_unicode;
    qi::rule<Iterator, std::string()> start;
    qi::rule<Iterator, std::vector<char>()> quoted_string;
};

int main() {
    std::string input = "\"foo\\u0041\\U00000041\"";

    typedef std::string::const_iterator iterator_type;
    QuotedString<iterator_type> qs;
    std::string result;
    bool r = parse( input.cbegin(), input.cend(), qs, result );
    std::cout << std::boolalpha << r << ": '" << result << "'\n";
}

Prints:

true: 'foo[0x0041][0x00000041]'