Convert a string (with style identifiers) into a JavaScript object representation

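I need to create a JavaScript object representation of a string that includes style information. The style identifiers themselves are not important, but for the sake of this question let's use the ones Stack Overflow uses: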

    *text* = italic
    **text** = bold
    ***text*** = bold italic

The data representation I want is an array of objects, in the order they appear in the string, where each object looks like this:

{
  stringpart : (string),
  style : (normal | bold | italic | bold italic)
}

So, given the following string:

This is some example text, with some **bold** and *italic* ***styles***.

it should be converted into the following array of objects:

[
    {
      stringpart : "This is some example text, with some ",
      style : "normal"
    },
    {
      stringpart : "bold",
      style : "bold"
    },
    {
      stringpart : " and ",
      style : "normal"
    },
    {
      stringpart : "italic",
      style : "italic"
    },
    {
      stringpart : " ",
      style : "normal"
    },
    {
      stringpart : "styles",
      style : "bold italic"
    },
    {
      stringpart : ".",
      style : "normal"
    }
]

So far I have started looking at HTML parsers and came across the following code:

var
    content = 'This is some <b>really important <i>text</i></b> with <i>some <b>very very <br>very important</b> things</i> in it.',
    tagPattern = /<\/?(i|b)\b[^>]*>/ig,
    stack = [],
    tags = [],
    offset = 0,
    match,
    tag;

while (match = tagPattern.exec(content)) {
    if (match[0].substr(1, 1) !== '/') {
        stack.push(match.index - offset);
    } else {
        tags.push({
            tag: match[1],
            from: stack.splice(-1, 1)[0],
            to: match.index - offset
        });
    }
    offset += match[0].length;
}
content = content.replace(tagPattern, '');
// now use tags array and perform needed actions.

// see stuff
console.log(tags);
console.log(content);
//example of correct result
console.log(content.substring(tags[3].from, tags[3].to));

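Although the regular expression in this code could be adapted to detect the style identifiers mentioned above, it does not output the data in the required format, because it only returns from/to indexes.

How can I efficiently convert a string into the required array/object representation using the identifiers above?

Isn't what you are describing just JSON?

There are plenty of JSON parsing libraries available. Check them out, or post your requirements clearly. By clearly I mean the language/platform you want to do this in and what it is for (just to get an idea).

I think this will get you pretty far: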

var str = "This is some example text, with some **bold** and *italic* ***styles***."
str.match(/(\*{1,3})[^*]+(\1)/g);

Output

[ '**bold**',
  '*italic*',
  '***styles***' ]

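The convenient thing about using \1 is that you can match * pairs. That is, a single * looks for the next single *, a double ** looks for the next double **, and so on.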
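
To make the pairing concrete, here is a small sketch. The helper name `findStyledRuns` and the shape of the objects it returns are made up for illustration and are not part of the answer above. It uses `String.prototype.matchAll` instead of `match`, because `match` with the `g` flag returns only the full matches and drops the capture groups.

```javascript
// Hypothetical helper, for illustration only.
// \1 must repeat exactly what group 1 captured, so an opening "**"
// is only closed by another "**".
function findStyledRuns(str) {
  const re = /(\*{1,3})([^*]+)\1/g;
  // matchAll keeps capture groups and match indexes for every match.
  return Array.from(str.matchAll(re), (m) => ({
    marker: m[1], // the opening (and closing) run of asterisks
    text: m[2],   // the text between the markers
    index: m.index,
  }));
}

const runs = findStyledRuns("some **bold** and *italic* ***styles***.");
console.log(runs);
```

Having the marker, the inner text, and the index of each match is what a tokenizer needs in order to also emit the unstyled text between matches.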

I wasn't going to do this, but I got a little bored:

var getStyleTokens = function(str) {

  var parts = [];

  var addNode = function(text, style) {
    return parts.push(
      {stringpart: text, style: style}
    );
  };

  var styles = {
    "*":   "italic",
    "**":  "bold",
    "***": "bold italic"
  };

  var re = /(\*{1,3})([^*]+)(?:\1)/g,
      caret = 0,
      match;

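  while ((match = re.exec(str)) !== null) {
    // Plain text between the previous match and this one.
    // slice(start, end) is needed here: substr(start, length) would
    // read too far on every match after the first.
    addNode(str.slice(caret, match.index), "normal");
    addNode(match[2], styles[match[1]]);
    caret = match.index + match[0].length;
  }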

  addNode(str.substr(caret), "normal");

  return parts;
};

var str = "This is some example text, with some **bold** and *italic* ***styles***."

getStyleTokens(str);

Output

[ { stringpart: 'This is some example text, with some ',
    style: 'normal' },
  { stringpart: 'bold', style: 'bold' },
  { stringpart: ' and ', style: 'normal' },
  { stringpart: 'italic', style: 'italic' },
  { stringpart: ' ', style: 'normal' },
  { stringpart: 'styles',
    style: 'bold italic' },
  { stringpart: '.', style: 'normal' } ]

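Note
Since your markers are unlikely to all be *, it is better to write the list of possible markers in the first capture group, longest first so that ** is not consumed as a single *. This means the rest of the RegExp has to change as well:

/(\*\*\*|\*\*|\*)(?:.(?!\1))+.(\1)/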

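This means you could write something like

/(BOLD|ITALIC|BOTH)(?:.(?!\1))+.(\1)/

to work on a string like this:

This is some example text, with some BOLDboldBOLD and ITALICitalicITALIC BOTHstylesBOTH.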

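In short: modify the expression above to use whatever markers you like; as long as you use symmetric closing markers, the styles will be parsed just fine.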
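
As a rough sketch of that idea, the marker list can be passed in as a map and the RegExp built from it. The names `tokenize` and `escapeRegExp` are illustrative, not from the answer above, and the sketch assumes markers are symmetric and not nested. It sorts the markers longest first so that `***` is tried before `**` and `*`, and it skips empty "normal" parts.

```javascript
// Sketch only: `tokenize` and `escapeRegExp` are hypothetical helper names.
function escapeRegExp(s) {
  // Escape characters that have a special meaning inside a RegExp.
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function tokenize(str, styles) {
  // Longest markers first, so "***" wins over "**" and "*".
  const markers = Object.keys(styles).sort((a, b) => b.length - a.length);
  // Group 1: opening marker. Group 2: text up to the same marker again.
  const re = new RegExp(
    "(" + markers.map(escapeRegExp).join("|") + ")((?:(?!\\1)[\\s\\S])+)\\1",
    "g"
  );
  const parts = [];
  let caret = 0;
  let match;
  while ((match = re.exec(str)) !== null) {
    if (match.index > caret) {
      // Unstyled text between the previous match and this one.
      parts.push({ stringpart: str.slice(caret, match.index), style: "normal" });
    }
    parts.push({ stringpart: match[2], style: styles[match[1]] });
    caret = match.index + match[0].length;
  }
  if (caret < str.length) {
    // Trailing unstyled text after the last match.
    parts.push({ stringpart: str.slice(caret), style: "normal" });
  }
  return parts;
}

console.log(tokenize(
  "This is some example text, with some **bold** and *italic* ***styles***.",
  { "*": "italic", "**": "bold", "***": "bold italic" }
));
console.log(tokenize(
  "some BOLDboldBOLD and ITALICitalicITALIC BOTHstylesBOTH.",
  { BOLD: "bold", ITALIC: "italic", BOTH: "bold italic" }
));
```

Building the pattern from the map keeps the markers and their style names in one place, so switching marker sets only means passing a different object.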
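
This is definitely useful for the regular expression, thank you.

Hmm, this is actually very useful, and I can get what I need from it. I'll leave the question open for a while in case someone can provide a complete solution. If not, I'll accept your answer. Thanks again :)

Awesome! A serious, thorough, concise answer. Thank you very much! :-)

@user2616246, I made some improvements to the post above. In particular, see the getStyleTokens code example.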