Qt: extracting <p> tags from an HTML string with QRegExp

html regex qt

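I have rich text whose HTML source, taken from a QTextEdit, is stored in a string. I want to extract all the lines one by one (there are 4 to 6 of them). The string looks like this: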

//html opening stuff
<p style = attributes...><span style = attributes...>My Text</span></p>
//more lines like this
//html closing stuff

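So I need each whole line, from the opening p tag to the closing p tag, with the p tags included. I have checked and tried everything I could find here and on other sites, but still have no result.

Here is my code ("htmlStyle" is the input string):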

QStringList list;
QRegExp rx("(<p[^>]*>.*?</p>)");
int pos = 0;
while ((pos = rx.indexIn(htmlStyle, pos)) != -1) {
    list ...
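HTML/XML is not a regular language, so you cannot parse it with regular expressions. Parsing HTML is non-trivial.
You can use QTextDocument, QTextBlock, QTextCursor and related classes to iterate over the paragraphs of a rich text document. All the HTML parsing is taken care of for you. This covers exactly the HTML subset that QTextEdit supports, because QTextEdit uses QTextDocument as its internal representation. You can get the document directly from the widget with QTextEdit::document(). For example:
#include <QTextBlock>
#include <QTextDocument>
#include <QTextEdit>

void iterate(QTextEdit * edit) {
   auto const & doc = *edit->document();
   // QTextBlock::next() returns the following block, so it must be assigned back
   for (auto block = doc.begin(); block != doc.end(); block = block.next()) {
      // do something with the text block, e.g. iterate its fragments
      for (auto fragment = block.begin(); fragment != block.end(); ++fragment) {
         // do something with the text fragment
      }
   }
}

Instead of parsing the HTML by hand and getting it wrong, explore the structure of the QTextDocument and use it as you need.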
Here is a pure Java approach; I hope it helps:
// Note: "<p>" only matches paragraph tags that carry no attributes.
int startIndex = htmlStyle.indexOf("<p>");
while (startIndex >= 0) {
    // Look for the closing tag only after the current opening tag.
    int endIndex = htmlStyle.indexOf("</p>", startIndex);
    if (endIndex < 0) {
        break; // no closing tag left
    }
    endIndex = endIndex + 4; // to include </p> in the substring
    System.out.println(htmlStyle.substring(startIndex, endIndex));
    startIndex = htmlStyle.indexOf("<p>", endIndex);
}

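For those who need a complete Qt solution: I worked one out based on @Aditya Poorna's answer. Thanks for the hint.
The code is as follows:
#include <QDebug>
#include <QString>
#include <QStringRef>

int startIndex = htmlStyle.indexOf("<p");
while (startIndex >= 0) {
    // Find the closing tag that follows the current opening tag.
    int endIndex = htmlStyle.indexOf("</p>", startIndex);
    if (endIndex < 0)
        break; // no closing tag: stop rather than build an invalid range
    endIndex = endIndex + 4; // include "</p>" itself
    // QStringRef (Qt 5) refers to a slice of htmlStyle without copying it.
    QStringRef subString(&htmlStyle, startIndex, endIndex - startIndex);
    qDebug() << subString;
    startIndex = htmlStyle.indexOf("<p", endIndex);
}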
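The index-based scan above can also be wrapped in a reusable function. The following is only a minimal sketch, written against std::string so that it compiles without Qt; the helper name extractParagraphs is hypothetical and does not come from the thread. It also assumes that a paragraph tag is "<p" followed by ">" or whitespace, so that tags such as "<pre>" are skipped, and that paragraphs are not nested. With QString, indexOf and mid would play the roles of find and substr.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns every "<p ...>...</p>"
// span found in html, tags included. Assumes paragraphs are not nested.
std::vector<std::string> extractParagraphs(const std::string& html) {
    static const std::string open = "<p";
    static const std::string close = "</p>";
    std::vector<std::string> paragraphs;

    std::size_t start = html.find(open);
    while (start != std::string::npos) {
        // Accept "<p>" or "<p " only, so that tags such as "<pre>" are skipped.
        const std::size_t after = start + open.size();
        const bool isParagraphTag =
            after < html.size() &&
            (html[after] == '>' ||
             std::isspace(static_cast<unsigned char>(html[after])));
        if (!isParagraphTag) {
            start = html.find(open, after);
            continue;
        }
        // Search for the closing tag only after the opening tag.
        const std::size_t end = html.find(close, after);
        if (end == std::string::npos) {
            break;  // unterminated paragraph: stop instead of guessing
        }
        // substr takes (position, length), like QString::mid and QStringRef.
        paragraphs.push_back(html.substr(start, end + close.size() - start));
        start = html.find(open, end + close.size());
    }
    return paragraphs;
}
```

Called on a string shaped like the sample in the question, this should yield one entry per paragraph, each running from its opening p tag through its closing p tag. It remains a plain string scan rather than an HTML parser, so the QTextDocument approach is still the safer choice for anything beyond this fixed layout.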

This question is a classic X-Y problem, and it is incomplete until you tell us exactly why you need to iterate over the paragraphs of a rich text document. What will you do with the paragraphs? Note that "I'll parse them further" is not a good answer: you really don't want to write your own HTML parser. If a text string has some HTML in it, you can't process it without parsing it. Leave the parsing to Qt, which has already done it anyway. Use the HTML parser that Qt gives you access to.

You can try an XML reader using QDomDocument.

Thanks, I think it will work somehow! I just need to think about it. There is no substring in Qt. I tried "section()", and it returns 4 empty strings, which is partly good because my current string has 4 matches. I just need to figure out how to get the actual text.

Now it works like a charm! Thanks again! QStringRef subString(&htmlStyle, startIndex, endIndex-startIndex); It enters htmlStyle at startIndex and stops after a length of endIndex-startIndex!