Regex: How do I write a regular expression to get the text inside XML tags?


I'm trying to write a regular expression that returns the text inside certain XML tags. For example, suppose I have a file in this format:

<name>Joe Blog</name>
<email>abc@sample.com</email>
<address>123 sample st</address>

How do I extract the text of the address field?

Any help with this would be greatly appreciated.

Thanks,

This expression will capture the address value

(.*)

and put it into the first capture group.

Example: sample text

<name>Joe Blog</name>
<email>abc@sample.com</email>
<address>123 sample st</address>
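
A minimal sketch of that answer in Python, assuming the expression was meant to sit between literal address tags. Only `(.*)` survives in the text above, so the full pattern `<address>(.*?)</address>` and the helper name `extract_address` are assumptions, and the non-greedy `.*?` is swapped in for `.*` so the match stops at the first closing tag.

```python
import re

# Assumed pattern: the literal <address> tags around the capture group
# are a guess; only "(.*)" appears in the answer itself.
ADDRESS_RE = re.compile(r"<address>(.*?)</address>", re.DOTALL)


def extract_address(xml_text):
    """Return the text of the first <address> element, or None if absent."""
    match = ADDRESS_RE.search(xml_text)
    # group(1) is the first capture group, i.e. the text between the tags
    return match.group(1) if match else None


sample = """<name>Joe Blog</name>
<email>abc@sample.com</email>
<address>123 sample st</address>"""

print(extract_address(sample))  # prints: 123 sample st
```

This only holds up for flat, predictable input like the sample: nested tags, attributes on the tag, or CDATA sections would need a real parser.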

Do you have to write it yourself, or can you use tinyxml2?

If you use tinyxml2 without a SAX parser and you know the structure of the document, try the following:

/* ------ Example 2: Lookup information. ---- */    
{
    XMLDocument doc;
    doc.LoadFile( "dream.xml" );

    // Structure of the XML file:
    // - Element "PLAY"      the root Element, which is the 
    //                       FirstChildElement of the Document
    // - - Element "TITLE"   child of the root PLAY Element
    // - - - Text            child of the TITLE Element

    // Navigate to the title, using the convenience function,
    // with a dangerous lack of error checking.
    const char* title = doc.FirstChildElement( "PLAY" )->FirstChildElement( "TITLE" )->GetText();
    printf( "Name of play (1): %s\n", title );

    // Text is just another Node to TinyXML-2. The more
    // general way to get to the XMLText:
    XMLText* textNode = doc.FirstChildElement( "PLAY" )->FirstChildElement( "TITLE" )->FirstChild()->ToText();
    title = textNode->Value();
    printf( "Name of play (2): %s\n", title );
}

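If you want to use a SAX parser, tinyxml2 supports that mode as well. For example code, go to cocos2d-x and look at the CCSAXParser class, which calls and subclasses tinyxml2 to parse almost any XML file.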
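
For readers unfamiliar with the SAX style, here is a rough sketch of the event-driven idea using Python's standard `xml.sax` module rather than tinyxml2 or cocos2d-x. The class name `TagTextHandler`, the helper `sax_tag_texts`, and the `<person>` wrapper element are all hypothetical and only there to make the example self-contained.

```python
import xml.sax


class TagTextHandler(xml.sax.ContentHandler):
    """Collect the text of every element with the given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False   # True while we are between <tag> and </tag>
        self.chunks = []      # text pieces of the element being read
        self.values = []      # finished texts, one per matching element

    def startElement(self, name, attrs):
        if name == self.tag:
            self.inside = True
            self.chunks = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces.
        if self.inside:
            self.chunks.append(content)

    def endElement(self, name):
        if name == self.tag and self.inside:
            self.values.append("".join(self.chunks))
            self.inside = False


def sax_tag_texts(xml_text, tag):
    """Parse xml_text with SAX callbacks and return the texts of all <tag> elements."""
    handler = TagTextHandler(tag)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.values


# Hypothetical <person> root: a well-formed XML document needs a single root.
document = (
    "<person><name>Joe Blog</name>"
    "<email>abc@sample.com</email>"
    "<address>123 sample st</address></person>"
)
print(sax_tag_texts(document, "address"))  # prints: ['123 sample st']
```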


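Choose the language you want to use. Why not use a library that can easily parse XML? I suggest you read this great answer to the same question:

Waiting for someone to paste a link to the XHTML answer... (Edit: oh wait, there it is.)

@Tomalak Sorry, but I couldn't disagree more. Does the question call for an XML processor? For all we know, the XML file could be a text file in a local environment, and the requirement just as simple (and quick). Why would anyone need a full XML parser for this? (Edit: my expectation was obvious, that someone would paste a link to that answer simply because the question contains the words regex and xml/html. My point stays the same: judging by the way he asked, the requirement is probably simple and so is the environment. Why use a parser?)

@DanFromGermany I'm sure they are. A simple XPath processor would do the job. The point is that it makes no sense to assume the OP has a very complex system and wants a super-adaptable solution that regex cannot deliver. Worse, that link is unhelpful: it does not even explain why you should not use regular expressions.

How do you make such nice regex diagrams?

I just paste the expression into it and it shows the diagram.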

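$dom = new DOMDocument();
// loadHTML() is lenient and accepts a fragment without a single root element
$dom->loadHTML($your_html_here);
$addresses = $dom->getElementsByTagName('address');
foreach ($addresses as $address) {
    // DOMElement has no innertext property; textContent holds the element's text
    $text = $address->textContent;
    // do something with $text
}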
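
The parser-based approach argued for in the comments can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under assumptions: the helper name `address_from_fragment` and the `<record>` wrapper element are hypothetical, added because the sample in the question has no single root element.

```python
import xml.etree.ElementTree as ET


def address_from_fragment(fragment):
    """Return the text of the first <address> child, or None if there is none."""
    # Hypothetical <record> wrapper: XML needs exactly one root element,
    # and the sample from the question is a bare list of sibling tags.
    root = ET.fromstring(f"<record>{fragment}</record>")
    # findtext() takes a simple XPath-like path relative to the root.
    return root.findtext("address")


sample = """<name>Joe Blog</name>
<email>abc@sample.com</email>
<address>123 sample st</address>"""

print(address_from_fragment(sample))  # prints: 123 sample st
```

Unlike the regex, this raises `xml.etree.ElementTree.ParseError` on malformed input instead of silently returning a wrong match.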
