Regex to capture text surrounded by <div> <p> tags?

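I have a bunch of products, and I need the title and description text of each one so that I can put them into a query like this:

INSERT INTO uc_products (title, description) VALUES ("Lafayette RK-820 4 track stereo tape deck", "Operation and service manual, includes parts list & schematic")

The information is currently held in some div tags: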

<div class="radio-product-wrap">
    <div class="radio-product-image">

    </div>
    <div class="radio-product-title">
        <p>Lafayette RK-820 4 track stereo tape deck</p>
    </div>
    <div class="radio-product-desript">
        <p>Operation and service manual, includes parts list &amp; schematic</p>
    </div>
    <div class="radio-cart-66-wrap">
        [add_to_cart item="L-1"]
    </div>
</div>



How do I write a regular expression to get this information?

This should work:

<div class="radio-product-title">.*?<p>(?<Title>.*?)</p>.*?</div>.*?<div class="radio-product-desript">.*?<p>(?<Description>.*?)</p>.*?</div>


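You need to read the two named groups, Title and Description, from the match. Because the tags sit on separate lines, the pattern only matches when the option that lets "." match line breaks (often called single-line or dot-all mode) is enabled.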
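As an illustration only, this is how the pattern might be applied in Python. The language is an assumption, since the question does not name one. Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, and `re.DOTALL` is what lets `.` cross line breaks. The `sample` string is adapted from the markup in the question.

```python
import re

# Same pattern as above, using Python's (?P<name>...) spelling for named groups.
PATTERN = re.compile(
    r'<div class="radio-product-title">.*?<p>(?P<Title>.*?)</p>.*?</div>.*?'
    r'<div class="radio-product-desript">.*?<p>(?P<Description>.*?)</p>.*?</div>',
    re.DOTALL,  # let "." match the newlines between the tags
)

sample = """
<div class="radio-product-wrap">
    <div class="radio-product-image">

    </div>
    <div class="radio-product-title">
        <p>Lafayette RK-820 4 track stereo tape deck</p>
    </div>
    <div class="radio-product-desript">
        <p>Operation and service manual, includes parts list &amp; schematic</p>
    </div>
</div>
"""

match = PATTERN.search(sample)
title = match.group("Title")
description = match.group("Description")
print(title)
print(description)
```

Note that the captured description still contains the HTML entity `&amp;`; it would need decoding before being stored.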

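Which target language are you using? Or do you just need the regex itself?

Note that regular expressions are only useful for extracting well-defined parts of an HTML document; they cannot be used to parse HTML in general.

If you just need a regex, you can use: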

<div\ class="radio-product-title">    # literal div tag with class
[^<]*                                 # any chars that are not '<'
<p>                                   # literal '<p>' tag
\s*                                   # optional leading spaces
([^<]+?)                              # one or more chars that are not '<',
                                      #   captured into group #1 (non-greedy)
\s*                                   # optional trailing spaces
<\/p>                                 # literal '</p>' tag
[^<]*                                 # any chars that are not '<'
<\/div>                               # literal '</div>' end tag
[^<]*                                 # any chars that are not '<'
<div\ class="radio-product-desript">  # literal div tag with class
[^<]*                                 # any chars that are not '<'
<p>                                   # literal '<p>' tag
\s*                                   # optional leading spaces
([^<]+?)                              # one or more chars that are not '<',
                                      #   captured into group #2 (non-greedy)
\s*                                   # optional trailing spaces
<\/p>                                 # literal '</p>' tag
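To tie the commented pattern back to the original goal, here is a rough Python sketch that runs it in verbose mode over a page and feeds the captures into the INSERT statement. Several things here are assumptions: the choice of Python, the helper name `extract_products`, and the in-memory `sqlite3` database, which only stands in for whatever database actually holds `uc_products`. Passing the values as parameters avoids having to quote the text by hand.

```python
import html
import re
import sqlite3

# The commented pattern from above; re.VERBOSE ignores unescaped whitespace,
# which is why the space in "<div\ class" has to be escaped.
PRODUCT = re.compile(
    r"""
    <div\ class="radio-product-title">    [^<]* <p> \s* ([^<]+?) \s* <\/p>
    [^<]* <\/div> [^<]*
    <div\ class="radio-product-desript">  [^<]* <p> \s* ([^<]+?) \s* <\/p>
    """,
    re.VERBOSE,
)


def extract_products(page):
    """Return (title, description) pairs with HTML entities decoded."""
    return [
        (html.unescape(m.group(1)), html.unescape(m.group(2)))
        for m in PRODUCT.finditer(page)
    ]


page = """
<div class="radio-product-wrap">
    <div class="radio-product-title">
        <p>Lafayette RK-820 4 track stereo tape deck</p>
    </div>
    <div class="radio-product-desript">
        <p>Operation and service manual, includes parts list &amp; schematic</p>
    </div>
</div>
"""

rows = extract_products(page)

# Stand-in database: an in-memory SQLite table with the two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uc_products (title TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO uc_products (title, description) VALUES (?, ?)", rows
)
conn.commit()
```

Because the pattern uses `[^<]` instead of `.`, it needs no dot-all flag, and `finditer` picks up every product block on the page rather than only the first.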

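The same pattern written on a single line, without comments:

<div\ class="radio-product-title">[^<]*<p>\s*([^<]+?)\s*<\/p>[^<]*<\/div>[^<]*<div\ class="radio-product-desript">[^<]*<p>\s*([^<]+?)\s*<\/p>

Using a regex for this kind of HTML parsing is dangerous. Consider using a lightweight HTML parser.

I second what AubHava said; in the long run you will also find this kind of thing easier with a parser. If you mention which language you are using, you will probably get some suggestions on which parser to use. By the way, top dj.

That warning does not apply to every case. It is very true if you want to parse arbitrary HTML documents. But, as I pointed out in my answer, extracting predictable elements from well-defined HTML is an excellent use case for regular expressions, and they perform very well.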
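For comparison, the parser route suggested in the comments could look roughly like the following. This is a sketch, assuming Python and its standard-library `html.parser`; the class name `ProductParser` and its fields are made up for illustration and do not come from the thread.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect the <p> text inside the title and description divs."""

    # Maps the div class names from the question to output field names.
    FIELDS = {
        "radio-product-title": "title",
        "radio-product-desript": "description",
    }

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.products = []   # finished {"title": ..., "description": ...} dicts
        self._current = {}   # fields collected for the product in progress
        self._field = None   # field whose div we are currently inside
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls in self.FIELDS:
            self._field = self.FIELDS[cls]
        elif tag == "p" and self._field:
            self._in_p = True

    def handle_data(self, data):
        if self._in_p:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False
        elif tag == "div" and self._field:
            self._field = None
            if len(self._current) == len(self.FIELDS):
                # Both fields seen: store the product and start a new one.
                self.products.append(
                    {k: v.strip() for k, v in self._current.items()}
                )
                self._current = {}


sample = """
<div class="radio-product-wrap">
    <div class="radio-product-title">
        <p>Lafayette RK-820 4 track stereo tape deck</p>
    </div>
    <div class="radio-product-desript">
        <p>Operation and service manual, includes parts list &amp; schematic</p>
    </div>
    <div class="radio-cart-66-wrap">
        [add_to_cart item="L-1"]
    </div>
</div>
"""

parser = ProductParser()
parser.feed(sample)
parser.close()
products = parser.products
```

The parser version is longer, but it does not depend on the exact whitespace or attribute layout between the tags, which is the trade-off the comments are arguing about.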