VB.NET regex: replace a tags without replacing span tags

regex vb.net

My function needs to remove the a tags from a string when the data extracted from them looks like a URL. For example:

<a href=www.cnn.com>www.cnn.com</a>

This works fine, but when I have a string like:

<a href=www.cnn.com><span style="color: rgb(255, 0, 0);">www.cnn.com</span></a>

what I really want to be left with is:

<span style="color: rgb(255, 0, 0);">www.cnn.com</span>

but what I get is:

www.cnn.com

What do I need to add to my code to make this work?

This is my function:

Dim ret As String = text

'If it looks like a URL
Dim regURL As New Regex("(www|\.org\b|\.com\b|http)")
'Matches any tag (used to strip the tags from the link content)
Dim rxgATags = New Regex("<[^>]*>", RegexOptions.IgnoreCase)

'Gets all matches of <a></a> and adds them to a list
Dim matches As MatchCollection = Regex.Matches(ret, "<a\b[^>]*>(.*?)</a>")

'For each <a></a> in the text, check its content; if it looks like a URL, delete the <a></a>
For Each m As Match In matches
    'tmpText holds the data extracted within the a tags. /visit at.../www.applyhere.com
    Dim tmpText = rxgATags.Replace(m.ToString, "")

    If regURL.IsMatch(tmpText) Then
        ret = ret.Replace(m.ToString, tmpText)
    End If
Next

Return ret

The following regex will remove all HTML tags:

using System.Text.RegularExpressions;

string someString = "<a href=www.one.co.il><span style=\"color: rgb(255, 0, 255);\">www.visitus.com</span></a>";

// Regex.Replace already returns a string, so no ToString() call is needed.
string target = Regex.Replace(someString, @"<[^>]*>", "", RegexOptions.Compiled);

This is the regex you want:

<[^>]*>

The result of my code:

www.visitus.com
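
For comparison with the question's goal, here is a minimal VB.NET sketch of the same strip-everything pattern. The module and variable names are only illustrative. Because the pattern matches every tag, the span is removed along with the a tag, which is the behaviour the question is trying to avoid.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripAllTagsDemo
    Sub Main()
        Dim html As String = "<a href=www.one.co.il><span style=""color: rgb(255, 0, 255);"">www.visitus.com</span></a>"

        ' <[^>]*> matches any tag, so the span goes together with the a tag.
        Dim stripped As String = Regex.Replace(html, "<[^>]*>", "")

        Console.WriteLine(stripped) ' www.visitus.com
    End Sub
End Module
```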

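You can use the following regex:

Dim ret As String = "<a href=www.one.co.il><span style=""color: rgb(255, 0, 255);"">www.visitus.com</span></a>"
'Gets a Tags regex
Dim rxgATags = New Regex("(<a\s*[^<>]*href=[""']?(?:www|\.org\b|\.com\b|http)[^<>]*>)((?>\s*<(?<t>[\w.-]+)[^<>]*?>[^<>]*?</\k<t>>\s*)+)(</a>)", RegexOptions.IgnoreCase)
Dim replacement As String = "$2"
ret = rxgATags.Replace(ret, replacement)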
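
To see what this pattern does and does not cover, the sketch below runs it against the two inputs from the question. The module and field names are illustrative only. Two properties follow from the pattern itself: it tests the href value rather than the link text, and group 2 requires at least one child element, so a link whose content is bare text is not matched.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UnwrapAnchorDemo
    ' Group 1 is the opening a tag, group 2 the child elements, group 3 the closing a tag.
    ' The named group t is numbered after the unnamed groups, so $2 is still the children.
    Private ReadOnly AnchorWithChildren As New Regex(
        "(<a\s*[^<>]*href=[""']?(?:www|\.org\b|\.com\b|http)[^<>]*>)" &
        "((?>\s*<(?<t>[\w.-]+)[^<>]*?>[^<>]*?</\k<t>>\s*)+)(</a>)",
        RegexOptions.IgnoreCase)

    Sub Main()
        Dim wrapped As String = "<a href=www.cnn.com><span style=""color: rgb(255, 0, 0);"">www.cnn.com</span></a>"
        Dim plain As String = "<a href=www.cnn.com>www.cnn.com</a>"

        ' The span survives and the surrounding a tag is dropped:
        ' <span style="color: rgb(255, 0, 0);">www.cnn.com</span>
        Console.WriteLine(AnchorWithChildren.Replace(wrapped, "$2"))

        ' No child element, so there is no match and the link is left unchanged:
        ' <a href=www.cnn.com>www.cnn.com</a>
        Console.WriteLine(AnchorWithChildren.Replace(plain, "$2"))
    End Sub
End Module
```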

I added this to my code:

'Matches only the a tags themselves, not the content between them
Dim rxgATagsOnly = New Regex("</?a\b[^>]*>", RegexOptions.IgnoreCase)

For Each m As Match In matches
    'tmpText holds the data extracted within the a tags. /visit at.../www.applyhere.com
    'rxgATagsContent is presumably the strip-all-tags regex that the question calls rxgATags
    Dim tmpText = rxgATagsContent.Replace(m.ToString, "")

    'If the text between the tags looks like a URL, remove the a tags without touching the span tags
    If regURL.IsMatch(tmpText) Then
        'Remove only the a tags
        Dim noATagsStr As String = rxgATagsOnly.Replace(m.ToString, "")
        'Replace the original <a>...</a> with the version that keeps its span tags
        ret = ret.Replace(m.ToString, noATagsStr)
    End If
Next

So, from the string:

<a href=www.cnn.com><span style="color: rgb(255, 0, 0);">www.cnn.com</span></a>

I selected only the a tags with Avinash Raj's regex and then replaced them with "". Thanks everyone for your answers.
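
Putting the pieces together, the approach can be sketched as one self-contained function. This is not the exact code from the post: `StripUrlAnchors` and the field names are hypothetical, and a `MatchEvaluator` lambda stands in for the `For Each` loop with `String.Replace`, so each link is rewritten in place.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorStripper
    ' Same URL heuristic as in the question.
    Private ReadOnly UrlLike As New Regex("(www|\.org\b|\.com\b|http)")
    ' Matches any tag; used only to read the visible text of a link.
    Private ReadOnly AnyTag As New Regex("<[^>]*>")
    ' Matches only opening and closing a tags.
    Private ReadOnly AnchorTag As New Regex("</?a\b[^>]*>", RegexOptions.IgnoreCase)
    ' Matches a whole <a>...</a> element.
    Private ReadOnly AnchorElement As New Regex("<a\b[^>]*>(.*?)</a>", RegexOptions.IgnoreCase)

    ' Hypothetical helper: unwraps links whose text looks like a URL, keeping inner tags.
    Function StripUrlAnchors(text As String) As String
        Return AnchorElement.Replace(text,
            Function(m As Match)
                Dim visibleText As String = AnyTag.Replace(m.Value, "")
                If UrlLike.IsMatch(visibleText) Then
                    ' Drop only the a tags; the span stays.
                    Return AnchorTag.Replace(m.Value, "")
                End If
                ' Not URL-like: leave the link untouched.
                Return m.Value
            End Function)
    End Function

    Sub Main()
        ' <span style="color: rgb(255, 0, 0);">www.cnn.com</span>
        Console.WriteLine(StripUrlAnchors("<a href=www.cnn.com><span style=""color: rgb(255, 0, 0);"">www.cnn.com</span></a>"))
        ' www.cnn.com
        Console.WriteLine(StripUrlAnchors("<a href=www.cnn.com>www.cnn.com</a>"))
        ' <a href=about.html>About us</a>
        Console.WriteLine(StripUrlAnchors("<a href=about.html>About us</a>"))
    End Sub
End Module
```

Using the evaluator avoids a side effect of `ret.Replace(m.ToString, ...)`, which rewrites every occurrence of the matched text in the string rather than only the match being examined.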

Use this regex:

@"</?a\b[^>]*>"
