Regex to extract HTML between two comments in VB.NET is not working


I am trying to extract a portion of HTML that sits between two comments.

Here is the test code:

Sub Main()

    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"

    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"

    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = start_comment & "some more html text" & end_comment 

    Dim match As Match = Regex.Match(input_text, regex_pattern)


    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If

    Console.ReadLine()

End Sub


The code above works.

When I try to load the actual data from disk, the code below fails:

Sub Main()

    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"

    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"

    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "") 

    Dim match As Match = Regex.Match(input_text, regex_pattern)


    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If

    Console.ReadLine()

End Sub


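The HTML file contains both the start and end comments, with a large amount of HTML between them. Some of the content in the HTML file is in Arabic.

Many thanks and regards.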

I don't know vb.net, but does `.` match newline characters, or do you have to set an option for that? Consider using `[\s\S]` instead of `.` so that newlines are included.
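The `[\s\S]` suggestion can be sketched as follows. This is only an illustration under a few assumptions: the sample input is made up, it uses bare line feeds (which `Replace(vbCrLf, "")` would not remove), and the module name `CharClassDemo` is hypothetical. `Regex.Escape` and the lazy quantifier `*?` are additions that the question's code does not use.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CharClassDemo
    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Hypothetical input: lines end with a bare LF, not CR+LF
        Dim input_text As String = "<html>" & vbLf & start_comment & vbLf &
            "<p>some more html text</p>" & vbLf & end_comment & vbLf & "</html>"

        ' [\s\S] matches any character, including line breaks, with no options set.
        ' *? is lazy, so the match stops at the first end comment.
        Dim regex_pattern As String =
            Regex.Escape(start_comment) & "([\s\S]*?)" & Regex.Escape(end_comment)

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            ' Group 1 holds only the text between the two comments
            Console.WriteLine("found {0}", match.Groups(1).Value.Trim())
        Else
            Console.WriteLine("not found")
        End If
    End Sub
End Module
```

With a plain `.*` in place of `[\s\S]*?`, the same input would not match, because by default `.` does not match the line feed character.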

Try passing in `RegexOptions.Singleline`:

Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)

This makes the dot (`.`) match newline characters as well.
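A minimal sketch of how the `RegexOptions.Singleline` fix might be wrapped for reuse. The helper name `ExtractBetween` and the module name are hypothetical, and the sample text stands in for the file contents. With the real file, the input would come from `System.IO.File.ReadAllText(test_file)`, and the `Replace(vbCrLf, "")` call should no longer be needed.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo
    ' Hypothetical helper: returns the text between two markers,
    ' or Nothing when the markers are not found.
    Function ExtractBetween(text As String, startMarker As String, endMarker As String) As String
        ' Escape the markers so they are treated as literal text
        Dim pattern As String =
            Regex.Escape(startMarker) & "(.*?)" & Regex.Escape(endMarker)

        ' Singleline lets "." match line breaks too
        Dim m As Match = Regex.Match(text, pattern, RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        ' Stand-in for the file contents, mixing CR+LF and bare LF line endings
        Dim input_text As String = "<body>" & vbCrLf &
            "<!-- start of content -->" & vbLf &
            "<div>some more html text</div>" & vbCrLf &
            "<!-- end of content -->" & vbLf & "</body>"

        Dim content As String = ExtractBetween(
            input_text, "<!-- start of content -->", "<!-- end of content -->")

        If content IsNot Nothing Then
            Console.WriteLine("found {0}", content.Trim())
        Else
            Console.WriteLine("not found")
        End If
    End Sub
End Module
```

The lazy `.*?` is a design choice here: if a page contained more than one end comment, a greedy `.*` would run to the last one.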