Regex to extract HTML between two comments in VB.NET does not work
I am trying to extract a section of HTML that sits between two comments. Here is the test code:
Sub Main()
    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"
    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"
    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = start_comment & "some more html text" & end_comment
    Dim match As Match = Regex.Match(input_text, regex_pattern)
    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If
    Console.ReadLine()
End Sub
The above works.
When I try to load the actual data from disk, the code below fails:
Sub Main()
    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"
    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"
    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "")
    Dim match As Match = Regex.Match(input_text, regex_pattern)
    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If
    Console.ReadLine()
End Sub
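The HTML file contains the start and end comments, with a lot of HTML in between.
Some of the content in the HTML file is in Arabic.
Many thanks and regards.
I don't know VB.NET, but does `.` match newlines, or do you have to set an option for that? Consider using `[\s\S]` instead of `.` so that newlines are included.
Try passing in RegexOptions.Singleline: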
Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)
This makes the dot (`.`) match newline characters as well.
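To make the suggestion concrete, here is a minimal sketch of the whole extraction. It rests on an assumption that the thread does not confirm: that the file on disk uses bare LF line endings, which Replace(vbCrLf, "") leaves in place, so that `.` stops at the first one. The helper name ExtractBetween, the sample string, and the use of Regex.Escape and a lazy `(.*?)` group are illustrative additions, not part of the original question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Hypothetical helper (not from the original post): returns the text
    ' between two literal markers, or Nothing when no match is found.
    Function ExtractBetween(input As String, startMarker As String, endMarker As String) As String
        ' Regex.Escape protects against regex metacharacters in the markers.
        ' (.*?) is lazy, so the match ends at the first end marker.
        Dim pattern As String = Regex.Escape(startMarker) & "(.*?)" & Regex.Escape(endMarker)
        ' RegexOptions.Singleline lets "." match the newline character too.
        Dim m As Match = Regex.Match(input, pattern, RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        Dim startComment As String = "<!-- start of content -->"
        Dim endComment As String = "<!-- end of content -->"
        ' Assumed sample input with bare LF line endings, standing in for the real file.
        Dim sample As String = "<html>" & vbLf & startComment & vbLf &
                               "<p>some html</p>" & vbLf & endComment & vbLf & "</html>"
        Dim content As String = ExtractBetween(sample, startComment, endComment)
        If content IsNot Nothing Then
            Console.WriteLine("found {0}", content)
        Else
            Console.WriteLine("not found")
        End If
    End Sub
End Module
```

With RegexOptions.Singleline in place, the Replace(vbCrLf, "") call is no longer needed, so the extracted HTML keeps its original line breaks. The lazy quantifier only matters if the end comment can appear more than once in the file.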