C#: Using regex to parse and format HTML tags in RSS


c# regex windows-phone-7 rss


I have content:encoded text in an RSS feed, like this:

<content:encoded><![CDATA[<P><B>Wednesday, September 26, 2012</B></P>It is Apple.<P>Shops are closed.<br />Parking is not allowed here. Go left and park.<br />All theatres are opened.<br /></P><P><B>Thursday, September 27, 2012</B></P><P>Shops are open.<br />Parking is not allowed here. Go left and park.<br  />All theatres are opened.<br /></P>]]></content:encoded>

Using the following method, I am able to extract the text from the HTML:

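// Extension method: it must be declared inside a static class.
public static string StripHTML(this string htmlText)
{
    // Remove everything that looks like a tag, then decode entities such as &amp;.
    var reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    return HttpUtility.HtmlDecode(reg.Replace(htmlText, string.Empty));
}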

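But I want the text inside the <B> tags to be inserted into dateArray[] and the text inside the <P> tags to be inserted into descriptionArray[], so that the dates and descriptions can be displayed separately.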

Thanks in advance.

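There are excellent HTML parsers available, for example HtmlAgilityPack. See also what Stack Overflow says about parsing HTML with regular expressions.

This gives the error: The type 'System.Xml.XPath.IXPathNavigable' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Xml.XPath, Version=2.0.5.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'.

I got it. I had to add a reference to System.Xml.XPath from the SDK folder.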

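// http://htmlagilitypack.codeplex.com/
// `html` is the string taken from the content:encoded CDATA section.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Collect every text node (needs `using System.Linq;`) and flag
// those whose parent is a <b> element as dates.
var result = doc.DocumentNode.Descendants()
                .Where(n => n is HtmlAgilityPack.HtmlTextNode)
                .Select(n => new {
                    IsDate = n.ParentNode.Name == "b",
                    Text = n.InnerText,
                })
                .ToList();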
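
The query above yields a flat list of text nodes, so one more step is needed to get the dateArray[] and descriptionArray[] the question asks for. The following is only a sketch of one way to do that. `TextItem`, `RssContentSplitter` and `Split` are hypothetical names that are not part of the original answer or of HtmlAgilityPack, and joining the lines of each day with "\n" is an assumption about how the description should look.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical holder mirroring the anonymous type produced by the query above.
public class TextItem
{
    public bool IsDate { get; set; }
    public string Text { get; set; }
}

public static class RssContentSplitter
{
    // Walks the items in document order. Each date starts a new entry, and every
    // following non-date text is appended to the description of that date.
    public static void Split(
        IEnumerable<TextItem> items,
        out string[] dateArray,
        out string[] descriptionArray)
    {
        var dates = new List<string>();
        var descriptions = new List<List<string>>();

        foreach (var item in items)
        {
            var text = (item.Text ?? string.Empty).Trim();
            if (text.Length == 0)
                continue; // skip whitespace-only text nodes

            if (item.IsDate)
            {
                dates.Add(text);
                descriptions.Add(new List<string>());
            }
            else if (descriptions.Count > 0)
            {
                descriptions[descriptions.Count - 1].Add(text);
            }
            // Text that appears before the first date is ignored in this sketch.
        }

        dateArray = dates.ToArray();
        // Assumption: the lines belonging to one date are joined with newlines.
        descriptionArray = descriptions
            .Select(lines => string.Join("\n", lines.ToArray()))
            .ToArray();
    }
}
```

With the `result` list from the answer, the items could be built with `result.Select(r => new TextItem { IsDate = r.IsDate, Text = r.Text })` and passed to `RssContentSplitter.Split`. The two arrays come back with the same length, so `descriptionArray[i]` holds the text for `dateArray[i]`.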