C# RegEx - find HTML tags (div and anchor)
I have to retrieve several div sections (those with the specific class name "row") together with their contents, and additionally find all anchor tags (link URLs) with the class "underline red bold". In short, I need to get these sections:
<div class = "row ">
... (divs, tags ...)
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
The whole page looks like this:
<html>
... a lot of stuff
<div class="row ">
<div class="photo">
<a rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
<img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f0607827.jpg">
</a>
</div>
<div class="desc">
<div class="l1">
<div class="icons">
</div>
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="fleft">
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
Culture And Gender <br>Intimate Relation</a>
</div>
<div class="fleft">
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="l2">
<div>
</div>
<div>
<div class="but">
</div>
</div>
</div>
<div class="l3">
Long description
<a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
more<img alt="" src="/b/img/arr_red_sm.gif">
</a>
</div>
</div>
</div>
<div class="omit"></div>
<div class="row ">
<div class="photo">
<a rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534899,p">
<img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f06078222.jpg">
</a>
</div>
<div class="desc">
<div class="l1">
<div class="icons">
</div>
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="fleft">
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod5653489225,p">
Culture And Gender <br>Intimate Relation</a>
</div>
<div class="fleft">
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="l2">
<div>
</div>
<div>
<div class="but">
</div>
</div>
</div>
<div class="l3">
Long description
<a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
more<img alt="" src="/b/img/arr_red_sm.gif">
</a>
</div>
</div>
</div>
Could anyone help me create a suitable regex?

Regular expressions are not a good fit for this kind of task.
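Because HTML is nested, a regular expression that does what you are asking would be very (very) long and complicated. Use an HTML parser instead.

This question has much the same answer as an earlier, similar question. Take a look at the Html Agility Pack:

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).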
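As a rough illustration of the parser approach, here is a minimal sketch, assuming the Html Agility Pack NuGet package is referenced. The class name `RowScraper` and the method `ExtractRowLinks` are hypothetical names chosen for this example, not part of the library. It selects every div whose class list contains "row" and, inside each one, collects the href of each anchor whose class attribute is exactly "underline red bold".

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

public static class RowScraper // hypothetical helper, not part of the library
{
    // For each <div> whose class list contains "row", returns the raw href
    // values of the <a class="underline red bold"> elements nested inside it.
    public static List<List<string>> ExtractRowLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<string>>();

        // normalize-space() copes with the trailing blank in class="row ".
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' row ')]");
        if (rows == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode row in rows)
        {
            var links = new List<string>();

            // The leading "." keeps the search inside the current row.
            HtmlNodeCollection anchors =
                row.SelectNodes(".//a[@class='underline red bold']");
            if (anchors != null)
            {
                foreach (HtmlNode a in anchors)
                {
                    links.Add(a.GetAttributeValue("href", ""));
                }
            }
            result.Add(links);
        }
        return result;
    }
}
```

If the markup of each row is needed as well, `row.OuterHtml` inside the loop should give the div together with its contents. The XPath tests the class list instead of comparing `@class` to a literal string, so it does not depend on the exact whitespace in the attribute.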
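Is it really necessary to use a regex? If not, why not use something like ... If you use a parser instead of a regex, it is much easier to get what you want. Alternatively, if you have already got into LINQ and like its power, it seems there is a LINQ-based option you can download. I haven't tried it yet, so I can't speak to its capabilities.

This post mentions regex and HTML in the same sentence :D Brace yourself.

+1. Jens means the Html Agility Pack, and for any C# HTML parsing needs there is nothing else.

This is my highest-voted answer so far, which is a bit sad. =)

Take your victories where you can get them, my friend... ;-)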