Regex / ColdFusion: Simple HTML Parsing


We currently have some articles posted to our website. They can come with the following kind of HTML:

<p>this is an article<br>
<img src="someimage">
</p>



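Some other HTML tags may occasionally appear in there as well, and I don't know how to handle this with ColdFusion.

Essentially, what I need to do is grab the first paragraph of text and the image, and be able to arrange them myself.

Is this possible with ColdFusion 8? Can anyone point me to how to learn this?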

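It is 100% absolutely possible.

Now, don't be put off by my suggestion; it is actually very easy to get started.

Download a library called jSoup. Its sole purpose is to scrape content out of the DOM of a web page.

Once the jSoup JAR is on ColdFusion's classpath, you can use its Java class like this: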

<!--- Get the page. --->
<cfhttp method="get" url="http://example.com/" resolveurl="true" useragent="#cgi.http_user_agent#" result="myPage" timeout="10" charset="utf-8">
<cfhttpparam type="header" name="Accept-Encoding" value="*" />   
<cfhttpparam type="header" name="TE" value="deflate;q=0" />        
</cfhttp>

<!--- Load up jSoup and parse the document with it. --->
<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset document = jsoup.parse(myPage.filecontent) />

<!--- Search the parsed document for the contents of the TITLE tag. --->
<cfset title = document.select("title").first() />

<!--- Let's see what we got. --->
<cfdump var="#title#" />

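This example is very simple, but it shows how easy jSoup is to use. If you look through the jSoup documentation, grabbing images and anything else is fairly easy.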
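For the case in the question (first paragraph of text plus the image), a minimal sketch might look like the following. It assumes the jSoup JAR is already on ColdFusion's classpath and that the article markup is available as a string; the variable names (`articleHtml`, `firstText`, `firstImageSrc`) and the `teaser` class are made up for illustration. It checks `size()` before calling `first()`, because `first()` returns null when nothing matches and ColdFusion 8 has no `isNull()`.

```cfm
<!--- Assumption: articleHtml stands in for one article's markup,
      e.g. a column value loaded from your database. --->
<cfsavecontent variable="articleHtml">
<p>this is an article<br>
<img src="someimage">
</p>
</cfsavecontent>

<!--- Parse the fragment; jSoup wraps it in a full document for us. --->
<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset doc = jsoup.parseBodyFragment(articleHtml) />

<!--- Collect all paragraphs and all images, in document order. --->
<cfset paragraphs = doc.select("p") />
<cfset images = doc.select("img") />

<!--- Default to empty strings so the output step never sees an undefined value. --->
<cfset firstText = "" />
<cfset firstImageSrc = "" />

<cfif paragraphs.size() GT 0>
    <!--- text() returns the paragraph's text with all tags stripped. --->
    <cfset firstText = paragraphs.first().text() />
</cfif>
<cfif images.size() GT 0>
    <cfset firstImageSrc = images.first().attr("src") />
</cfif>

<!--- Arrange the two pieces however the layout requires. --->
<cfoutput>
<div class="teaser">
    <cfif len(firstImageSrc)><img src="#htmlEditFormat(firstImageSrc)#" alt=""></cfif>
    <p>#htmlEditFormat(firstText)#</p>
</div>
</cfoutput>
```

Because the text and the image source end up in plain variables, any extra tags inside the article no longer matter for the layout.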

The documentation also includes some good examples of how to use CSS-style selectors.
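As a rough illustration of what CSS-style selectors look like from ColdFusion, here is a small sketch. The sample markup, the `article` class, and the variable names are hypothetical, and it again assumes jSoup is on the classpath. The result of `select()` is a Java list, so it is walked by zero-based index with `javaCast`.

```cfm
<!--- Hypothetical sample markup, standing in for a fetched or stored page. --->
<cfsavecontent variable="sampleHtml">
<div class="article">
<p>this is an article<br>
<img src="someimage">
</p>
<p>more text <a href="http://example.com/">a link</a></p>
</div>
</cfsavecontent>

<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset doc = jsoup.parse(sampleHtml) />

<!--- Descendant selector: every IMG inside a P inside div.article. --->
<cfset articleImages = doc.select("div.article p img") />

<!--- Attribute selector: only A elements that carry an href attribute. --->
<cfset links = doc.select("a[href]") />

<cfset lastImage = articleImages.size() - 1 />
<cfset lastLink = links.size() - 1 />

<cfoutput>
<!--- If a list is empty, "to" is -1 and the loop body never runs. --->
<cfloop from="0" to="#lastImage#" index="i">
    <cfset img = articleImages.get(javaCast("int", i)) />
    Image: #htmlEditFormat(img.attr("src"))#<br>
</cfloop>
<cfloop from="0" to="#lastLink#" index="i">
    <cfset link = links.get(javaCast("int", i)) />
    Link: #htmlEditFormat(link.attr("href"))#<br>
</cfloop>
</cfoutput>
```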

Try to avoid using regular expressions for this task. Trust me, I have tried, and it is a real pain.

Hope this helps.

Mikey.

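A trip to cflib.org is worthwhile. There is a function there called safetext that encodes potentially malicious tags while leaving benign ones intact.

This was a super quick and easy solution for me. Thanks.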
