Html Ruby Regex: finding iFrames without Youtube, Vimeo or Soundcloud in src? - Fatal编程技术网



I want to write a regex that ignores iFrames containing URLs from youtube, vimeo or soundcloud in an HTML-entity-encoded string.

This is what I tried, but it doesn't work. Some sample text is given below.

Regex

<iframe(^?youtube|soundcloud|vimeo)*\/iframe
Sample text

<p><iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
29  <p>text daily to place domain staff as volunteers with charity partners, we know all too well that the "V" word can sometimes be misunderstood. Occasionally seen as a dusty, worthy word, it can conjure images of coffee mornings and bric-a-brac stalls. So its not always as easy as you might think to get people to embrace their inner-volunteer. That's why the <a href="http://www.domain.co.uk/sdfn/2010/11/connect-create-domain-volunteers.shtml">Conne
<p><iframe src="http://www.youtube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
29  <p>text daily to place domain staff as volunteers with charity partners, we know all too well that the "V" word can sometimes be misunderstood. Occasionally seen as a dusty, worthy word, it can conjure images of coffee mornings and bric-a-brac stalls. So its not always as easy as you might think to get people to embrace their inner-volunteer. That's why the <a href="http://www.domain.co.uk/sdfn/2010/11/connect-create-domain-volunteers.shtml">Conne
Sample output

<iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe>
nil
To be clear:

I want to ignore iFrames that contain youtube, vimeo or soundcloud.

I'm testing it on Rubular.

You can use this regex:

.*?iframe src=".*?(?:youtube|soundcloud|vimeo).*?".*|(.*?iframe src=".*?".*)

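As you can see, the first input (highlighted green) produces the output you specified in the question. The second match (highlighted blue) has no captured output, because its src is from youtube, soundcloud or vimeo, so it is matched by the first alternative, which captures nothing.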

Match information

MATCH 1
1.  [0-155] `<p><iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>`
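To see what the alternation does in Ruby, here is a small sketch that runs the pattern above over the two sample iframe lines with String#scan. It assumes each iframe sits on its own line, as in the sample text; the names PATTERN, html and kept are illustrative only.

```ruby
# Alternation trick from the answer above: the first branch matches any line
# whose iframe mentions a blocked domain, without capturing; the second branch
# captures every other iframe line. With no /m flag, `.` stops at newlines.
PATTERN = /.*?iframe src=".*?(?:youtube|soundcloud|vimeo).*?".*|(.*?iframe src=".*?".*)/

html = <<~HTML
  <p><iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
  <p><iframe src="http://www.youtube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
HTML

# Because the pattern has a capture group, scan yields one [group] array per
# match; the group is nil whenever the first (blocked) branch matched.
kept = html.scan(PATTERN).flatten.compact
puts kept
```

Because `.*?` after `src="` is free to run past the closing quote, a blocked word appearing later on the same line, even outside src, can also send that line down the first branch.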


The key here is `iframe.*?src="(?![^"]*(?:youtube|vimeo|soundcloud))`, so let me expand it for you:

/
  iframe                          # literally match iframe
  .*?                             # lazily match 0+ characters
  src="                           # literally match src="
  (?!                             # start negative lookahead assertion
    [^"]*                         # match 0+ non-" characters
    (?:youtube|vimeo|soundcloud)  # match one of the domains
  )                               # end assertion
/x

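So once the expression reaches the `iframe`'s `src` attribute, it makes a negative assertion for any of the domains after any number of non-`"` characters (in other words, up to the end of the `src` attribute). As long as none of those domains is found in the attribute, we carry on by lazily matching the rest of the `iframe` (up to the closing tag).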
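Putting the lookahead together with the lazy match up to the closing tag gives a complete pattern. The sketch below is one way to write it, assuming double-quoted src attributes as in the sample. Unlike the expression above, it uses `[^>]*?` rather than `.*?` before `src` and adds the /i and /m flags, so treat it as a variant rather than the answer's exact pattern; ALLOWED_IFRAME is an illustrative name.

```ruby
# Variant of the negative-lookahead approach (assumes double-quoted src).
# [^>]*? keeps the search for src=" inside the opening <iframe ...> tag, so a
# rejected iframe cannot run on into the src of a later one.
ALLOWED_IFRAME = %r{
  <iframe\b[^>]*?\bsrc="                 # up to and including src="
  (?![^"]*(?:youtube|vimeo|soundcloud))  # fail if a blocked domain is in src
  [^"]*"                                 # consume the src value
  .*?</iframe>                           # lazily match up to the closing tag
}mix

html = <<~HTML
  <p><iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
  <p><iframe src="http://www.youtube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
HTML

matches = html.scan(ALLOWED_IFRAME)
puts matches
```

On the sample input only the 3you3tube iframe is returned, which is the question's expected output. Iframes with single-quoted or unquoted src values are not matched at all, which is one reason the parser-based approach is more robust.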

It's well known that parsing HTML with regular expressions is difficult unless you control how the HTML is generated, and even then it's painful.

Instead, for anything beyond the simplest uses, use a parser, which normalizes away many of the issues that cause patterns to fail.

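The submitted patterns will fail because they assume a particular tag-name case, whitespace and string delimiter for the src parameter. Those variations could be handled in a pattern, but it's easier to avoid the hassle. In the following code, every string being checked is valid HTML: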

require 'htmlentities'
require 'nokogiri'

decoder = HTMLEntities.new('html4')

# Every string below is valid HTML; they differ only in whitespace, quote
# style and tag-name case, which is what trips up hand-written patterns.
[
  %(<p><iframe\nsrc="http://www.youtube.com/embed/YoX1yc92MOU_1"></iframe></p>),
  %(<p><iframe\nsrc= "http://www.youtube.com/embed/YoX1yc92MOU_2"></iframe></p>),
  %(<p><iframe\nsrc = "http://www.youtube.com/embed/YoX1yc92MOU_3"></iframe></p>),
  %(<p><iframe\nsrc = 'http://www.youtube.com/embed/YoX1yc92MOU_4'></iframe></p>),
  %(<p><Iframe\nsrc = 'http://www.youtube.com/embed/YoX1yc92MOU_5'></iframe></p>),
  %(<p><IFRAME\nsrc = 'http://www.youtube.com/embed/YoX1yc92MOU_6'></iframe></p>),
  %(<p><IFRAME\nsrc =
  'http://www.youtube.com/embed/YoX1yc92MOU_7'></iframe></p>),
].each do |text|
  # Decode any HTML entities, then let Nokogiri normalize the markup.
  html = decoder.decode(text)
  doc = Nokogiri::HTML::DocumentFragment.parse(html)

  # The HTML parser lowercases tag names, so 'iframe' also finds <IFRAME>.
  iframe = doc.at('iframe')
  next unless iframe && iframe['src']

  # The regex only ever sees the src value, never the raw HTML.
  puts "Ignoring: #{ iframe['src'] }" if iframe['src'][/\b(?:youtube|soundcloud|vimeo)\b/i]
end
# >> Ignoring: http://www.youtube.com/embed/YoX1yc92MOU_1
# >> Ignoring: http://www.youtube.com/embed/YoX1yc92MOU_2
# >> Ignoring: http://www.youtube.com/embed/YoX1yc92MOU_3
# >> Ignoring: http://www.youtube.com/embed/YoX1yc92MOU_4
# >> Ignoring: http://www.youtube.com/embed/YoX1yc92MOU_5
# >> Ignoring: http://www.youtube.com/embed/YoX1yc92MOU_6
# >> Ignoring: http://www.youtube.com/embed/YoX1yc92MOU_7
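There is an obligatory Stack Overflow link whenever this kind of question comes up. Its most famous answer is of course tongue-in-cheek, but it drives home why you shouldn't do this with patterns.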


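In the code above, `/\b(?:youtube|soundcloud|vimeo)\b/i` is a regular expression, but it is short and sweet and is never applied to the HTML itself. Instead, it is used against the content of the src parameter, which has to be correct in the (encoded) HTML and can't be mangled, or the iframe itself wouldn't work.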
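Because the regex is only applied to the src value, it can be checked in isolation. A minimal sketch follows; BLOCKED_SRC and blocked_src? are hypothetical names, and the vimeo and notyoutube URLs are made-up samples.

```ruby
# The src check from the answer, applied to a single attribute value.
# \b keeps "youtube" from matching inside a longer word such as "notyoutube";
# /i makes the check case-insensitive.
BLOCKED_SRC = /\b(?:youtube|soundcloud|vimeo)\b/i

# Hypothetical helper: true when the src value mentions a blocked domain.
def blocked_src?(src)
  !src.nil? && src.match?(BLOCKED_SRC)
end

[
  'http://www.youtube.com/embed/YoX1yc92MOU',
  'http://www.3you3tube.com/embed/YoX1yc92MOU',
  'https://player.vimeo.com/video/123',
  'http://notyoutube.example/embed'
].each { |src| puts "#{src} -> #{blocked_src?(src)}" }
```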

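This isn't a good use of regular expressions. HTML can vary too much for a pattern to handle. Instead, decode the entities back to HTML, then use a parser such as Nokogiri, which normalizes the HTML so it's easy to ignore attribute order, whitespace, capitalization, etc.

I tried the solution you mentioned, but the data doesn't seem very consistent. There are several broken tags that prevent Nokogiri from parsing the HTML string correctly. One example is an issue like this: @QambarRaza and..?

This is exactly what I wanted! Just letting you know the solution is: /iframe.*?src="(?![^"]*(?:youtube|vimeo|soundcloud)).*?\/iframe/m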
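The question says the HTML arrives entity-encoded, and the asker's final pattern relies on `.*?` with /m. Here is a sketch of both steps using only core Ruby. decode_basic_entities is a hypothetical helper that handles just five common entities; a library decoder such as the htmlentities gem used above, or the standard library's CGI.unescapeHTML, covers the full set.

```ruby
# Hypothetical minimal decoder: handles only &lt; &gt; &quot; &#39; &amp;.
# A single gsub pass means "&amp;lt;" becomes "&lt;", not "<".
BASIC_ENTITIES = {
  '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&#39;' => "'", '&amp;' => '&'
}.freeze

def decode_basic_entities(text)
  text.gsub(/&(?:lt|gt|quot|#39|amp);/, BASIC_ENTITIES)
end

# The pattern the asker reported as working.
ASKER_REGEX = /iframe.*?src="(?![^"]*(?:youtube|vimeo|soundcloud)).*?\/iframe/m

encoded = '&lt;p&gt;&lt;iframe src=&quot;http://www.3you3tube.com/embed/YoX1yc92MOU&quot;&gt;&lt;/iframe&gt;&lt;/p&gt;' \
          '&lt;p&gt;&lt;iframe src=&quot;http://www.youtube.com/embed/YoX1yc92MOU&quot;&gt;&lt;/iframe&gt;&lt;/p&gt;'

html = decode_basic_entities(encoded)
matches = html.scan(ASKER_REGEX)
puts matches
```

On this input only the 3you3tube iframe matches; the match starts at `iframe` rather than `<iframe` and stops at `/iframe`, without the final `>`. Because `.*?` with /m can cross tags, when a blocked iframe comes before an allowed one the match can start at the blocked iframe and run on to the next iframe's src, so it is worth checking the results against real data.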