Regex: problem parsing with a regular expression

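I'm trying to write a program that uses the lynx command on this page, "http://www.rottentomatoes.com/movie/box_office.php", and I can't seem to wrap my head around one problem: getting the title by itself. My problem is that a title can contain special characters and numbers, and the titles all vary in length. I want to write a regex that can parse the whole page and find lines like this.... (I added spaces between the title and the next number, which is the number of weeks released, to distinguish the title from the weeks released.)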

The regex I started with is:

/(\d+)\s(\d+|\-\-)\s(\d+\%)\s

If anyone could help me figure out how to successfully grab the title, I would greatly appreciate it! Thanks in advance.

Capture everything:

^(\d+)\s+(\d+|\-\-)\s+(\d+\%)\s+(.*?)\s+(\d+)\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+(\d+)$

Explanation:

^                             <- Start of the line
    (\d+)\s+                  <- Digits (captured) followed by one or more whitespace characters
    (\d+|\-\-)\s+             <- Digits or "--" (captured) followed by one or more whitespace characters
    (\d+\%)\s+                <- Digits with a trailing "%" (captured) followed by one or more whitespace characters
    (.*?)\s+                  <- The title: anything, matched lazily so trailing spaces stay out of it (captured), followed by one or more whitespace characters
    (\d+)\s+                  <- Digits, the weeks released (captured), followed by one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+  <- "$", digits, an optional decimal part ("\." is a literal point) and "M" or "k" (captured) followed by one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+  <- Same as the previous group
    (\$\d+(?:\.\d+)?[Mk])\s+  <- Same as the previous group
    (\d+)                     <- Digits (captured)
$                             <- End of the line
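To try the pattern on a single line, here is a minimal Python sketch. It uses a variant of the regex in which the decimal point is escaped and the title group is lazy; the syntax involved is also valid in Perl. The name `parse_row` and the sample line are hypothetical, made up for illustration, since the thread does not show the real lynx output, so the actual column layout may differ.

```python
import re

# Variant of the answer's pattern: "\." matches a literal decimal point and
# "(.*?)" is lazy so the title does not swallow trailing spaces.
ROW = re.compile(
    r"^(\d+)\s+(\d+|--)\s+(\d+%)\s+(.*?)\s+(\d+)\s+"
    r"(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+"
    r"(\$\d+(?:\.\d+)?[Mk])\s+(\d+)$"
)


def parse_row(line):
    """Return the nine captured fields as a tuple, or None if the line does not match."""
    match = ROW.match(line.strip())
    return match.groups() if match else None


# Hypothetical sample line (not taken from the real page).
sample_line = "3 -- 85% Some Movie 2: The Sequel   2 $12.5M $30.1M $4.2k 2950"

fields = parse_row(sample_line)
if fields is not None:
    print(fields[3])  # the title is the fourth captured group
```

Because the pattern is anchored at both ends, the digits inside the title ("2:" here) cannot be mistaken for the weeks-released column: the rest of the line would no longer fit the remaining groups.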

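Is your task to parse the page, or to write a regex that parses the page? If it is the former, you should consider using a DOM library instead of a regex.

Do you have to use a regex for this? Since the data is already aligned in columns, why not cut out the appropriate columns and then apply a trim function?

I completely agree with both of you, but the assignment is to use the lynx command and parse all of the information =/

This doesn't make sense. Your question shows a text representation of the page, but that text representation doesn't exist anywhere. The actual page is HTML. In the HTML the data is wrapped in a table, and the solution is simple: extract the fourth column of the table.

This definitely grabs everything in the line! Thanks for your help!

I did most of the work (the homework) because I don't think it is easy to explain a regex without giving the regex itself. But I hope you will read the explanation and the regular-expressions.info link (it is really interesting anyway).

Getting this right is only a small part of the assignment. There are more regexes I need to write; I was just having trouble figuring out how to get the whole title. This is only my second regex assignment, so I'm still getting the hang of it.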
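One of the comments suggests skipping the regex and taking the fourth column of the HTML table directly. A rough sketch of that idea with Python's standard `html.parser` module follows. The class `ColumnExtractor`, the helper `extract_column`, and the sample markup (including its header names) are hypothetical; the real page's HTML is not shown in the thread, and this sketch assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class ColumnExtractor(HTMLParser):
    """Collect the text of one <td> column (1-based) from every table row."""

    def __init__(self, column):
        super().__init__()
        self.column = column
        self.cell_index = 0     # position of the current <td> within its row
        self.in_target = False  # True while inside the wanted cell
        self.buffer = []
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.cell_index = 0
        elif tag == "td":       # <th> header cells are deliberately ignored
            self.cell_index += 1
            self.in_target = self.cell_index == self.column
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_target:
            self.values.append("".join(self.buffer).strip())
            self.in_target = False

    def handle_data(self, data):
        if self.in_target:
            self.buffer.append(data)


def extract_column(html, column):
    """Return the stripped text of the given column for each data row."""
    parser = ColumnExtractor(column)
    parser.feed(html)
    return parser.values


# Hypothetical markup standing in for the real page.
sample_html = """
<table>
  <tr><th>Rank</th><th>Last</th><th>Score</th><th>Title</th><th>Weeks</th></tr>
  <tr><td>1</td><td>--</td><td>85%</td><td><a href="#">Some Movie 2: The Sequel</a></td><td>2</td></tr>
  <tr><td>2</td><td>1</td><td>40%</td><td>Another Film</td><td>5</td></tr>
</table>
"""

titles = extract_column(sample_html, 4)
print(titles)
```

This sidesteps the variable-length title problem entirely, since the cell boundaries come from the markup rather than from whitespace. It does not satisfy an assignment that requires parsing lynx's text output, though.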