Regex parsing problem (Regex, Perl)


regex perl


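I'm trying to write a program that uses the lynx command on this page, "http://www.rottentomatoes.com/movie/box_office.php", and I can't wrap my head around one problem: getting the title by itself. Titles can contain special characters and numbers, and they all have different lengths. I want to write a regex that can parse the whole page and find lines like this ... (I added spaces between the title and the next number, the number of weeks in release, to tell the two apart.)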

The regex I started with is:

/(\d+)\s(\d+|\-\-)\s(\d+\%)\s

If anyone can help me figure out how to capture the title, I'd really appreciate it! Thanks in advance.

Capture everything:

^(\d+)\s+(\d+|\-\-)\s+(\d+\%)\s+(.*?)\s+(\d+)\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+(\$\d+(?:\.\d+)?[Mk])\s+(\d+)$

Explanation:

^                            <- Start of the line
    (\d+)\s+                 <- Digits (captured), then one or more whitespace characters
    (\d+|\-\-)\s+            <- Digits or "--" (captured), then one or more whitespace characters
    (\d+\%)\s+               <- Digits followed by "%" (captured), then one or more whitespace characters
    (.*?)\s+                 <- Anything, matched non-greedily (captured; this is the title), then one or more whitespace characters
    (\d+)\s+                 <- Digits (captured), then one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+ <- "$", digits with an optional decimal part, then "M" or "k" (captured), then one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+ <- "$", digits with an optional decimal part, then "M" or "k" (captured), then one or more whitespace characters
    (\$\d+(?:\.\d+)?[Mk])\s+ <- "$", digits with an optional decimal part, then "M" or "k" (captured), then one or more whitespace characters
    (\d+)                    <- Digits (captured)
$                            <- End of the line
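A minimal Perl sketch of applying the pattern line by line, written with /x so each piece carries its own comment. The sample rows are invented to fit the column layout the pattern assumes; in the assignment they would come from the text lynx produces for the page (reading `lynx -dump` output is an assumption here, not shown). Because the title group is non-greedy, the title comes back without the trailing spaces that a greedy `(.*)` would keep.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the answer, spread out with /x. The comments avoid
# dollar signs and slashes: Perl interpolates variables inside the
# pattern, and a slash would end it early.
my $row_re = qr/
    ^ (\d+) \s+                  # number
    (\d+|--) \s+                 # number, or two dashes
    (\d+%) \s+                   # number followed by a percent sign
    (.*?) \s+                    # title, non-greedy so trailing spaces stay out
    (\d+) \s+                    # number
    (\$\d+(?:\.\d+)?[Mk]) \s+    # money amount such as 45.2M or 850k
    (\$\d+(?:\.\d+)?[Mk]) \s+    # money amount
    (\$\d+(?:\.\d+)?[Mk]) \s+    # money amount
    (\d+) \s* $                  # number, then optional trailing whitespace
/x;

# Invented sample rows (hypothetical data, not from the real page).
my @lines = (
    'a line that does not start with a number',
    '1   --   93%   Movie Title: Part 2   1   $45.2M   $45.2M   $45.2M   3810',
    '12  9    40%   Another Movie   4   $850k   $1.3M   $60.1M   1200',
);

my @titles;
for my $line (@lines) {
    # In list context a successful match returns all nine captures;
    # a failed match returns an empty list, which is false.
    if (my @fields = $line =~ $row_re) {
        push @titles, $fields[3];    # the fourth capture is the title
    }
}

print "[$_]\n" for @titles;    # brackets make any stray spaces visible
```

Run as is, it prints `[Movie Title: Part 2]` and `[Another Movie]`; the first sample line is skipped because it does not start with digits.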

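Is your task to parse the page, or to write a regex that parses the page? If it's the former, you should consider using a DOM library instead of regular expressions.

Do you have to use a regex for this? Since the data is already justified into columns, why not cut out the appropriate column and then apply a trim function?

I completely agree with both of you, but the assignment is to use the lynx command and parse all of the information =/

That doesn't make sense. Your question shows a text representation of the page, but that text representation doesn't exist anywhere. The actual page is HTML. In the HTML the data is wrapped in a table, and the solution is simple: extract the fourth column of the table.

This definitely grabs everything in the line! Thanks for your help!

I did most of the work (for your homework) because I don't think a regex can easily be explained without giving the regex itself. But I hope you'll read the explanation, and also the regular-expressions.info link (it's really interesting anyway).

Getting this is only a small part of the assignment. There are many more regexes I have to write; I just had trouble figuring out how to get the whole title. This is only my second regex assignment, so I'm still getting the hang of it.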
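The "cut the column, then trim" suggestion can be sketched in Perl too. The column offsets below are hypothetical: the rows are generated with sprintf so they line up, and the real start and width of the title column would have to be measured from the actual lynx output. The helper name `trim` is also just illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout, not measured from the real page: the first three
# columns are 5 characters wide each, so the title column starts at
# offset 15 and is 30 characters wide.
my $TITLE_START = 15;
my $TITLE_WIDTH = 30;

# Invented rows, padded with sprintf so the columns line up the way a
# justified text table would.
my @lines = map { sprintf('%-5s%-5s%-5s%-30s%s', @$_) } (
    [ '1',  '--', '93%', 'Movie Title: Part 2', '1   $45.2M' ],
    [ '12', '9',  '40%', 'Another Movie',       '4   $850k'  ],
);

# Strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Cut the fixed-width title column out of each row, then trim the padding.
my @titles = map { trim(substr($_, $TITLE_START, $TITLE_WIDTH)) } @lines;

print "[$_]\n" for @titles;
```

The trade-off versus the regex is that this breaks as soon as the column widths change, while it needs no assumptions about what the other columns contain.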