Perl HTML: find tags until a certain tag

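I need to read an HTML file and find a specific paragraph tag that contains specific text. Once I find that tag, I need the text of all the tags that follow it, up to the first table tag.

For example: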

<asdf>
</asdf>
<p>THE SIGNAL TO GET INFO</p>
    <something>some good stuff in here</something>
<p>something else</p>
<ul>
    <li>something good in here for sure</li>
    <li>this too</li>
</ul>
<table> I DON'T WANT THIS </table>
I tried the code above... but it has serious problems: it doesn't even compile.


Thanks for your help.

You probably want to call $tp->get_token and store the data until you see
["S", "table", ...]
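To make the suggestion above concrete, here is a minimal sketch that walks the tokens of a tiny HTML string and stops at the first `<table>` start tag. It relies on HTML::TokeParser's documented token layout: a start tag is `["S", $tag, $attr, $attrseq, $text]`, an end tag is `["E", $tag, $text]`, and a text token is `["T", $text, $is_data]`. The helper name `tokens_until_table` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: summarize every token seen before the first
# <table> start tag as "TYPE:value".
sub tokens_until_table {
    my ($html) = @_;
    my $tp = HTML::TokeParser->new(\$html) or die "Can't create parser";
    my @seen;
    while (my $tok = $tp->get_token) {
        my $type = $tok->[0];
        # Stop only on a *start* tag named table, not on text or end tags.
        last if $type eq "S" && $tok->[1] eq "table";
        if ($type eq "S" || $type eq "E") {
            push @seen, "$type:$tok->[1]";    # lowercased tag name
        }
        elsif ($type eq "T") {
            push @seen, "T:$tok->[1]";        # the raw text itself
        }
    }
    return @seen;
}

print "$_\n" for tokens_until_table("<p>hi</p><table>no</table>");
```

Running it prints `S:p`, `T:hi` and `E:p`, one per line; the table's text never shows up because the loop ends at its start tag.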

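You say you can't get it to work. Can you explain why, or what you are seeing? Perhaps post a complete example people can run.

You didn't provide sample output, so I made some assumptions: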

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

my $content = "<asdf>
</asdf>
<p>THE SIGNAL TO GET INFO</p>
    <something>some good stuff in here</something>
<p>something else</p>
<ul>
    <li>something good in here for sure</li>
    <li>this too</li>
</ul>
<table> I DON'T WANT THIS </table>
";

my $description = "";
# Parse from a scalar reference (new() also accepts a file name or handle).
my $tp = HTML::TokeParser->new(\$content) or die "Can't create parser";

# Visit each <p> until we find the one containing the signal text.
while (my $token = $tp->get_tag("p")) {
    my $paragraph = $tp->get_trimmed_text("/p");
    if ($paragraph =~ /THE SIGNAL TO GET INFO/) {
      # Collect text tokens until the <table> start tag.
      while (my $toke = $tp->get_token)
      {
        last if $toke->[0] eq "S" && $toke->[1] eq "table";
#       print "<$toke->[0]> <$toke->[1]> <$toke->[2]> <$toke->[3]> <$toke->[4]>\n";
#       print " <".join("><",@{$toke->[3]}).">\n";
        if ($toke->[0] eq "T" ) {
                my $text = $toke->[1];
                $description .= $text;
        }
      }
      print $description;
      last;
    }
}

Output:

    some good stuff in here
something else

    something good in here for sure
    this too

Yes. Awesome stuff. Thank you very much.
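If you would rather get each piece of text separately, trimmed of surrounding whitespace, a minimal variant of the loop above can push the non-blank text tokens onto a list instead of concatenating them. The sub `text_after_signal` and its parameters are hypothetical; the signal pattern and the stop tag are passed in rather than hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: find the first <p> whose text matches $signal, then
# return each non-blank text fragment up to the first <$stop> start tag.
sub text_after_signal {
    my ($html, $signal, $stop) = @_;
    my $tp = HTML::TokeParser->new(\$html) or die "Can't create parser";
    while ($tp->get_tag("p")) {
        next unless $tp->get_trimmed_text("/p") =~ $signal;
        my @pieces;
        while (my $tok = $tp->get_token) {
            last if $tok->[0] eq "S" && $tok->[1] eq $stop;
            next unless $tok->[0] eq "T";
            (my $text = $tok->[1]) =~ s/^\s+|\s+$//g;    # trim both ends
            push @pieces, $text if length $text;
        }
        return @pieces;
    }
    return;    # the signal paragraph was never found
}

my $html = <<'HTML';
<p>THE SIGNAL TO GET INFO</p>
    <something>some good stuff in here</something>
<p>something else</p>
<ul>
    <li>something good in here for sure</li>
    <li>this too</li>
</ul>
<table> I DON'T WANT THIS </table>
HTML

print "$_\n" for text_after_signal($html, qr/THE SIGNAL TO GET INFO/, "table");
```

To read from a file as in the question, HTML::TokeParser->new also accepts a file name instead of a scalar reference.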