Perl HTML: find tags until a certain tag
I need to read an HTML file and find a specific paragraph tag that contains specific text. Once I have found that tag, I need the text of all the tags that follow it, until I reach a table tag. For example:
<asdf>
</asdf>
<p>THE SIGNAL TO GET INFO</p>
<something>some good stuff in here</something>
<p>something else</p>
<ul>
<li>something good in here for sure</li>
<li>this too</li>
</ul>
<table> I DON'T WANT THIS </table>
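The code I tried has serious problems: it does not even compile.
Thanks for your help.
You probably want to call $tp->get_token in a loop, storing the data until you see a token of the form
["S", "table", ...]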
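To make the token shape in that comment concrete, here is a minimal sketch that dumps the type and first value of each token returned by get_token. The two-element HTML snippet is made up for illustration and is not from the question. In HTML::TokeParser a start tag arrives as ["S", $tag, $attr, $attrseq, $text], an end tag as ["E", $tag, $text], and text as ["T", $text, $is_data].

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical snippet, used only to show what the tokens look like.
my $html = '<p>hello</p><table border="1">';
my $tp   = HTML::TokeParser->new(\$html) or die "Can't parse: $!";

my @seen;
while (my $token = $tp->get_token) {
    # Element 0 is the token type; element 1 is the tag name for
    # "S"/"E" tokens and the text itself for "T" tokens.
    my ($type, $value) = @$token;
    push @seen, "$type:$value";
}

# The start of the table shows up as an "S" token whose tag is "table".
print join(' ', @seen), "\n";
```

Checking both the type and the tag name matters: element 1 of a "T" token is text, so comparing only element 1 against "table" would also stop on a text node that happens to read "table".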
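You say you can't get it to work. Can you explain why, or what you are seeing? Ideally, provide a complete example for people to run.
You did not provide example output, so I made some assumptions.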
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

my $content = "<asdf>
</asdf>
<p>THE SIGNAL TO GET INFO</p>
<something>some good stuff in here</something>
<p>something else</p>
<ul>
<li>something good in here for sure</li>
<li>this too</li>
</ul>
<table> I DON'T WANT THIS </table>
";
my $description = "";
my $tp = HTML::TokeParser->new(\$content) or die "Can't open: $!";

# Walk the <p> tags until one contains the signal text.
while (my $token = $tp->get_tag("p")) {
    my $paragraph = $tp->get_trimmed_text("/p");
    if ($paragraph =~ /THE SIGNAL TO GET INFO/) {
        # Collect every text token until the first <table> start tag.
        while (my $toke = $tp->get_token) {
            # $toke->[0] is the token type ("S" start tag, "T" text, ...).
            # Check the type too, so a text node reading "table" does not stop the loop.
            last if $toke->[0] eq "S" && $toke->[1] eq "table";
            # Debugging aids:
            # print "<$toke->[0]> <$toke->[1]> <$toke->[2]> <$toke->[3]> <$toke->[4]>\n";
            # print " <".join("><",@{$toke->[3]}).">\n";
            if ($toke->[0] eq "T") {
                $description .= $toke->[1];
            }
        }
        print $description;
        last;
    }
}
Yes. Spectacular stuff. Thank you very much.
Output of the script (ignoring blank lines):
some good stuff in here
something else
something good in here for sure
this too
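The whitespace between tags also arrives as text tokens, which is where the blank lines come from. If those are unwanted, one possible variation is sketched below. The helper name text_after_signal and its three parameters are hypothetical, introduced only for this example. It trims each text token, skips the empty ones, and returns the fragments as a list instead of printing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of HTML::TokeParser): returns the trimmed,
# non-empty text fragments found after the first <p> containing $signal,
# stopping at the first <$stop_tag> start tag.
sub text_after_signal {
    my ($html, $signal, $stop_tag) = @_;
    my $tp = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    while ($tp->get_tag('p')) {
        next unless $tp->get_trimmed_text('/p') =~ /\Q$signal\E/;
        my @parts;
        while (my $token = $tp->get_token) {
            my ($type, $value) = @$token;
            last if $type eq 'S' && $value eq $stop_tag;
            next unless $type eq 'T';
            $value =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
            push @parts, $value if length $value;
        }
        return @parts;
    }
    return;    # signal paragraph not found
}

# Shortened copy of the sample input from the question.
my $html = <<'HTML';
<p>THE SIGNAL TO GET INFO</p>
<something>some good stuff in here</something>
<p>something else</p>
<ul>
<li>this too</li>
</ul>
<table> I DON'T WANT THIS </table>
HTML

my @parts = text_after_signal($html, 'THE SIGNAL TO GET INFO', 'table');
print "$_\n" for @parts;
```

Returning a list leaves the choice of separator to the caller, for example join(" ", @parts) for a single-line description.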