How do I read table values from an HTML file and store them in Perl?


html perl


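I have read many questions and answers, but I could not find a direct answer. The answers are either very general or address something different from what I want to do. From what I have gathered so far, I need to use HTML::TableExtract or HTML::TreeBuilder::XPath, but I have not managed to use either of them to actually store the values. I can get at the table row values somehow and display them with Dumper.

Something like this: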

foreach my $ts ($tree->table_states) {
    foreach my $row ($ts->rows) {
        push @fir, Dumper($row);    # collect a dump of each row
    }
}
print @fir;    # print the array that was filled above

But that is not really what I want. Here is the structure of the HTML table whose values I want to store:

<table><caption><b>Table 1 </b>bla bla bla</caption>
<tbody>
    <tr>
        <th ><p>Foo</p>
        </th>

        <td ><p>Bar</p>
        </td>

    </tr>

    <tr>
        <th ><p>Foo-1</p>
        </th>

        <td ><p>Bar-1</p>
        </td>

    </tr>

    <tr>
        <th ><p>Formula</p>
        </th>

        <td><p>Formula1-1</p>
            <p>Formula1-2</p>
            <p>Formula1-3</p>
            <p>Formula1-4</p>
            <p>Formula1-5</p>
        </td>

    </tr>

    <tr>
        <th><p>Foo-2</p>
        </th>

        <td ><p>Bar-2</p>
        </td>

    </tr>

    <tr>
        <th ><p>Foo-3</p>
        </th>

        <td ><p>Bar-3</p>
             <p>Bar-3-1</p>
        </td>

    </tr>

</tbody>

</table>


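It would be convenient if I could store the values of each row together as a pair.

The expected output would be something like an array with the values: (Foo, Bar, Foo-1, Bar-1, Formula, Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5, ...)

What matters most to me is learning how to store the value of each tag and how to move around in the tag tree.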

Learn XPath and DOM manipulation.

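use strictures;
use HTML::TreeBuilder::XPath qw();

# Parse the HTML file into a tree that can be queried with XPath.
my $dom = HTML::TreeBuilder::XPath->new;
$dom->parse_file('10280979.html');

# Hash slice: the text of each <th> becomes a key, and the list of <p> values
# inside the <td> at the same position becomes its value (an array reference).
# This pairs cells by position, so it assumes one <th> and one <td> per row.
my %extract;
@extract{$dom->findnodes_as_strings('//th')} =
    map {[$_->findvalues('p')]} $dom->findnodes('//td');
__END__
# %extract = (
#     Foo     => [qw(Bar)],
#     'Foo-1' => [qw(Bar-1)],
#     'Foo-2' => [qw(Bar-2)],
#     'Foo-3' => [qw(Bar-3 Bar-3-1)],
#     Formula => [qw(Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5)],
# )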
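
Since the question asks for a flat array of pairs, and a hash does not keep the rows in document order, a row-by-row variant may be closer to the expected output. The following is only a sketch under a few assumptions: the table is inlined as a shortened copy of the one in the question so the script is self-contained, every row has one <th> and one <td>, and the names $html and @pairs are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Shortened copy of the question's table, inlined for a self-contained sketch.
my $html = <<'HTML';
<table><caption><b>Table 1 </b>bla bla bla</caption>
<tbody>
    <tr><th><p>Foo</p></th><td><p>Bar</p></td></tr>
    <tr><th><p>Formula</p></th>
        <td><p>Formula1-1</p>
            <p>Formula1-2</p></td></tr>
    <tr><th><p>Foo-3</p></th><td><p>Bar-3</p><p>Bar-3-1</p></td></tr>
</tbody>
</table>
HTML

my $dom = HTML::TreeBuilder::XPath->new;
$dom->parse($html);
$dom->eof;    # tell the parser that the input is complete

my @pairs;    # flat list: header, values, header, values, ...
for my $row ($dom->findnodes('//table//tr')) {
    # Relative XPath expressions are evaluated against the current <tr> only.
    my $header = $row->findvalue('./th/p');
    my @values = $row->findvalues('./td/p');
    push @pairs, $header, join(' ', @values);
}

$dom->delete;    # free the parse tree

print join(', ', @pairs), "\n";
```

Walking the <tr> elements and querying each one with a relative path keeps a header and its values together even if some other row were missing a cell, which the position-based hash slice above cannot guarantee. To read from a file instead, parse_file can be used in place of parse and eof, as in the answer.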

Can you edit your question to include the expected result?