How can I use a Perl regex to replace multiple words in an HTML attribute, each mapped through a hash to an alternate word?


html css perl obfuscation


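I'm writing an HTML obfuscator, and I have a hash that maps user-friendly names (IDs and classes) to obfuscated names (such as a, b, c, etc.). I'm having trouble coming up with a regexp that performs the replacement on something like the tag below; my current attempt follows it.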

<div class="left tall">

s/(class|id)="(.*?)"/$1="$hash{$2}"/

How should I correct this to account for multiple class names inside the quotes? Preferably, the solution should be Perl compatible.

I think I would do it like this:

s/  
    (class|id)="([^"]+)"
/   
    $1 . '="' . (
        join ' ', map { $hash{$_} } split m!\s+!, $2
    ) . '"'
/ex;
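As a rough sketch of how that substitution behaves on the tag from the question, the script below wraps it in a function. The `%hash` contents, the `map_names` and `obfuscate` helper names, the `/g` flag, and the fallback to the original name when the hash has no entry are all assumptions added for illustration; they are not part of the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping from readable names to obfuscated ones.
my %hash = (left => 'a', tall => 'b');

# Translate a whitespace-separated list of names, leaving any name
# that has no entry in %hash unchanged.
sub map_names {
    my ($names) = @_;
    return join ' ',
        map { exists $hash{$_} ? $hash{$_} : $_ } split ' ', $names;
}

# Rewrite every class="..." or id="..." attribute value in a string.
sub obfuscate {
    my ($html) = @_;
    $html =~ s{
        \b(class|id)="([^"]+)"
    }{
        $1 . '="' . map_names($2) . '"'
    }gex;
    return $html;
}

print obfuscate('<div class="left tall wide" id="left">'), "\n";
# prints: <div class="a b wide" id="a">
```

Without the `exists` check, a name missing from the hash would become an empty string and trigger an "uninitialized value" warning under `use warnings`.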

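First of all, you should not be using a regex for this. You are trying to do too much with a single regular expression. What you need is an HTML parser.

The implementation below, which is probably incomplete, uses HTML::Parser: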

#!/usr/bin/perl

use strict;
use warnings;

use HTML::Parser;

{
    my %map = (
        foo => "f",
        bar => "b",
    );

    sub start {
        my ($tag, $attr) = @_;
        my $attr_string = '';
        for my $key (keys %$attr) {
            if ($key eq 'class') {
                my @classes = split " ", $attr->{$key};
                # FIXME: this should use the defined-or operator (//),
                # but it is only available starting in 5.10, so I am
                # using || instead. That does the wrong thing if the
                # mapped name is 0, so don't map a class to 0 in %map.
                $attr->{$key} = join " ", 
                    map { $map{$_} || $_ } @classes;
            }
            $attr_string .= qq/ $key="$attr->{$key}"/;
        }

        print "<$tag$attr_string>";
    }
}

sub text {
    print shift;
}

sub end {
    my $tag = shift;
    print "</$tag>";
}

my $p = HTML::Parser->new(
    start_h => [ \&start, "tagname,attr" ],
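    # "text" passes the source text through as-is; "dtext" would decode
    # entities such as &amp; and corrupt the re-emitted HTML.
    text_h  => [ \&text, "text" ],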
    end_h   => [ \&end, "tagname" ],
);

$p->parse_file(\*DATA);

__DATA__
<html>
    <head>
        <title>foo</title>
    </head>
    <body>
        <span class="foo">Foo!</span> <span class="bar">Bar!</span>
        <span class="foo bar">Foo Bar!</span>
        This should not be touched: class="foo"
    </body>
</html>
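The FIXME comment in the `start` handler describes a limitation of `||`: a mapped name that is false (such as `0`) is treated as missing. One way around it that does not require Perl 5.10 is to test with `exists`. The sketch below isolates that lookup; the `translate_classes` helper and the sample `%map` (which deliberately maps a class to `0`) are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical map in which one class is deliberately obfuscated to "0".
my %map = (foo => 'f', bar => '0');

# Translate a whitespace-separated class list. Names without an entry
# in %map are kept; "exists" means a mapped value of 0 is still used.
sub translate_classes {
    my ($value) = @_;
    return join ' ',
        map { exists $map{$_} ? $map{$_} : $_ } split ' ', $value;
}

print translate_classes('foo bar baz'), "\n";
# prints: f 0 baz
```

With `$map{$_} || $_` in place of the `exists` test, the same input would print `f bar baz`, because the mapped value `0` is false.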


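What do you do when the HTML text itself contains class="foo"? A single regex/substitution does not mix well with recursively structured data.

Some might say that left and tall are just as obscure as … and ….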
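The first concern can be made concrete with a short sketch. It applies a regex substitution of the kind discussed in the question to one line that contains a real attribute followed by look-alike text outside any tag. The one-entry `%hash` and the sample string are assumptions chosen for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical one-entry mapping.
my %hash = (foo => 'f');

# One real attribute, followed by look-alike text outside any tag.
my $html = '<span class="foo">Foo!</span> '
         . 'This should not be touched: class="foo"';

# The regex cannot tell markup from text, so it rewrites both.
(my $out = $html) =~ s/(class|id)="([^"]+)"/$1 . '="' . $hash{$2} . '"'/ge;

print "$out\n";
# prints: <span class="f">Foo!</span> This should not be touched: class="f"
```

A parser-based approach avoids this because the text after the closing tag arrives through a separate text event and is never passed to the attribute-rewriting code.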
