Regex Perl: getting the same output from a regular expression as from a hash
I would like to get the same output from a regex as from the hash below. I know my regex is ugly, but I am trying to improve it. The expected output of the regex is:
20191122181858|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
20191122181907|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889
Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
my (%hash); # initialization
if (<DATA>) { # if DATA exists
print "here the regex values: \n";
while (<DATA>) { # open the DATA
chomp $_; # removes characters at the end of line
my @tab = split(/,/, $_); # split lines
my ($http, $ts, $macin, $caid) = (@tab[2, 3, 4, 5]);
my $timestamp = strftime '%Y%m%d%H%M%S', localtime($ts/1000); # from unix epoch time to human read-able date
my @value = split(/\//, $http); # split values of the http
my ($url, $filename) = ("http://$value[2]", $value[6]); # value in order to have url and the name of the file
if (! $hash{$timestamp."|".$caid."|".$macin."|".$filename."|".$url}) { # starting hash in order to avoid duplicates
$hash{$timestamp."|".$caid."|".$macin."|".$filename."|".$url} = $timestamp."|".$caid."|".$macin."|".$filename."|".$url;
}
my $regex = $_; # trying to have same output with a regex
$regex =~ s/(?:[^\/]*\/)([^\\*]*\/)([^\.*]*)([^\,*]*)(\,)([^\,*]*)(\,)(.*)(.*)/http:\/$1|$2|$3|$4|$5|$6|$7/;
print $regex, "\n";
}
}
if (%hash) { # checking if the hash exists and contains values
print "\nhere the hash values: \n";
foreach (sort keys %hash) {
print $_, "\n";
}
}
__DATA__
"@timestamp",url,ts,macin,caid
"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847
"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929
Current output:
here the regex values:
http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/|9836847|.3018322401"|,|1574443147021|,|40EVFVRFB,9836847
http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/|0292929|.5002731501"|,|1574443138223|,|BVFEFZZ9C4,0292929
here the hash values:
20191122181858|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
20191122181907|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889
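This regex matches what you want, and the replacement gives the expected result, except for the timestamp, which you have to convert as you do in the first part of your code: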
^.+?(http://[^/]+).+/([^/]+?)/[^/]+?,(.+?),(.+?),(.+)
Replacement: $3|$5|$4|$2|$1
Result:
1574443147021|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889
1574443138223|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
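The Perl code further down does the substitution and the timestamp conversion in one step. The header line does not match the pattern, so it is printed unchanged. Its output is:
"@timestamp",url,ts,macin,caid
20191122181907|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889
20191122181858|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
Here is the Perl code: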
use strict;
use warnings;
use POSIX qw(strftime);
while (<DATA>) {
chomp $_;
s~ # SUBSTITUTE
.+? # 1 or more any character but newline, not greedy
(http://[^/]+) # group 1, URL until the first slash
.+/ # 1 or more any character but newline until a slash
([^/]+?) # group 2, 1 or more non slash
/[^/]+?, # a slash, 1 or more non slash, a comma
(.+?) # group 3, 1 or more any character but newline, not greedy
, # a comma
(.+?) # group 4, 1 or more any character but newline, not greedy
, # a comma
(.+) # group 5, 1 or more any character but newline
~ # WITH
strftime('%Y%m%d%H%M%S', # convert time
localtime($3/1000))
. # CONCAT WITH
"|$5|$4|$2|$1" # groups 5, 4, 2, 1 joined with pipes
~ex; #
print $_, "\n";
}
__DATA__
"@timestamp",url,ts,macin,caid
"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847
"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929
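The substitution above prints lines in input order and keeps the header, whereas the hash in the question removes duplicates and sorts. A minimal sketch of combining the two follows. The helper name `line_to_record`, the in-memory `@lines` array (with one line repeated on purpose) and the fixed one-hour offset are assumptions made for illustration. The offset is inferred from the expected output (18:19:07 for an epoch value that is 17:19:07 UTC) and replaces `localtime` only so that the result does not depend on the machine's time zone.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: turn one CSV line into the pipe-joined record,
# or return undef when the line (for example the header) does not match.
sub line_to_record {
    my ($line, $offset_seconds) = @_;
    chomp $line;
    return unless $line =~ m{
        (http://[^/]+)   # 1: scheme, host and port
        .+/              # skip the intermediate path segments
        ([^/]+)          # 2: file name
        /[^/,]+,         # last path segment plus closing quote, then a comma
        (\d+),           # 3: epoch time in milliseconds
        ([^,]+),         # 4: macin
        ([^,]+)$         # 5: caid
    }x;
    my ($url, $file, $ms, $macin, $caid) = ($1, $2, $3, $4, $5);
    # Assumption: a fixed offset from UTC instead of the local time zone.
    my $stamp = strftime('%Y%m%d%H%M%S', gmtime(int($ms / 1000) + $offset_seconds));
    return join '|', $stamp, $caid, $macin, $file, $url;
}

# Sample lines copied from the question; the last one is a deliberate duplicate.
my @lines = (
    '"@timestamp",url,ts,macin,caid',
    '"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847',
    '"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929',
    '"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847',
);

my %seen;
for my $line (@lines) {
    my $record = line_to_record($line, 3600);   # 3600 s = UTC+1 (assumed)
    next unless defined $record;                # skips the header line
    $seen{$record} = 1;                         # hash keys remove duplicates
}
print "$_\n" for sort keys %seen;
```

Using a match plus `join` instead of `s///e` keeps the captured fields in named variables, which makes it easier to reorder them or to skip lines that do not match.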
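Well, there are many ways to achieve the same result. Below is my extended version, which does not just shuffle the fields around but separates them into hashes and does some manipulation on them [timestamp]. From the original post it is not clear whether the timestamp should be taken from the data or generated at run time; I took it from the data.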
use strict;
use warnings;
use feature qw(say);
use Data::Dumper;
my $debug = 0;
my %row;
my %url;
my @fields = qw( timestamp url ts macin caid );
my @address = qw( proto dn port dir id );
while( <DATA> ) {
next if /timestamp/;
print if $debug;
chomp;
s/,//;
s/"//g;
@row{@fields} = split ',';
print Dumper(\%row) if $debug;
@url{@address} = ( $row{url} =~ m#(\w+)://(.+):(\d+)/(.+)/(.+)# );
$url{id} =~ s/\.\d+//;
$url{dir} =~ /(\w+\.ts)/;
$url{ts} = $1;
print Dumper(\%url) if $debug;
say join('|', (
timestamp($row{timestamp}),
$url{id},
$row{macin},
$url{ts},
"$url{proto}://$url{dn}:$url{port}"
));
}
sub timestamp {
my $input = shift;
my %data;
my $result;
my %months = ( Jan => 1, Feb => 2, Mar => 3, Apr => 4,
May => 5, Jun => 6, Jul => 7, Aug => 8,
Sep => 9, Oct => 10, Nov => 11, Dec => 12
);
my @fields = qw( month day year hour min sec msec );
@data{@fields} = $input =~ /(\w+)\s+(\d+)\s+(\d+)\s+@\s+(\d+):(\d+):(\d+)\.(\d+)/; # parse the argument rather than $_
print Dumper(\%data) if $debug;
$result = sprintf "%4d%02d%02d%02d%02d%02d", # year, month, day, hour, minute, second
$data{year},
$months{$data{month}},
$data{day},
$data{hour},
$data{min},
$data{sec};
return $result;
}
__DATA__
"@timestamp",url,ts,macin,caid
"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847
"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929
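The field splitting and the month lookup table above can also be delegated to core modules. The sketch below is one possible variant, not the answer's own method: it assumes `Text::ParseWords::parse_line` to split the quoted CSV fields (so the comma inside the date needs no special handling) and `Time::Piece->strptime` to parse the `@timestamp` column. The helper name `row_to_record` is hypothetical. As in the answer, the time is taken from the data column (17:19:07), not from the epoch field.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);
use Time::Piece;

# Hypothetical helper: build the pipe-joined record from one CSV row,
# or return undef for rows without a usable URL (such as the header).
sub row_to_record {
    my ($line) = @_;
    chomp $line;

    # parse_line honours double quotes, so "Nov 22, 2019 @ ..." stays one field.
    my ($when, $url, $ts, $macin, $caid) = parse_line(',', 0, $line);
    return unless defined $url
        && $url =~ m{^(\w+://[^/]+)/.*?([^/]+)/[^/]+$};
    my ($base, $file) = ($1, $2);   # scheme://host:port and the file name

    # Drop the fractional seconds, then let strptime parse the rest.
    (my $clean = $when) =~ s/\.\d+$//;
    my $stamp = Time::Piece->strptime($clean, '%b %d, %Y @ %H:%M:%S')
                           ->strftime('%Y%m%d%H%M%S');

    return join '|', $stamp, $caid, $macin, $file, $base;
}

# Sample rows copied from the question.
my @rows = (
    '"@timestamp",url,ts,macin,caid',
    '"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847',
    '"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929',
);

for my $row (@rows) {
    my $record = row_to_record($row);
    print "$record\n" if defined $record;
}
```

Here the caid is read from the last CSV column instead of being cut out of the URL; in the sample data both carry the same value.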
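A regex has no output; a match or a substitution does. Why switch to a substitution?
Yes, the regex values come from DATA. I want to switch to a substitution to clean up the lines, since all lines have the same format.
DATA is the source, and I have already put the expected output into the hash, so the hash is the reference. I would like the substitution to produce the same result as that reference.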