Perl: parse an HTML log file and produce a text file in a specific format


I want to parse a text file with Perl. The text file contains log entries taken from some HTML files, as shown below:

Details from /projects/git/Changelog.html file:
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4208">JIRA-4208</a><span style='mso-spacerun:yes'>   </span>Add New Config C support in code
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-31">BUGJIRA-31</a><span style='mso-spacerun:yes'>   </span>Bugfix of some old bug
NEW_FEATURES: <a href="http://jira.xyz.com/browse/ZEERA-273">ZEERA-273</a><span style='mso-spacerun:yes'>   </span>Add support for some other feature.

Details from /projects/git/Changelog2.html file:
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-33">BUGJIRA-33</a><span style='mso-spacerun:yes'>   </span>Bugfix of an issue
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4209">JIRA-4209</a><span style='mso-spacerun:yes'>   </span>Add New Config D support in code

Expected output:

JIRA-4208, BUGJIRA-31, ZEERA-273, BUGJIRA-33, JIRA-4209 : Add New Config C support in code, Bugfix of some old bug, Add support for some other feature, Bugfix of an issue, Add New Config D support in code

i.e. all the bug numbers followed by their descriptions.

If possible, I would like to write the output to another file, output.txt.

EDIT-1:

My code is as follows:

#!/usr/bin/perl
open (FILE, 'input_file1.txt') or die "Could not read from file, exit...";
while(<FILE>)
{
  chomp;
  ($junk0,$junk1,$junk2,$junk3,$junk4,$BUG_NUMBR) = split /[:<="">]+/,$_;
  print "$BUG_NUMBR \n";
}
close FILE;
exit;

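This is far from the expected output shown above. I am unable to define a proper regex for the second part of the expected output, which is the short description of each bug.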

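You don't need a regex. Your split pattern is unusual, but it does the job; you just need to take the rest of its result into account as well. I have replaced your $junk variables with an array. Perl lets you use the index -1 to get the last element counting from the right, so pulling the text out is simple, because it comes after the last delimiter.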

use strict;
use warnings;

my ( @numbers, @text );
while (my $line = <DATA>) {
    chomp $line;
    my @stuff = split /[:<="">]+/, $line;
    push @numbers, $stuff[5];
    push @text, $stuff[-1];
}

print join ', ', @numbers;
print ' : ';
print join ', ', @text;

__DATA__
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4208">JIRA-4208</a><span style='mso-spacerun:yes'>   </span>Add New Config C support in code
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-31">BUGJIRA-31</a><span style='mso-spacerun:yes'>   </span>Bugfix of some old bug
NEW_FEATURES: <a href="http://jira.xyz.com/browse/ZEERA-273">ZEERA-273</a><span style='mso-spacerun:yes'>   </span>Add support for some other feature.
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-33">BUGJIRA-33</a><span style='mso-spacerun:yes'>   </span>Bugfix of an issue
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4209">JIRA-4209</a><span style='mso-spacerun:yes'>   </span>Add New Config D support in code

I have also added strict and warnings, and made the variables lexical.
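To see why indices 5 and -1 are the right ones, it can help to print every field that split returns for a single line. The following is only an inspection sketch; it uses one sample line copied from the question and the same split pattern as the code above.

```perl
use strict;
use warnings;

# One sample line copied from the question's input.
my $line = q{NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4208">JIRA-4208</a><span style='mso-spacerun:yes'>   </span>Add New Config C support in code};

# Same pattern as above: split on runs of ':', '<', '=', '"' or '>'.
my @stuff = split /[:<="">]+/, $line;

# Print every field with its index to see where the useful parts land.
for my $i ( 0 .. $#stuff ) {
    printf "%2d: [%s]\n", $i, $stuff[$i];
}
```

For this line the bug number lands at index 5 and the description is the last field, which is why $stuff[-1] works regardless of how many fields precede it.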

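Also keep in mind that the code will break if the description text itself contains any of the characters used in the split pattern.

The code for the problem statement above is as follows: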

#!/usr/bin/perl

use strict;
use warnings;

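# Use a lexical filehandle and the three-argument form of open.
open(my $fh, '<', 'perl_input_file1.txt') or die "Cannot open perl_input_file1.txt: $!";
my ( @numbers, @text );
while (my $line = <$fh>) {
    chomp $line;
    # Skip the "Details from ..." header lines and any blank lines.
    next if $line =~ /^Details/ or $line !~ /\S/;
    my @stuff = split /[:<="">]+/, $line;
    push @numbers, $stuff[5];   # bug number (the link text)
    push @text, $stuff[-1];     # description after the last delimiter
}
close $fh;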
print join ', ', @numbers;
print ': ';
print join ', ', @text;
print "\n";

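This gives the same output as the desired output I mentioned in the question.
Thanks again to @simbabque for the guidance and the approach.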
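The question also asks for the result to be written to output.txt, which the listings above do not do. Below is a minimal sketch of that step, assuming the same split approach. The helper name build_summary and the in-memory sample lines are illustrative stand-ins, not part of the original posts; in practice the lines would be read from the input file.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the log lines into the one-line summary.
sub build_summary {
    my @lines = @_;
    my ( @numbers, @text );
    for my $line (@lines) {
        chomp $line;
        # Skip the "Details from ..." headers and blank lines.
        next if $line =~ /^Details/ or $line !~ /\S/;
        my @stuff = split /[:<="">]+/, $line;
        push @numbers, $stuff[5];
        push @text,    $stuff[-1];
    }
    return join( ', ', @numbers ) . ' : ' . join( ', ', @text );
}

# Sample lines from the question stand in for the real input file.
my @sample = (
    q{Details from /projects/git/Changelog.html file:},
    q{BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-31">BUGJIRA-31</a><span style='mso-spacerun:yes'>   </span>Bugfix of some old bug},
    q{},
    q{NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4209">JIRA-4209</a><span style='mso-spacerun:yes'>   </span>Add New Config D support in code},
);

my $summary = build_summary(@sample);

# Write the summary to output.txt instead of STDOUT.
open( my $out, '>', 'output.txt' ) or die "Cannot write output.txt: $!";
print {$out} "$summary\n";
close $out or die "Cannot close output.txt: $!";
```

Keeping the parsing in a subroutine separates it from the file handling, so the same summary string could also be passed on to another command.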
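Hi, what exactly have you tried? What is not working in your code? What is the problem here?

@ChrisDoyle: I have added my sample code and explained its limitations. Could you please suggest a solution?

Do you really want a list of all the bug numbers, followed by a list of all the descriptions?

Yes, the expected output is a commit message with which I will finally commit the changes to the repo: "git commit -m $EXPECTED_OUTPUT"

Thanks for sharing the code. It worked for me, with a few exceptions. The data set you used in your example differs from the one I gave in the problem statement above. With my data set there are some warnings and extra commas (,). Still, it is a good starting point for me. I will share the final code once I have fixed the issues. Thanks again!!

Hi @simbabque, you mentioned that my split pattern is funny, although it does the job. I agree, because it is the result of trial and error. Could you suggest a better split pattern?

@Yash I did not know this was a file. Sorry. You did the right thing by simply checking whether ^Details is in the current line. By funny I meant that the approach is unconventional. I would probably write a pattern that captures what I want, but your approach works too. Just remember that it will break if the input changes.

Yes, I agree with you. Since I expect the input text file to follow the same pattern every time, the current split syntax works well for me.

Thanks @simbabque, your approach worked.

Additional code: use strict; my $count = 0; open(IN, "input.txt"); while (<IN>) { if (/START/) { $count = 1; } elsif (/END/) { $count = 0; } elsif ($count) { print; } } close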
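One of the comments above mentions writing a pattern that captures the wanted parts instead of splitting on delimiters. The sketch below shows what that could look like; it is not code from the original posts. The helper name parse_entry and the sample description containing a colon are hypothetical, and the pattern assumes every entry has an a element followed by a span element, as in the question's input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (bug number, description) for an entry line,
# or an empty list for lines that do not look like an entry.
sub parse_entry {
    my ($line) = @_;
    # Capture the link text of the <a> element and whatever follows </span>.
    if ( $line =~ m{<a\s[^>]*>([^<]+)</a>.*</span>\s*(.*\S)}s ) {
        return ( $1, $2 );
    }
    return;
}

# A made-up description with a ':' in it, which would confuse the split version.
my ( $number, $description ) = parse_entry(
    q{BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-31">BUGJIRA-31</a><span style='mso-spacerun:yes'>   </span>Fix crash: empty config}
);
print "$number : $description\n";
```

Because the description is taken as everything after the closing span tag, characters such as ':' or '=' inside it no longer matter, and header lines simply fail to match.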