Majority voting in Perl?
I have 5 files containing the same words. I want to read each word from all of the files and decide the winning word by checking the marker character attached to it (one of *, #, $, &; the entries are tab-separated). Then I want to generate an output file. I can have at most two winners. For example:
File 1
File 2
we$
are#
...
File 3
File 4
File 5
Output file:
we$
are*#
This is how I started:
#!/usr/local/bin/perl -w
sub read_file_line {
my $fh = shift;
if ($fh and my $line = <$fh>) {
chomp($line);
return $line;
}
return;
}
open(my $f1, "words1.txt") or die "Can't";
open(my $f2, "words2.txt") or die "Can't";
open(my $f3, "words3.txt") or die "Can't";
open(my $f4, "words4.txt") or die "Can't";
open(my $f5, "words5.txt") or die "Can't";
my $r1 = read_file_line($f1);
my $r2 = read_file_line($f2);
my $r3 = read_file_line($f3);
my $r4 = read_file_line($f4);
my $r5 = read_file_line($f5);
while ($f5) {
#What can I do here to decide and write the winning word in the output file?
$r1 = read_file_line($f1);
$r2 = read_file_line($f2);
$r3 = read_file_line($f3);
$r4 = read_file_line($f4);
$r5 = read_file_line($f5);
}
Sounds like a job for a hash of hashes. Untested code:
use strict;
use warnings;
use 5.010;
use autodie;
use List::Util qw( sum reduce );
my %totals;
my @files = map "words$_.txt", 1..5;
for my $file (@files) {
open my $fh, '<', $file;
while (<$fh>) {
chomp;
my ($word, $sign) = /(\w+)(\W)/;
$totals{$word}{$sign}++;
}
}
open my $totals_fh, '>', 'outfile.txt';
my @sorted_words = sort { sum(values %{$totals{$a}}) <=> sum(values %{$totals{$b}}) } keys %totals; # parenthesized so each sum takes only its own values; probably something fancier here.
for my $word (@sorted_words[0, 1]) {
#say {$totals_fh} $word, join('', keys %{$totals{$word}} ), "\t- ", function_to_decide_text($totals{$word});
say {$totals_fh} $word, reduce {
$totals{$word}{ substr $a, 0, 1 } == $totals{$word}{$b} ? $a . $b
: $totals{$word}{ substr $a, 0, 1 } > $totals{$word}{$b} ? $a
: $b;
} keys %{ $totals{$word} };
}
EDIT: Forgot the only-two-winners part. Fixed, somewhat.
EDIT2: Fixed as per the comments.
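To make the hash-of-hashes idea from this answer easier to follow, here is a minimal sketch that tallies the markers per word and prints every marker tied for the top count. The in-memory `@files` data and the helper name `top_markers` are assumptions made for illustration; they are not from the original answer. Like the answer, it tallies per word over the whole input rather than line by line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the five input files:
# one array ref per file, one "word + marker" entry per line.
my @files = (
    [ 'we$', 'are#' ],
    [ 'we$', 'are*' ],
    [ 'we#', 'are*' ],
    [ 'we$', 'are#' ],
    [ 'we&', 'are$' ],
);

# Build the hash of hashes: $totals{word}{marker} = number of votes.
my %totals;
for my $file (@files) {
    for my $entry (@$file) {
        my ($word, $sign) = $entry =~ /^(\w+)(\W)$/ or next;
        $totals{$word}{$sign}++;
    }
}

# Return the markers with the highest count for one word (ties included),
# in plain string-sort order.
sub top_markers {
    my ($counts) = @_;
    my ($max) = sort { $b <=> $a } values %$counts;
    my @top = grep { $counts->{$_} == $max } keys %$counts;
    return sort @top;
}

for my $word (sort keys %totals) {
    print $word, join('', top_markers($totals{$word})), "\n";
}
```

With the sample data this prints `are#*` and `we$`. With 5 voters and 4 possible markers, at most two markers can tie for first place, which matches the "at most two winners" constraint in the question.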
#!/usr/bin/perl
use strict;
use warnings;
my @files = qw(file1 file2 file3 file4 file5);
my $symbols = '*#$&'; # no need to escape them as they'll be in a character class
my %words;
foreach my $file (@files) {
open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (<$fh>) {
if (/^(\w+[$symbols])$/) {
$words{$1} ++; # count the occurrences of each word
}
}
close $fh;
}
my $counter = 0;
my $previous = -1;
foreach my $word (sort {$words{$b} <=> $words{$a}} keys %words) {
# make sure you don't exit if two words at the top of the list
# have the same number of occurrences
if ($previous != $words{$word}) {
last if $counter > 1;
}
$counter ++; # count the output
$previous = $words{$word};
print "$word occurred $words{$word} times.\n";
}
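Works when I try it out...
Test data generator
Majority voting code
This seems to be correct for the test data in the generated files.
Revised requirements - example output
The "revised requirements" replaced the '*#$&' marker after the word with a tab and one of the letters 'ABCD'. After some quick negotiation, the question was restored to its original form. This output is from a suitably modified version of the answer above: 3 lines of code changed, 2 in the data generator and 1 in the majority voter. Those changes are not shown; they are trivial.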
we C we D we C we C we D we C
are C are D are C are B are A are C
the B the D the A the A the D the A|D
people D people B people A people B people D people B|D
in D in B in C in B in D in D|B
charge C charge D charge D charge D charge A charge D
and A and B and C and C and B and B|C
what B what B what B what C what C what B
we D we B we D we B we A we B|D
say D say D say B say D say D say D
goes A goes C goes A goes C goes A goes A
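Since the thread uses two input layouts (a marker glued to the word, such as `we$`, and a tab followed by a letter tag, such as `we<TAB>C`), a small parsing helper can serve both. This is only a sketch: the name `split_vote` is hypothetical, and it assumes the revised tags are single uppercase letters.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one input line into
# (word, tag) for either layout seen above.
sub split_vote {
    my ($line) = @_;
    chomp $line;
    # Revised layout: word, a tab, then a single uppercase letter tag.
    return ($1, $2) if $line =~ /^(\w+)\t([A-Z])$/;
    # Original layout: one of * # $ & attached directly to the word.
    return ($1, $2) if $line =~ /^(\w+)([*#\$&])$/;
    return;    # empty list for anything unrecognised
}

for my $line ("we\$\n", "are\tC\n", "bogus line\n") {
    my ($word, $tag) = split_vote($line);
    print defined $word ? "word=$word tag=$tag\n" : "skipped\n";
}
```

Keeping the parsing in one place means the voting logic only ever deals with (word, tag) pairs, whichever layout the files use.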
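Revised test generator - for a configurable number of files
Now that the poster has worked out how to handle the revised scenario, this is the data generator code I used, with 5 tags (A-E). Clearly, it would not take much work to make the number of tags configurable on the command line.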
#!/usr/bin/env perl
use strict;
use warnings;
my $fmax = scalar(@ARGV) > 0 ? $ARGV[0] : 5;
my $tags = 'ABCDE';
my $ntags = length($tags);
my $fmt = sprintf "words$fmax-%%0%0dd.txt", length($fmax);
foreach my $fnum (1..$fmax)
{
my $file = sprintf $fmt, $fnum;
open my $fh, '>', $file or die "Failed to open $file for writing ($!)";
foreach my $w (qw(We Are The People In Charge And What We Say Goes))
{
my $suffix = substr($tags, rand($ntags), 1);
print $fh "$w\t$suffix\n";
}
}
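Revised majority voting code - for an arbitrary number of files
This code works with basically any number of files. As noted in one of the (many) comments, it does not check that the word is the same in each file, as the question requires; if the words differ, you could get odd results.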
#!/usr/bin/env perl
use strict;
use warnings;
my @files = scalar @ARGV > 0 ? @ARGV :
( "words1.txt", "words2.txt", "words3.txt",
"words4.txt", "words5.txt"
);
my $voters = scalar(@files);
my @fh;
{
my $n = 0;
foreach my $file (@files)
{
open my $f, '<', $file or die "Can't open $file for reading ($!)";
$fh[$n++] = $f;
}
}
while (my $r = process_line(@fh))
{
print "$r\n";
}
sub process_line
{
my(@fhlist) = @_;
my %words = ();
foreach my $fh (@fhlist)
{
my $line = <$fh>;
return unless defined $line;
chomp $line;
$words{$line}++;
}
return winner(%words);
}
# Get tag X from entry "word\tX".
sub get_tag_from_word
{
my($word) = @_;
return (split /\s/, $word)[1];
}
sub winner
{
my(%words) = @_;
my $maxscore = 0;
my $winscore = ($voters / 2) + 1;
my $winner = '';
my $taglist = '';
foreach my $word (sort keys %words)
{
return "$word\t$words{$word}" if ($words{$word} >= $winscore);
if ($words{$word} > $maxscore)
{
$winner = $word;
$winner =~ s/\t.//;
$taglist = get_tag_from_word($word);
$maxscore = $words{$word};
}
elsif ($words{$word} == $maxscore)
{
my $newtag = get_tag_from_word($word);
$taglist .= "|$newtag";
}
}
return "$winner\t$taglist\t$maxscore";
}
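The code above does not verify that every file contributes the same word on a given row. One way to add such a check is sketched below; `common_word` is a hypothetical helper, not part of the answer, and it assumes the tab-separated "word<TAB>tag" layout.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer's code): given the lines
# read from each file for one row, in "word\tTAG" form, return the shared
# word, or undef if the files disagree or a line is malformed.
sub common_word {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        my ($word) = split /\t/, $line;
        return unless defined $word && length $word;
        $seen{$word} = 1;
    }
    my @words = keys %seen;
    return @words == 1 ? $words[0] : undef;
}

my @row_ok  = ("We\tA", "We\tB", "We\tA");
my @row_bad = ("We\tA", "Are\tB", "We\tA");

for my $row (\@row_ok, \@row_bad) {
    my $word = common_word(@$row);
    print defined $word ? "row agrees on '$word'\n"
                        : "row has mismatched words\n";
}
```

A check like this could be called on the lines collected for each row before counting votes, with a warning or `die` when it returns undef.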
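The first column is the word; the second column is the winning tag or tags; the third (numeric) column is the maximum score; the remaining 10 columns are the tags from the 10 data files. As you can see, the first row contains two each of 'We A', 'We B', ... 'We E'. I also generated (but did not keep) a result set in which the maximum score was 7. With enough repetitions, variations like these can be found.
Smell? You know that smell. All over the board. It smells like... homework.
Hey - the revised question is a minor variation on the previous one; you should be able to adapt any of the previous solutions to the new scenario. Also, changing the question so that none of the previous answers is relevant is not kosher. Nor is failing to tag the question as homework when it clearly is homework. And not working through your revised requirements yourself is plain laziness on your part.
I am really sorry. I changed the question to show you what the second approach was, not to make the answers irrelevant. This is a project I got stuck on, which is why I asked for your help. Sorry.
@aliocee: OK - you've learned. Please remember that in future. Thanks!
@aliocee: You are not storing lines in the hash, only the count for each word and its sign; the data structure looks like { are => { '$' => 10, '&' => 1 }, we => { '$' => 1, '#' => 11 } }; so the hash is very likely nowhere near that big.
OK, just tested it and it seems to work fine (it generates the output file with the two results and all) - you have to define function_to_decide_text yourself, though.
If you didn't change the values in the $word part, then the error should be in @sorted_words[0, 1] - it isn't being filled. Try adding use Data::Dumper; say Dumper \@sorted_words; before the for.
@aliocee, right. The version I just edited does what you ask, although I'm sure I did it in an ugly way.
@aliocee: New requirements, huh. To make this work, you need to change the regex to /(\p{ll})\t(\p{Lu}/ and then change how the string concatenation in the reduce works. You'll have to work that out yourself, though.
Thanks, but I just want to create a file exactly like the input files, containing only the winning words. But when I have two winners, I have to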
#!/usr/bin/env perl
use strict;
use warnings;
foreach my $i (1..5)
{
my $file = "words$i.txt";
open my $fh, '>', $file or die "Failed to open $file for writing ($!)";
foreach my $w (qw (we are the people in charge and what we say goes))
{
my $suffix = substr('*#$&', rand(4), 1);
print $fh "$w$suffix\n";
}
}
#!/usr/bin/env perl
use strict;
use warnings;
my @files = ( "words1.txt", "words2.txt", "words3.txt",
"words4.txt", "words5.txt"
);
my @fh;
{
my $n = 0;
foreach my $file (@files)
{
open my $f, '<', $file or die "Can't open $file for reading ($!)";
$fh[$n++] = $f;
}
}
while (my $r = process_line(@fh))
{
print "$r\n";
}
sub process_line
{
my(@fhlist) = @_;
my %words = ();
foreach my $fh (@fhlist)
{
my $line = <$fh>;
return unless defined $line;
chomp $line;
$words{$line}++;
}
my $combo = '';
foreach my $word (keys %words)
{
return $word if ($words{$word} > 2);
$combo .= $word if ($words{$word} == 2);
}
$combo =~ s/(\W)\w+(\W)/$1$2/;
return $combo;
}
$ perl datagenerator.pl
$ perl majorityvoter.pl > results.txt
$ paste words?.txt results.txt
we* we$ we& we# we# we#
are* are# are# are* are$ are*#
the* the& the# the# the& the&#
people& people& people$ people# people# people&#
in# in* in$ in* in* in*
charge* charge# charge& charge* charge# charge#*
and$ and* and$ and& and$ and$
what& what& what$ what& what# what&
we# we* we* we& we* we*
say$ say& say$ say$ say$ say$
goes$ goes& goes# goes# goes# goes#
$
We A|B|C|D|E 2 B C C E D A D A E B
Are D 4 C D B A D B D D B E
The A 5 D A B B A A B E A A
People D 4 E D C D B E D D B C
In D 3 E C D D D B C A A B
Charge A|E 3 E E D A D A B A E B
And E 3 C E D D C A B E B E
What A 5 B C C A A A B A D A
We A 4 C A A E A E C D A E
Say A|D 4 A C A A D E D A D D
Goes A 3 D B A C C A A E E B