Find the longest repeated string based on input in Perl (using a subroutine)
I'm trying to find the longest repeat of a given pattern. My code so far looks like this and is very close, but it doesn't quite give the desired result:
use warnings;
use strict;
my $DNA;
$DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT" ;
print "$DNA\n" ;
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n" ;
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n" ;
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n" ;
sub longestRepeat{
    my $someSequence = shift(@_); # shift off the first argument from the list
    my $whatBP = shift(@_); # shift off the second argument from the list
    my $match = 0;
    if ($whatBP eq "AT"){
        while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
        }
        return $match;
    }
    if ($whatBP eq "TAGA"){
        while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
        }
        return $match;
    }
    if ($whatBP eq "C"){
        while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
        }
        return $match;
    }
}
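All it does right now is count the total number of AT, TAGA, and C occurrences in the sequence. Instead of giving me only the length of the longest run, it adds them all up and gives me the total. I think something is wrong with the while loop, but I'm not sure. Any help would be greatly appreciated.
P.S. It should also display the longest repeat as a string rather than as a number (substr might be useful here).
Your longestRepeat function doesn't need to check which of the three cases it is handling. In general, when you find yourself writing exactly the same instructions several times, that is a hint that you can factor out the repetition and simplify the program. Consider the following, in which I've cleaned up the function and added comments for illustration: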
#!/usr/bin/env perl
use warnings;
use strict;
# no need to declare and define separately; this works fine
# also no need for space before semicolon
my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
print "$DNA\n";
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n";
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n";
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n";
sub longestRepeat {
    # note that, within a function, @_ is the default argument to shift();
    # hence its absence in the next two lines. (in practice, you're more
    # likely to see 'shift' in this context without even parentheses, much
    # less the full 'shift(@_)'; be prepared to run into it.)
    my $sequence = shift(); # take the first argument
    my $kmer = shift(); # take the second argument
    # these state variables we'll use to keep track of what we're doing here;
    # $longest_match, a string, will eventually be returned.
    my $longest_matchlen = 0;
    my $longest_match = '';
    # for each match in $sequence of one or more $kmer repeats...
    while ($sequence =~ m@((?:$kmer)+)@g) {
        # ...get the length of the whole run, stored in $1 by the outer
        # capture group; the inner (?:...) group is non-capturing, so the
        # '+' quantifier repeats $kmer without overwriting $1, and grabs
        # the longest run available from each starting point (see
        # `man perlre' for more)...
        my $this_matchlen = length($1);
        # ...and if this match is longer than the longest yet found...
        if ($this_matchlen > $longest_matchlen) {
            # ...store this match's length in $longest_matchlen...
            $longest_matchlen = $this_matchlen;
            # ...and store the match itself in $longest_match.
            $longest_match = $1;
        }; # end of the 'if' statement
    }; # end of the 'while' loop
    # at this point, the longest match we found is in $longest_match; if
    # we found no matches, then $longest_match still contains the empty
    # string we assigned up there before the while loop started, which is
    # the correct result in a case where $kmer never appears in $sequence.
    return $longest_match;
};
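One detail worth checking in any pattern of this kind: when a quantifier is applied directly to a capture group, Perl overwrites the capture on every repetition, so $1 ends up holding only the last repetition rather than the whole run. The following minimal sketch illustrates the difference; the helper names last_repetition and whole_run and the test input are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: quantifier applied directly to the capture group.
# $1 is overwritten on each repetition, so only the final one survives.
sub last_repetition {
    my ($sequence, $kmer) = @_;
    return $sequence =~ /($kmer)+/ ? $1 : '';
}

# Hypothetical helper: non-capturing group repeated inside one capture
# group. $1 holds the entire run of repeats.
sub whole_run {
    my ($sequence, $kmer) = @_;
    return $sequence =~ /((?:$kmer)+)/ ? $1 : '';
}

print last_repetition("ATATATCC", "AT"), "\n";   # prints AT
print whole_run("ATATATCC", "AT"), "\n";         # prints ATATAT
```

Both patterns match the same stretch of the input; they differ only in what ends up in $1.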
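You're studying bioinformatics, aren't you? I have some experience teaching Perl to bioinformaticians, and I think the distribution of programming skill and talent in the field is extremely wide, with a very unfortunate bulge on the left side of the graph. That is a polite way of saying that, as a professional programmer, most of the bioinformatics Perl code I've seen ranges from not very good to very bad.
I mention this not as an insult, but to back up my strong recommendation that you add some computer science courses to whatever curriculum you're currently following. The more exposure you get to the general concepts and habits of thought involved in formulating algorithms precisely, the better you'll be able to meet the demands of your field, and, in my experience, the better prepared you'll be than most. I'm not a bioinformatician myself, but from working with bioinformaticians it seems to me that a strong programming background may be more useful to them than a strong biology background. (This was pasted from part of this question.)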
Judging by the name of the subroutine, I assume you want to find the longest repeat in the sequence.
If so, how about the following:
sub longest_repeat {
    my ( $sequence, $what ) = @_;
    my @matches = $sequence =~ /((?:$what)+)/g ; # Store all matches
    my $longest;
    foreach my $match ( @matches ) { # Could also avoid temp variable :
                                     # for my $match ( $sequence =~ /((?:$what)+)/g )
        $longest //= $match ; # Initialize
                              # (could also do `$longest = $match
                              # unless defined $longest`)
        $longest = $match if length( $longest ) < length( $match );
    }
    return $longest; # Note this also handles the case of no matches
}
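As a hedged variation on the subroutine above, here is a sketch that wraps the pattern in \Q...\E so that any regex metacharacters in the search string are matched literally. This matters little for plain DNA k-mers, but it avoids surprises with other input. The name longest_repeat_quoted and the test strings are assumptions for illustration, and unlike the version above this sketch returns an empty string rather than undef when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical variant: same idea as longest_repeat, but the search
# string is quoted with \Q...\E so it is always matched literally.
sub longest_repeat_quoted {
    my ($sequence, $what) = @_;
    my $longest = '';   # returned as-is if there are no matches
    for my $match ($sequence =~ /((?:\Q$what\E)+)/g) {
        # Keep the first of any equally long runs.
        $longest = $match if length($match) > length($longest);
    }
    return $longest;
}

print longest_repeat_quoted("ATATCCATATATG", "AT"), "\n";   # prints ATATAT
print longest_repeat_quoted("A.A.AXA", "A."), "\n";         # prints A.A.
```

Without the quoting, the "." in the second call would match any character, so the run would extend to "A.A.AX".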
If you can make sense of that, the following version does essentially the same thing using a Schwartzian transform:
sub longest_repeat {
    my ( $sequence, $what ) = @_;                      # Example:
                                                       # --------------------
    my ( $longest ) = map  { $_->[0] }                 # 'ATAT' ...
                      sort { $b->[1] <=> $a->[1] }     # ['ATAT',4], ['AT',2]
                      map  { [ $_, length($_) ] }      # ['AT',2], ['ATAT',4]
                      $sequence =~ /((?:$what)+)/g ;   # ... 'AT', 'ATAT'
    return $longest ;
}
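Some might argue that sorting is wasteful, because it is O(n log n) rather than O(n), but there's variety for you.
What are the if statements doing in the subroutine? Aren't you trying to find the longest run of consecutive occurrences of whatever string you pass in? Does ATATAT count as 1 or 2?
I like your use of the Schwartzian transform, but I'm a grizzled Perl hacker; it's quite likely to baffle a bioinformatician. (I've spent some time teaching people in that field how to use Perl. It's a hard problem.) See my answer for an example aimed more at the level of familiarity with Perl and programming that you usually find in that field.
@AaronMiller: I'm not a computer scientist or programmer either. Most of my Perl skills come from StackOverflow.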
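On the O(n log n) versus O(n) remark above: a single pass over the matches is enough to pick the longest one, with no sort. The sketch below uses reduce from the core List::Util module; the name longest_repeat_linear and the test input are assumptions for illustration, not part of either answer.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical linear-time variant: collect every run of repeats, then
# fold over the list once, keeping the longer of each pair.
sub longest_repeat_linear {
    my ($sequence, $what) = @_;
    my @matches = $sequence =~ /((?:$what)+)/g;
    # reduce returns undef for an empty list, so no matches yields undef.
    # On ties the earlier run ($a) is kept.
    return reduce { length($b) > length($a) ? $b : $a } @matches;
}

print longest_repeat_linear("ATATCCATATATG", "AT"), "\n";   # prints ATATAT
```

For sequences with only a handful of runs the difference from the sorted version is negligible; the single pass mainly matters for very long inputs.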