Finding the longest repeated string based on input in Perl (using a subroutine)


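So I'm trying to find the longest repeat of a given pattern. My code so far looks like this, and it's pretty close, but it doesn't quite give the desired result: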

use warnings;
use strict;    

my $DNA;       
$DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT" ;
print "$DNA\n" ;
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n" ;
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n" ;
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n" ;

sub longestRepeat{

  my $someSequence = shift(@_);  # shift off the first  argument from the list
  my $whatBP       = shift(@_);  # shift off the second argument from the list
  my $match = 0;



        if ($whatBP eq "AT"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;

        }
        if ($whatBP eq "TAGA"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;
        }

        if ($whatBP eq "C"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;
        }
}

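All it does right now is find the total number of AT, TAGA, and C occurrences in the sequence. Instead of giving me the length of just the longest run, it adds them all up and gives me the total. I think something is wrong with the while loop, but I'm not sure. Any help would be greatly appreciated.

P.S. It should also display the longest repeat as a string rather than as a number (perhaps using substr here).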

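Your longestRepeat function doesn't need to check which of the three cases it's handling. In general, when you find yourself writing exactly the same instructions several times, that's a hint that you can factor out the repetition and simplify the program. Consider the following, where I've cleaned up the function and commented it for illustration: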

#!/usr/bin/env perl
use warnings;
use strict;    

# no need to declare and define separately; this works fine
# also no need for space before semicolon
my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
print "$DNA\n";
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n";
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n";
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n";

sub longestRepeat {

  # note that, within a function, @_ is the default argument to shift();
  # hence its absence in the next two lines. (in practice, you're more 
  # likely to see 'shift' in this context without even parentheses, much
  # less the full 'shift(@_)'; be prepared to run into it.)
  my $sequence = shift(); # take the first argument
  my $kmer = shift(); # take the second argument

  # these state variables we'll use to keep track of what we're doing here;
  # $longest_match, a string, will eventually be returned.
  my $longest_matchlen = 0;
  my $longest_match = '';

  # for each run in $sequence of one or more consecutive $kmer repeats...
  while ($sequence =~ m/((?:$kmer)+)/g) {

    # ...get the length of the whole run, stored in $1 by the outer capture
    # group; the inner (?:...) group does not capture, because a quantified
    # capture group would keep only its last repetition. The '+' quantifier
    # grabs the longest match available from each starting point (see
    # `man perlre' for more)...
    my $this_matchlen = length($1);

    # ...and if this match is longer than the longest yet found...
    if ($this_matchlen > $longest_matchlen) {

      # ...store this match's length in $longest_matchlen...
      $longest_matchlen = $this_matchlen;

      # ...and store the match itself in $longest_match.
      $longest_match = $1;

    }; # end of the 'if' statement

  }; # end of the 'while' loop

  # at this point, the longest match we found is in $longest_match; if
  # we found no matches, then $longest_match still contains the empty
  # string we assigned up there before the while loop started, which is
  # the correct result in a case where $kmer never appears in $sequence.
  return $longest_match;
};
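
One detail worth checking in any version of this loop: when a capture group is itself quantified, as in (AT)+, Perl leaves only the last repetition in $1, so the whole run has to be captured by an outer group. A minimal sketch of the difference follows; the sample string and variable names are only illustrations, not part of the original program.

```perl
use strict;
use warnings;

# Illustrative sample input (an assumption, not the question's sequence).
my $sample = "ATATAT";

# Quantified capture group: the capture holds only the last repetition.
my ($last_only) = $sample =~ /(AT)+/;

# Non-capturing inner group inside a capturing outer group:
# the capture holds the whole run.
my ($whole_run) = $sample =~ /((?:AT)+)/;

print "quantified group captured: $last_only\n";   # AT
print "outer group captured:      $whole_run\n";   # ATATAT
```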

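You're studying bioinformatics, yes? I have some experience teaching Perl to bioinformaticians, and I think the distribution of programming skill and talent in the field is extremely wide, with a very unfortunate bulge at the left side of the graph. That's a polite way of saying that, as a professional programmer, most of the bioinformatics Perl code I've seen ranges from not very good to very bad.

I mention this not to insult, but to back up my strong recommendation that you add some computer science courses to whatever curriculum you're currently pursuing. The more exposure you get to the general concepts and habits of thought involved in the precise formulation of algorithms, the better you'll be able to meet the demands of your field, and in my experience the better prepared you'll be than most. While I'm not a bioinformatician myself, in working with bioinformaticians it has seemed to me that a strong programming background is probably more useful to a bioinformatician than a strong biology background.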

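(This was pasted from part a of this question.)

Judging by the subroutine's name, I assume you want to find the longest run of repeats in the sequence.

If so, how about the following: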

sub longest_repeat {

    my ( $sequence, $what ) = @_;

    my @matches = $sequence =~ /((?:$what)+)/g ;  # Store all matches

    my $longest;
    foreach my $match ( @matches ) {  # Could also avoid temp variable :
                                      # for my $match ( $sequence =~ /((?:$what)+)/g )

        $longest //= $match ;         # Initialize
                                      #  (could also do `$longest = $match
                                      #                    unless defined $longest`)

        $longest = $match if length( $longest ) < length( $match );
    }

    return $longest;  # Note this also handles the case of no matches
}
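
Since the question asks for the repeat as a string and the comments ask how many copies a run contains, here is a minimal sketch that returns both. The name longest_repeat_with_count and the sample sequence are hypothetical, and quotemeta is added on the assumption that the pattern should be matched literally rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the longest run of consecutive copies of
# $what in $sequence, plus the number of copies in that run.
sub longest_repeat_with_count {
    my ( $sequence, $what ) = @_;

    # Guard against an empty pattern, which would divide by zero below.
    return ( '', 0 ) if !defined $what || $what eq '';

    my $unit    = quotemeta $what;   # match $what literally, not as a regex
    my $longest = '';

    # In list context with /g, the match returns every captured run.
    for my $run ( $sequence =~ /((?:$unit)+)/g ) {
        $longest = $run if length($run) > length($longest);
    }

    # Each run consists of whole copies of $what, so the division is exact.
    my $count = length($longest) / length($what);
    return ( $longest, $count );
}

# Illustrative input (an assumption): the longest AT run here is ATATAT.
my ( $run, $count ) = longest_repeat_with_count( "ATATCCCATATATGG", "AT" );
print "longest AT run: $run ($count copies)\n";
```

When the pattern never appears, this sketch returns an empty string and a count of 0 rather than undef, which keeps the print statement free of warnings.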



If you can follow that, the following version does essentially the same thing using a Schwartzian transform:
sub longest_repeat {

    my ( $sequence, $what ) = @_;                          # Example:
                                                           # --------------------
    my ( $longest ) = map { $_->[0] }                      # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }       # ['ATAT',4], ['AT',2]
                          map { [ $_, length($_) ] }       # ['AT',2], ['ATAT',4]
                            $sequence =~ /((?:$what)+)/g ; # ... 'AT', 'ATAT'

    return $longest ;
}


Some might argue that sorting is wasteful, since it is O(n log n) rather than O(n), but there's variety for ya.
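
For comparison, a single-pass O(n) variant can be sketched with reduce from the core List::Util module, which keeps the longer of each pair of runs instead of sorting them all. The name longest_repeat_linear and the sample input are hypothetical, and \Q...\E assumes the pattern should be matched literally.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical single-pass variant: no sort, just keep the longer run.
sub longest_repeat_linear {
    my ( $sequence, $what ) = @_;

    # reduce walks the list of runs once; on ties the earlier run is kept.
    # It returns undef when the list is empty (no matches).
    my $longest = reduce { length($b) > length($a) ? $b : $a }
                  $sequence =~ /((?:\Q$what\E)+)/g;

    return $longest;
}

# Illustrative input (an assumption): the longest AT run here is ATATAT.
my $result = longest_repeat_linear( "ATATCCCATATATGG", "AT" );
print "longest AT run: $result\n";
```
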
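What are the if statements in the subroutine for? Aren't you trying to find the maximum consecutive occurrence of whatever string you pass in? Is ATATAT 1 or 2?

I like your use of the Schwartzian transform, but I'm a hairy Perl hacker; it's likely to blow a bioinformatician's mind. (I've spent some time teaching Perl to people in that field. It's a hard problem.) See my answer for an example pitched more at the level of familiarity with Perl and programming that you usually find in that field.

@AaronMiller: I'm not a computer scientist/programmer either. Most of my Perl skills come from StackOverflow.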