
Finding the longest repeated string based on input in Perl (using a subroutine)


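So I'm trying to find the longest repeat of a given pattern. So far my code looks like this, and it's very close, but it doesn't quite give the desired result: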

use warnings;
use strict;    

my $DNA;       
$DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT" ;
print "$DNA\n" ;
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n" ;
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n" ;
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n" ;

sub longestRepeat{

  my $someSequence = shift(@_);  # shift off the first  argument from the list
  my $whatBP       = shift(@_);  # shift off the second argument from the list
  my $match = 0;



        if ($whatBP eq "AT"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;

        }
        if ($whatBP eq "TAGA"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;
        }

        if ($whatBP eq "C"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;
        }
}   
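Right now all it does is find the total number of occurrences of AT, TAGA and C in the sequence. Instead of giving me only the length of the longest run, it adds them all up and gives me the total. I think something is wrong with the while loop, but I'm not sure. Any help would be greatly appreciated.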
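To make the symptom concrete, here is a small sketch contrasting the two quantities on a made-up toy string: the number of occurrences (what the loop above computes) versus the length of the longest unbroken run (what is wanted). The helper names count_occurrences and longest_run_length are hypothetical, and the \Q...\E escaping, which makes the pattern match literally, is an addition not present in the question's code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Counts every non-overlapping occurrence of $kmer; this is the quantity
# the while/m//g loop in the question ends up returning.
sub count_occurrences {
    my ($sequence, $kmer) = @_;
    my $count = 0;
    $count++ while $sequence =~ m/\Q$kmer\E/g;
    return $count;
}

# Length in characters of the longest unbroken run of $kmer.
sub longest_run_length {
    my ($sequence, $kmer) = @_;
    my $best = 0;
    while ($sequence =~ m/((?:\Q$kmer\E)+)/g) {
        $best = length($1) if length($1) > $best;
    }
    return $best;
}

# Hypothetical toy input: one isolated AT, then a run of three ATs.
my $toy = "ATCCATATAT";
printf "occurrences: %d, longest run: %d\n",
    count_occurrences($toy, "AT"), longest_run_length($toy, "AT");
# prints "occurrences: 4, longest run: 6"
```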


P.S. It should also display the longest repeat as a string rather than as a number (maybe using substr here).
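One possible way to return the repeat as a string, as the P.S. asks, is to record the offsets of the longest run through Perl's @- and @+ match-offset arrays and then cut it out with substr. This is a sketch of one workable approach, not the only one; the function name longest_repeat_substr is hypothetical, and \Q...\E is added so the pattern is matched literally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Finds the longest run of $kmer and returns it as a string.
# After a successful match, $-[0] and $+[0] are the start and end offsets
# of the whole match, so the run can be cut out with substr().
sub longest_repeat_substr {
    my ($sequence, $kmer) = @_;
    my ($best_start, $best_len) = (0, 0);
    while ($sequence =~ m/(?:\Q$kmer\E)+/g) {
        my $len = $+[0] - $-[0];
        if ($len > $best_len) {
            ($best_start, $best_len) = ($-[0], $len);
        }
    }
    # With no match, $best_len stays 0 and substr() returns the empty string.
    return substr($sequence, $best_start, $best_len);
}

my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
for my $kmer (qw(AT TAGA C)) {
    my $run = longest_repeat_substr($DNA, $kmer);
    printf "The longest %s repeat is %s (%d characters)\n", $kmer, $run, length($run);
}
```

Keeping the offsets instead of the matched text means substr does the extraction once, at the end, which is where the question's hunch about substr fits in.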

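Your longestRepeat function doesn't need to check which of the three cases it is handling. In general, when you find yourself writing exactly the same instructions several times, that is a hint that you can factor out the repetition and simplify the program. Consider the following, in which I've cleaned up the function and added comments for illustration: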

#!/usr/bin/env perl
use warnings;
use strict;    

# no need to declare and define separately; this works fine
# also no need for space before semicolon
my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
print "$DNA\n";
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n";
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n";
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n";

sub longestRepeat {

  # note that, within a function, @_ is the default argument to shift();
  # hence its absence in the next two lines. (in practice, you're more 
  # likely to see 'shift' in this context without even parentheses, much
  # less the full 'shift(@_)'; be prepared to run into it.)
  my $sequence = shift(); # take the first argument
  my $kmer = shift(); # take the second argument

  # these state variables we'll use to keep track of what we're doing here;
  # $longest_match, a string, will eventually be returned.
  my $longest_matchlen = 0;
  my $longest_match = '';

  # for each match in $sequence of one or more $kmer repeats...
  while ($sequence =~ m@((?:$kmer)+)@g) {

    # ...get the length of the match, stored in $1 by the outer capture
    # group; the inner (?:...) group doesn't capture, so $1 holds the whole
    # run rather than only its last repetition, and the '+' quantifier grabs
    # the longest run available from each starting point (see `man perlre'
    # for more)...
    my $this_matchlen = length($1);

    # ...and if this match is longer than the longest yet found...
    if ($this_matchlen > $longest_matchlen) {

      # ...store this match's length in $longest_matchlen...
      $longest_matchlen = $this_matchlen;

      # ...and store the match itself in $longest_match.
      $longest_match = $1;

    }; # end of the 'if' statement

  }; # end of the 'while' loop

  # at this point, the longest match we found is in $longest_match; if
  # we found no matches, then $longest_match still contains the empty
  # string we assigned up there before the while loop started, which is
  # the correct result in a case where $kmer never appears in $sequence.
  return $longest_match;
};
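The distinction between a repeated capture group and a single capture group wrapped around a repeated non-capturing group is easy to miss when pulling a whole run out of $1. A minimal demonstration on a toy string:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $run = "ATATAT";

# A quantified capture group keeps only its LAST repetition in $1.
my ($last_unit) = $run =~ m/(AT)+/;

# A single capture around a repeated non-capturing group keeps the whole run.
my ($whole_run) = $run =~ m/((?:AT)+)/;

print "(AT)+      captured '$last_unit'\n";    # 'AT'
print "((?:AT)+)  captured '$whole_run'\n";    # 'ATATAT'
```

With the first form, length($1) is always the length of one unit, so comparing lengths to find the longest run never gets past the first match.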
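You're studying bioinformatics, aren't you? I have some experience teaching Perl to bioinformaticists, and I think the distribution of programming skill and talent in that field is extremely wide, with a very unfortunate bump on the left side of the graph. That is a polite way of saying that, speaking as a professional programmer, most of the bioinformatics Perl code I've seen ranges from not very good to very bad.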

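I mention this not to insult you, but to back up my strong recommendation that you add some computer science courses to your current course of study. The more exposure you get to the general concepts and habits of thought involved in formulating algorithms precisely, the better equipped you'll be to meet the demands of your field; in fact, in my experience, you'll be better prepared than most. Although I'm not a bioinformaticist myself, from working with bioinformaticists it seems to me that a strong programming background may be more useful to a bioinformaticist than a strong biology background.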

(This was pasted from part a of this question.)

Based on the name of the subroutine, I assume you want to find the longest repeat in the sequence.

If so, how about the following:

sub longest_repeat {

    my ( $sequence, $what ) = @_;

    my @matches = $sequence =~ /((?:$what)+)/g ;  # Store all matches

    my $longest;
    foreach my $match ( @matches ) {  # Could also avoid temp variable :
                                      # for my $match ( $sequence =~ /((?:$what)+)/g )

        $longest //= $match ;         # Initialize
                                      #  (could also do `$longest = $match
                                      #                    unless defined $longest`)

        $longest = $match if length( $longest ) < length( $match );
    }

    return $longest;  # Note this also handles the case of no matches
}
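Since longest_repeat returns undef when $what never occurs, concatenating its result into a string under use warnings produces an "uninitialized value" warning. Below is a sketch of a caller that guards against this with the defined-or operator // (Perl 5.10 or later, the same version the //= above already needs). The sub is restated so the block runs on its own; the '(none)' placeholder and the loop over patterns, including the absent GGG, are illustrative choices.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same idea as longest_repeat above: collect every run, keep the longest.
sub longest_repeat {
    my ($sequence, $what) = @_;
    my $longest;
    for my $match ($sequence =~ /((?:$what)+)/g) {
        $longest = $match
            if !defined($longest) || length($longest) < length($match);
    }
    return $longest;    # undef when $what never occurs
}

my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
for my $what (qw(AT TAGA C GGG)) {
    # '//' substitutes a placeholder so an undef result doesn't warn
    my $result = longest_repeat($DNA, $what) // '(none)';
    print "The longest $what repeat is $result\n";
}
```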
If you can follow that, the following version does essentially the same thing using a Schwartzian transform:

sub longest_repeat {

    my ( $sequence, $what ) = @_;                          # Example:
                                                           # --------------------
    my ( $longest ) = map { $_->[0] }                      # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }       # ['ATAT',4], ['AT',2]
                          map { [ $_, length($_) ] }       # ['AT',2], ['ATAT',4]
                            $sequence =~ /((?:$what)+)/g ; # ... 'AT', 'ATAT'

    return $longest ;
}
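For comparison, here is a sketch of the same selection with a plain sort next to the transform. The plain version recomputes length() for both operands inside every comparison, while the Schwartzian transform computes each length once and sorts on the cached key. For a key as cheap as length() the difference is likely negligible, so the transform mostly pays off when the key is expensive to compute. Both function names are hypothetical, and since Perl does not guarantee a stable default sort, ties for the longest run may resolve either way.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Plain sort: length() is recomputed for both operands in every comparison.
sub longest_repeat_plain_sort {
    my ($sequence, $what) = @_;
    my ($longest) = sort { length($b) <=> length($a) }
                    $sequence =~ /((?:$what)+)/g;
    return $longest;
}

# Schwartzian transform: compute each length once, sort on the cached key,
# then strip the key off again.
sub longest_repeat_st {
    my ($sequence, $what) = @_;
    my ($longest) = map  { $_->[0] }
                    sort { $b->[1] <=> $a->[1] }
                    map  { [ $_, length($_) ] }
                    $sequence =~ /((?:$what)+)/g;
    return $longest;
}

# Hypothetical toy input with runs AT, ATATAT and ATAT.
my $seq   = "ATCCATATATGGATAT";
my $plain = longest_repeat_plain_sort($seq, "AT");
my $st    = longest_repeat_st($seq, "AT");
print "plain sort: $plain\nSchwartzian transform: $st\n";
```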

Some might argue that sorting is wasteful because it is O(n log n) rather than O(n), but it's nice to have variety.
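If the O(n log n) sort matters, a single pass over the runs is enough. Here is a sketch using reduce from the core List::Util module; the function name is hypothetical. On ties it keeps the first of the longest runs, because it only replaces the current candidate when a strictly longer one appears.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(reduce);

# Single pass over the runs: keep the current candidate ($a) unless the
# next run ($b) is strictly longer, so ties resolve to the earliest run.
# reduce() returns undef for an empty list, i.e. when nothing matched.
sub longest_repeat_reduce {
    my ($sequence, $what) = @_;
    return reduce { length($b) > length($a) ? $b : $a }
           $sequence =~ /((?:$what)+)/g;
}

my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
for my $what (qw(AT TAGA C)) {
    my $run = longest_repeat_reduce($DNA, $what);
    print "The longest $what repeat is $run\n";
}
```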

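What are the if statements doing in the subroutine? Aren't you trying to find the largest consecutive appearance of whatever string you pass in? Does ATATAT count as 1 or 2?

I like your use of the Schwartzian transform, but I'm a hairy Perl hacker; this is quite likely to blow a bioinformaticist's mind. (I've spent some time teaching people in that field how to use Perl. It's a hard problem to crack.) See my answer for an example geared more toward the level of familiarity with Perl and programming that you commonly find in the field.

@AaronMiller: I'm not a computer scientist/programmer either. Most of my Perl technique comes from StackOverflow.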