Perl: a script to count repeated codons in a DNA sequence (tags: perl, hash, bioinformatics, counting)

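I have only just started using Perl and need some help. I have a DNA molecule in which I need to find the repeated codons and print them out. Here is what I have done so far: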

$dna ="atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata";

for ($i = 0; $i<length ($dna); $i = $i+3) {
    $triplet = substr ($dna,$i,3);
    @triplet = ("$triplet");
    print "@triplet\n";
}
unpack is a slightly esoteric function, but I think it makes splitting the DNA string into triplets much simpler.

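You should also put use strict and use warnings at the start of every Perl program, and declare each variable with my as close as possible to its first point of use.

Counting the triplets is just a matter of declaring a hash, %count, and incrementing the element for each triplet, using the triplet itself as the key.

Note that Perl hashes are inherently unordered, so the output comes out in pseudo-random order. If you want the triplets ordered by count, alphabetically, or by where they first appear in the DNA string, you need to add a sort on the hash keys.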

use strict;
use warnings;

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';
my @triplets = unpack '(a3)*', $dna;

my %count;
++$count{$_} for @triplets;
printf "%s - %d\n", $_, $count{$_} for keys %count;
Output:

ttc - 1
cca - 1
aga - 3
gat - 1
ggt - 1
atg - 1
gag - 3
ata - 3
taa - 1
gtt - 1
tta - 1
ccc - 2
aca - 1
tat - 2
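
The three orderings mentioned above (by count, alphabetical, order of first appearance) could be sketched roughly as follows. This is only an illustration, not part of the original answer: the %first_seen hash and the sub names by_count, alphabetical and by_appearance are made up for the example, and the //= operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';

# Tally each triplet and remember the index at which it first appears.
my (%count, %first_seen);
my $position = 0;
for my $triplet (unpack '(a3)*', $dna) {
    $first_seen{$triplet} //= $position;   # only set on the first sighting
    $count{$triplet}++;
    $position++;
}

# Each sub returns the hash keys in one of the three orders (call in list context).
# by_count sorts highest count first and breaks ties alphabetically.
sub by_count      { sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count }
sub alphabetical  { sort keys %count }
sub by_appearance { sort { $first_seen{$a} <=> $first_seen{$b} } keys %count }

printf "%s - %d\n", $_, $count{$_} for by_count();
```

Swapping by_count() for alphabetical() or by_appearance() in the last line changes only the order of the lines, not the counts.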

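Note that the regex used, /.{3}/g, is "generic" because . matches any character. If you know that your DNA string consists only of the characters a, t, c and g, you can use /[atcg]{3}/g instead and get the same result.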
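
The counting code that this regex belongs to is not shown here, so the following is only a guess at how it might look. The sub name count_triplets is hypothetical; the hash is called %hash to match the printing loop in this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: tally non-overlapping triplets found by a global match.
sub count_triplets {
    my ($seq) = @_;
    my %hash;
    # In list context, a /g match returns every match from left to right.
    $hash{$_}++ for $seq =~ /[atcg]{3}/g;
    return %hash;
}

my $dna  = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';
my %hash = count_triplets($dna);
```

One difference from unpack '(a3)*' is that a trailing fragment shorter than three characters is silently dropped, because it cannot match {3}. Also note that with [atcg] any other character (for example an uppercase letter or an n) is skipped over, which shifts the reading frame for everything after it.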

This was used to print the output:

for my $key (keys %hash) {
  print $key . " =>  " .$hash{$key} ."\n";
}
This is the result:

ttc =>  1
cca =>  1
aga =>  3
gat =>  1
ggt =>  1
atg =>  1
gag =>  3
ata =>  3
taa =>  1
gtt =>  1
tta =>  1
ccc =>  2
aca =>  1
tat =>  2

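You can write a loop that counts not only the codons in a sequence but DNA words of any size k, i.e. k-mers of length k. I know you only want to count codons, but you never know when you will need to run this kind of count on a sequence again, and k-mer counting is a very common task in sequence analysis. For the sake of reusability, it is always a good idea to write code that solves your problem but also applies to a wider range of cases.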

#!/usr/bin/perl

#ALWAYS use warnings and strict at the start of every script! It is safer, better,
#and can save you a lot of trouble in debugging your code. Also, declare your
#variables with 'my', so you don't end up with crazy/empty variables 
#all over your code

use warnings;
use strict;

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';
my $length = length($dna); #we need the length of the DNA sequence for our loop

my %kmers; #hash with the counts for the codons (or k-mers, your choice)
my $k = 3; #k is the size of the DNA words you want to count. In your case, it is 3.

for(my $i = 0; $i <= $length - $k; $i = $i + $k) { #step by $k so the words do not overlap
    my $kmer = substr($dna, $i, $k); #walks over the sequence getting the codons

    #building the hash
    $kmers{$kmer}++; #compact way of saying: if word is new, count =1; 
                                            #if word was already seen, count += 1;

}

#Printing the hash:
while(my ($kmer, $count) = each %kmers) {
    print "$kmer => $count\n";
}
To count every possible word of length k in the sequence, including overlapping ones, the for loop is slightly different:

for(my $i = 0; $i <= $length - $k; $i++) {
    my $kmer = substr($dna, $i, $k); #walks over the sequence getting the k-mers

    #building the hash
    $kmers{$kmer}++; #compact way of saying: if word is new, count =1; 
                                            #if word was already seen, count += 1;                  
}
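
In the spirit of reusability, the two loops above could be folded into one subroutine. This is just a sketch: count_kmers and its $step parameter are names invented for this example, and //= assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original answer.
# $step defaults to 1 (overlapping windows); pass $k to read non-overlapping words.
sub count_kmers {
    my ($seq, $k, $step) = @_;
    $step //= 1;
    die "k and step must be positive\n" if $k < 1 || $step < 1;

    my %kmers;
    # Stop once fewer than $k characters remain, so no partial word is counted.
    for (my $i = 0; $i <= length($seq) - $k; $i += $step) {
        $kmers{ substr($seq, $i, $k) }++;
    }
    return \%kmers;   # hash reference: word => count
}

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';

my $codons = count_kmers($dna, 3, 3);   # codons in reading frame 0
my $dimers = count_kmers($dna, 2);      # every overlapping 2-mer

print "$_ => $codons->{$_}\n" for sort keys %$codons;
```

Returning a hash reference keeps the call cheap for long sequences, and the sorted keys give a stable output order.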

The map function lets you write this more concisely:

#!/usr/bin/perl

use strict;
use warnings;

my $dna ="atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata";
my %hash = ();
map { $hash{$_}++ } unpack('(a3)*',$dna);

print map { ( $_, "\t", $hash{$_}, "\n" ) } sort keys %hash;
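
Since the question asks specifically for the repeated codons, one possible follow-up is to filter the hash with grep before printing. This is a sketch under that reading of the question; repeated_codons is a hypothetical name, and a for modifier is used for the counting instead of map in void context.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [triplet, count] pairs for triplets seen more than once.
sub repeated_codons {
    my ($seq) = @_;
    my %hash;
    $hash{$_}++ for unpack '(a3)*', $seq;
    # Sort alphabetically, keep only counts above 1, pair each key with its count.
    return map { [ $_, $hash{$_} ] } grep { $hash{$_} > 1 } sort keys %hash;
}

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';
print "$_->[0]\t$_->[1]\n" for repeated_codons($dna);
```

Going by the counts listed earlier in this thread, this should print only aga, ata, ccc, gag and tat.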