Perl: DNA with repeated codons, need a script to count and increment
I have only just started using Perl and need some help. I have a DNA molecule in which I need to find the repeated codons and print them out. Here is what I have done so far:
$dna ="atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata";
for ($i = 0; $i<length ($dna); $i = $i+3) {
$triplet = substr ($dna,$i,3);
@triplet = ("$triplet");
print "@triplet\n";
}
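unpack is a somewhat esoteric function, but I think it makes splitting the DNA string into triplets much simpler.
You should also put use strict and use warnings at the start of every Perl program, and declare each variable with my as close as possible to its first point of use.
Counting the triplets is then just a matter of declaring a hash %count and incrementing the element for each triplet, using the triplet itself as the key.
Note that Perl hashes are inherently unordered, so the output comes out in pseudo-random order. If you want the triplets sorted by count, alphabetically, or in the order they appear in the DNA string, you need to add a sort on the hash keys.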
use strict;
use warnings;
my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';
my @triplets = unpack '(a3)*', $dna;
my %count;
++$count{$_} for @triplets;
printf "%s - %d\n", $_, $count{$_} for keys %count;
Output
ttc - 1
cca - 1
aga - 3
gat - 1
ggt - 1
atg - 1
gag - 3
ata - 3
taa - 1
gtt - 1
tta - 1
ccc - 2
aca - 1
tat - 2
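The note about hash ordering can be made concrete. The following is a minimal sketch, not part of the original answer, showing the three orderings mentioned: alphabetical, by count, and by first appearance. The %first_seen hash and the @by_name, @by_count and @by_position arrays are names introduced here for illustration only.

```perl
use strict;
use warnings;

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';

my ( %count, %first_seen );
my $index = 0;
for my $triplet ( unpack '(a3)*', $dna ) {
    $count{$triplet}++;
    # Remember the position of the first occurrence (illustrative helper hash).
    $first_seen{$triplet} = $index unless exists $first_seen{$triplet};
    $index++;
}

# Alphabetical order.
my @by_name = sort keys %count;

# Highest count first; ties are broken alphabetically so the order is repeatable.
my @by_count = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

# Order of first appearance in the DNA string.
my @by_position = sort { $first_seen{$a} <=> $first_seen{$b} } keys %count;

printf "%s - %d\n", $_, $count{$_} for @by_count;
```

Swapping @by_count for @by_name or @by_position in the last line prints the same counts in the other two orders.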
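Note that the regex used, /.{3}/g, is "generic" because . matches any character (except a newline).
If you know that your DNA string consists only of the characters a, t, c and g, you can use /[atcg]{3}/g instead and get the same result.
This was used for the output: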
for my $key (keys %hash) {
print $key . " => " .$hash{$key} ."\n";
}
This is the result:
ttc => 1
cca => 1
aga => 3
gat => 1
ggt => 1
atg => 1
gag => 3
ata => 3
taa => 1
gtt => 1
tta => 1
ccc => 2
aca => 1
tat => 2
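The matching code that fills %hash is not shown in this answer. The sketch below is one plausible way to do it with the regex described, and should be read as an assumption rather than the author's code. It also shows, on a made-up string containing an n, where the two patterns stop giving the same result; the variables $noisy, @generic and @strict are illustrative only.

```perl
use strict;
use warnings;

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';

# In list context, a /g match returns every non-overlapping match.
my @triplets = $dna =~ /[atcg]{3}/g;

my %hash;
$hash{$_}++ for @triplets;

# The two patterns only differ when other characters are present.
my $noisy   = 'atgnccatg';               # made-up input containing an 'n'
my @generic = $noisy =~ /.{3}/g;         # atg, ncc, atg
my @strict  = $noisy =~ /[atcg]{3}/g;    # atg, cca: the match skips 'n' and the frame shifts

print "generic: @generic\n";
print "strict:  @strict\n";
```

Both regexes silently drop a trailing fragment shorter than three characters, whereas unpack '(a3)*' returns it as a short final element.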
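You can write a loop that counts not only the codons in a sequence but DNA words of any size k, that is, k-mers of length k. I know you only want to count codons, but you never know when you will need to run this kind of count on a sequence again. k-mer counting is a very common task in sequence analysis. It is always a good idea to write code that solves your problem but also applies to a broader scope than the immediate one, for the sake of code reusability.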
#!/usr/bin/perl
#ALWAYS use warnings and strict at the start of every script! It is safer, better,
#and can save you a lot of trouble in debugging your code. Also, declare your
#variables with 'my', so you don't end up with crazy/empty variables
#all over your code
use warnings;
use strict;
my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';
my $length = length($dna); #we need the length of the DNA sequence for our loop
my %kmers; #hash with the counts for the codons (or k-mers, your choice)
my $k = 3; #k is the size of the DNA words you want to count. In your case, it is 3.
for(my $i = 0; $i <= $length - $k; $i += $k) { #step by $k so the words do not overlap
my $kmer = substr($dna, $i, $k); #walks over the sequence getting the codons
#building the hash
$kmers{$kmer}++; #compact way of saying: if word is new, count =1;
#if word was already seen, count += 1;
}
#Printing the hash:
while(my ($kmer, $count) = each %kmers) {
print "$kmer => $count\n";
}
To count all possible (overlapping) words of length k in the sequence, the for loop is slightly different:
for(my $i = 0; $i <= $length - $k; $i++) {
my $kmer = substr($dna, $i, $k); #walks over the sequence getting the k-mers
#building the hash
$kmers{$kmer}++; #compact way of saying: if word is new, count =1;
#if word was already seen, count += 1;
}
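Since the two loops differ only in their step size, the reusability argument can be taken one step further by wrapping them in a single subroutine. This is a sketch under that assumption; count_kmers and its $step parameter are hypothetical names, not something the answer defines.

```perl
use strict;
use warnings;

# count_kmers is a hypothetical helper name.
# $step == $k counts non-overlapping words (codons in one reading frame);
# $step == 1  counts every overlapping k-mer.
sub count_kmers {
    my ( $seq, $k, $step ) = @_;
    die "k and step must be positive\n" if $k < 1 or $step < 1;

    my %kmers;
    for ( my $i = 0; $i <= length($seq) - $k; $i += $step ) {
        $kmers{ substr( $seq, $i, $k ) }++;
    }
    return \%kmers;    # hash reference: word => count
}

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';

my $codons   = count_kmers( $dna, 3, 3 );    # codons only
my $all_3mer = count_kmers( $dna, 3, 1 );    # all overlapping 3-mers

print "$_ => $codons->{$_}\n" for sort keys %$codons;
```

Returning a hash reference keeps the call cheap for long sequences and lets the caller decide how to sort or filter the counts.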
The map function lets you write this more concisely:
#!/usr/bin/perl
use strict;
use warnings;
my $dna ="atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata";
my %hash = ();
map { $hash{$_}++ } unpack('(a3)*',$dna);
print map { ( $_, "\t", $hash{$_}, "\n" ) } sort keys %hash;
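The question asks specifically for the repeated codons, while the scripts above print every codon with its count. A small addition, sketched here as one possible approach rather than anything the answers propose, is to filter the keys with grep before printing. The @repeated array is a name introduced for illustration.

```perl
use strict;
use warnings;

my $dna = 'atatatttaacagattaagagagagagagagttttcccccccccagagatatatatgagaggtata';

my %hash;
$hash{$_}++ for unpack '(a3)*', $dna;

# Keep only the codons that occur more than once, in alphabetical order.
my @repeated = grep { $hash{$_} > 1 } sort keys %hash;

print "$_\t$hash{$_}\n" for @repeated;
```

Changing the threshold in the grep block (for example to >= 3) selects only the more frequent codons.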