
Adding numbers to similar names in Perl


I am trying to convert a BLAST file to GFF3 using Perl. I work in science, so I am very new to programming. My current code is as follows:

use strict;
use warnings;
use diagnostics;

my $db = "BLAST";
my $prog = "blastn";
my $subid = "";

open(my $inFile, $ARGV[0]) || die "Could not open file '$ARGV[0]' $!";
open(my $outFile, ">$ARGV[1]") || die "Could not find file '>$ARGV[1]' $!";

print $outFile "##gff-version 3\n#\n#\n";

while(<$inFile>){

    my ($qseqid, $sseqid, $pident, $length, $mismatch, $gaps, $qstart, $qend, $sstart, $send, $evalue, $bitscore) = split(/\t/);

    my $sign;
    if($qstart < $qend){
        $sign = "+";
    } elsif($qstart > $qend){
        $sign = "-";
    } else {
        die "Unexpected qstart and end";
    }

    $bitscore =~ s/^\s*(.*?)\s*$/$1/;

    print $outFile "$sseqid\t$db\t$prog\t$qstart\t$qend\t$bitscore\t$sign\t.\t$subid\n";
}
I would like to change it so that it produces this output:

scf_62525_290.contig_1  BLAST   blastn  1       3954    7302    +   .   scf_62525_290.contig_1.t1.d1
scf_62525_290.contig_1  BLAST   blastn  4178    6577    4433    +   .   scf_62525_290.contig_1.t1.d2
scf_62525_290.contig_1  BLAST   blastn  3953    4114    300     +   .   scf_62525_290.contig_1.t1.d3
scf_62525_290.contig_1  BLAST   blastn  4115    4178    119     +   .   scf_62525_290.contig_1.t1.d4
scf_62525_1067.contig_1 BLAST   blastn  1       1665    3075    +   .   scf_62525_1067.contig_1.t1.d1
scf_62525_163.contig_1  BLAST   blastn  7       357     612     +   .   scf_62525_163.contig_1.t1.d1
scf_62525_4028.contig_1 BLAST   blastn  1       1321    2436    +   .   scf_62525_4028.contig_1.t1.d1
scf_62525_4028.contig_1 BLAST   blastn  1319    2231    1687    +   .   scf_62525_4028.contig_1.t1.d2
scf_62525_4028.contig_1 BLAST   blastn  1275    1321    87.9    +   .   scf_62525_4028.contig_1.t1.d3
Is there a simple way to do this? Thanks.


Here is some sample input:

Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  3954    0   0   1   3954    23690   27643   0.0 7302
Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  2400    0   0   4178    6577    28076   30475   0.0 4433
Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  162 0   0   3953    4114    27722   27883   1e-79   300
Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  64  0   0   4115    4178    27957   28020   4e-25   119
Ppluv_s011067g00001.1   scf_62525_1067.contig_1 100.00  1665    0   0   1   1665    4944    6608    0.0 3075
Ppluv_s010163g00001.1   scf_62525_163.contig_1  97.77   359 0   8   7   357 797 439 8e-175  612
Ppluv_s014028g00001.1   scf_62525_4028.contig_1 100.00  1321    0   0   1   1321    2322    1002    0.0 2436
Ppluv_s014028g00001.1   scf_62525_4028.contig_1 100.00  913 0   0   1319    2231    924 12  0.0 1687
Ppluv_s014028g00001.1   scf_62525_4028.contig_1 100.00  47  0   0   1275    1321    992 946 4e-16   87.9
Ppluv_s014028g00001.1   scf_62525_3545.contig_1 79.23   1343    241 38  1   1321    1   1327    0.0 902
Ppluv_s014028g00001.1   scf_62525_1712.contig_1 74.27   1951    403 99  340 2227    3076    4990    0.0 732
Ppluv_s014028g00001.1   scf_62525_817.contig_1  82.74   730 87  39  1378    2105    23175   22483   2e-174  614
Ppluv_s014028g00001.1   scf_62525_177.contig_1  76.37   804 178 12  1320    2117    29453   28656   1e-116  422
Ppluv_s014028g00001.1   scf_62525_177.contig_1  75.28   615 134 18  1326    1937    36037   35438   2e-73   278

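I would probably do it with a counter. From what I see in this code, $subid is numbered incrementally, and the prefix looks the same when the names are the same. Maybe you could do something like this: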

use strict;
use warnings;
use diagnostics;

my $db = "BLAST";
my $prog = "blastn";

open(my $inFile, '<', $ARGV[0]) || die "Could not open file '$ARGV[0]' $!";
open(my $outFile, '>', $ARGV[1]) || die "Could not open file '$ARGV[1]' $!";

print $outFile "##gff-version 3\n#\n#\n";

my $cnt = 0;
my $tmp = "";    # subject ID seen on the previous line

while(<$inFile>){

    my ($qseqid, $sseqid, $pident, $length, $mismatch, $gaps, $qstart, $qend, $sstart, $send, $evalue, $bitscore) = split(/\t/);

    # Keep counting while the subject ID repeats; restart at 1 when it changes.
    if ($sseqid eq $tmp){
        $cnt++;
    }
    else {
        $cnt = 1;
        $tmp = $sseqid;
    }

    # Build the ID only after the counter has been updated.
    my $subid = "${sseqid}.t1.d${cnt}";

    my $sign;
    if($qstart < $qend){
        $sign = "+";
    } elsif($qstart > $qend){
        $sign = "-";
    } else {
        die "Unexpected qstart and end";
    }

    # Strip surrounding whitespace, including the trailing newline.
    $bitscore =~ s/^\s*(.*?)\s*$/$1/;

    print $outFile "$sseqid\t$db\t$prog\t$qstart\t$qend\t$bitscore\t$sign\t.\t$subid\n";
}
This is untested, but you get the idea.

Use a hash to keep track of how many times you have seen each ID pair:

use strict;
use warnings;

my $db = "BLAST";
my $prog = "blastn";
my %unique;

print "##gff-version 3\n#\n#\n";

while (<DATA>) {
    my @fields = split;

    my $sseqid   = $fields[1];
    my $qstart   = $fields[6];
    my $qend     = $fields[7];
    my $bitscore = $fields[11];
    my $sign;

    if ($qstart < $qend) {
        $sign = "+";
    } elsif ($qstart > $qend) {
        $sign = "-";
    } else {
        die "Unexpected qstart and end";
    }

    my @id_parts = split(/[_.]/, $sseqid);
    my $sub_id = ++$unique{$id_parts[1]}{$id_parts[2]};

    print join("\t", $sseqid, $db, $prog, $qstart, $qend, $bitscore, $sign, '.', "$sseqid.$sub_id"), "\n";
}

__DATA__
Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  3954    0   0   1   3954    23690   27643   0.0 7302
Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  2400    0   0   4178    6577    28076   30475   0.0 4433
Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  162 0   0   3953    4114    27722   27883   1e-79   300
Ppluv_s010290g00001.1   scf_62525_290.contig_1  100.00  64  0   0   4115    4178    27957   28020   4e-25   119
Ppluv_s011067g00001.1   scf_62525_1067.contig_1 100.00  1665    0   0   1   1665    4944    6608    0.0 3075
Ppluv_s010163g00001.1   scf_62525_163.contig_1  97.77   359 0   8   7   357 797 439 8e-175  612
Ppluv_s014028g00001.1   scf_62525_4028.contig_1 100.00  1321    0   0   1   1321    2322    1002    0.0 2436
Ppluv_s014028g00001.1   scf_62525_4028.contig_1 100.00  913 0   0   1319    2231    924 12  0.0 1687
Ppluv_s014028g00001.1   scf_62525_4028.contig_1 100.00  47  0   0   1275    1321    992 946 4e-16   87.9
Ppluv_s014028g00001.1   scf_62525_3545.contig_1 79.23   1343    241 38  1   1321    1   1327    0.0 902
Ppluv_s014028g00001.1   scf_62525_1712.contig_1 74.27   1951    403 99  340 2227    3076    4990    0.0 732
Ppluv_s014028g00001.1   scf_62525_817.contig_1  82.74   730 87  39  1378    2105    23175   22483   2e-174  614
Ppluv_s014028g00001.1   scf_62525_177.contig_1  76.37   804 178 12  1320    2117    29453   28656   1e-116  422
Ppluv_s014028g00001.1   scf_62525_177.contig_1  75.28   615 134 18  1326    1937    36037   35438   2e-73   278
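
The hash-based code above prints IDs such as scf_62525_290.contig_1.1, while the question asks for scf_62525_290.contig_1.t1.d1. Here is a minimal sketch of how the same counting idea might be adapted to that suffix. It assumes that ".t1" is a fixed label (the question does not say where it comes from) and keys the hash on the whole subject ID rather than on its parts. The name gff3_lines is a hypothetical helper, not something from the original posts. Because the count lives in a hash, rows for the same subject do not need to be adjacent in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: convert whitespace-separated BLAST tabular rows
# into tab-separated GFF3-style rows with a per-subject running number.
sub gff3_lines {
    my @rows = @_;
    my %seen;    # subject ID => number of hits seen so far
    my @out;

    for my $row (@rows) {
        my @fields = split ' ', $row;
        next unless @fields >= 12;    # skip blank or short lines

        my ($sseqid, $qstart, $qend, $bitscore) = @fields[1, 6, 7, 11];

        my $sign;
        if ($qstart < $qend) {
            $sign = "+";
        } elsif ($qstart > $qend) {
            $sign = "-";
        } else {
            die "Unexpected qstart and end";
        }

        # Assumption: ".t1" is constant; only the ".dN" part is counted.
        my $n = ++$seen{$sseqid};

        push @out, join("\t", $sseqid, "BLAST", "blastn", $qstart, $qend,
                        $bitscore, $sign, ".", "$sseqid.t1.d$n");
    }
    return @out;
}

# Usage with three rows taken from the sample input in the question.
my @sample = (
    "Ppluv_s010290g00001.1\tscf_62525_290.contig_1\t100.00\t3954\t0\t0\t1\t3954\t23690\t27643\t0.0\t7302",
    "Ppluv_s010290g00001.1\tscf_62525_290.contig_1\t100.00\t2400\t0\t0\t4178\t6577\t28076\t30475\t0.0\t4433",
    "Ppluv_s011067g00001.1\tscf_62525_1067.contig_1\t100.00\t1665\t0\t0\t1\t1665\t4944\t6608\t0.0\t3075",
);

print "##gff-version 3\n";
print "$_\n" for gff3_lines(@sample);
```

To process a real file, the sample list could be replaced with the lines read from a filehandle, for example gff3_lines(<$inFile>).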