Perl: uninitialized value in hash lookup of gene symbols
Update (2): I changed the code to discard the comment lines in the header, but I still get syntax errors at the hash key/value assignment:
syntax error at ./convertDataToGeneSymbol.pl line 99, near "$geneSymbolToGo{"
syntax error at ./convertDataToGeneSymbol.pl line 101, near "}"
I can't find anything wrong in the code, so I think the array may be failing to read the value of $go.
Here is the header of input file 3:
!! 10-20 lines of comments
UniProtKB/t A0A021WW37/t CG17167/t GO:0016021/t GO_REF:0000038
(Still learning how to format on this site; /t means tab-separated.) By the way, I apologize for the comments. My professor requires extensive commenting of our programs. strict gave me some problems with this program (mostly because of my inexperience), but when I removed it I got the results I wanted. Thanks for all your help.
#!/usr/bin/perl
use warnings;
use diagnostics;
# Title: convertDataToGeneSymbol.pl
# Author: Nicholas Bense
# Date: 11/4/15
# Open a filehandle to read file #1
open(INF1,"<",'/scratch/Drosophila/fb_synonym_fb_2014_05.tsv' ) or die $!;
# Open a filehandle to read file #2
open(INF2,"<",'/scratch/Drosophila/FlyRNAi_data_baseline_vs_EGF.txt') or die $!;
# Open a filehandle to read file #3
open(INF3,"<",'/scratch/Drosophila/gene_association.goa_fly') or die $!;
# Open a filehandle to write new file
open(OUTF1,">",'FlyRNAi_data_baseline_vs_EGFSymbol.txt') or die $!;
# Open a filehandle to write new file
open(OUTF2,">",'FlyRNAi_data_baseline_vs_EGF_GO.txt') or die $!;
# Initialize a hash for the gene symbol conversion
my %geneSymbolConversion;
# Read input file 1 line by line
while (<INF1>){
# Get rid of whitespace
chomp;
# Split the line
my @inf1Array = split("\t", $_);
# Filter entries starting with FBgn
if ($inf1Array[0] =~ /(^FBgn\d+)/){
# Assign column 1 to hash key scalar
my $geneID = $inf1Array[0];
# Assign column 2 to hash value scalar
my $geneSymbol = $inf1Array[1];
# Assign key and value to hash
$geneSymbolConversion{$geneID} = $geneSymbol;
}
}
# Discard first line of input file 2
<INF2>;
# Read input file 2 line by line
while (<INF2>){
# Get rid of whitespace
chomp;
# Split the line on tabs
my ($geneID, $egf_Baseline, $egf_Stimulus) = split("\t", $_);
# Check if the codon is present in the hash
if (defined $geneSymbolConversion{$geneID}){
# Get the value associated with the codon from the hash
$geneSymbol = $geneSymbolConversion{$geneID};
}
# Join data and print to output file
print OUTF1 join( "\t", $geneSymbol, $egf_Baseline, $egf_Stimulus), "\n";
}
# Initialize hash for GO conversion
my %geneSymbolToGo;
<INF3>;
# Read input file 3 line by line
while (<INF3>){
# Get rid of whitespace
chomp;
# Discard comment lines
if ($_ !~ /!/){
# Split the line on tabs
my @inf3Array = split("\t", $_);
# Assign column 3 to hash key scalar
my $geneSymbol = $inf3Array[2];
# Assign column 4 to hash value scalar
my $go = $inf3Array[3];
# Assign key and value to hash
my $geneSymbolToGo{$geneSymbol} = $go;
}
}
# Open a filehandle to read file #3
open(INF4,"<",'FLYRNAi_data_baseline_vs_EGFSymbol.txt') or die $!;
# Read input file 4 line by line
while (<INF4>){
# Remove end of line characters
chomp;
# Split the line on tabs
my ($geneSymbol, $egf_Baseline, $egf_Stimulus), "\n";
# Check if the gene symbol is present in the hash
if (defined $geneSymbolToGo{$geneSymbol}){
# Get the value associated with the codon from the hash
$go = $geneSymbolToGo{$geneSymbol};
}
# Join data and print to output file
print OUTF2 join( "\t", $go, $egf_Baseline, $egf_Stimulus), "\n";
}
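- Always use strict and use warnings 'all' at the start of every Perl program. use diagnostics is not very useful unless you cannot understand the error messages that the other two produce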
- If you have many file operations to perform, use autodie is very useful: it saves you from writing code after every operation to catch errors, such as or die $!
- Always use lexical file handles, such as
open my $inf1_fh, '<', '/scratch/Drosophila/fb_synonym_fb_2014_05.tsv'
rather than a bareword handle such as INF1
- Your regex
/(^FBgn\d+)/
captures the matched string, but the capture is never used, so you should just write /^FBgn\d+/
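- I don't understand what you are doing with the while loop
while ( $inf1Array[0] =~ /(^FBgn\d+)/ ) { ... }
Because $inf1Array[0] (which should be named $inf1_array[0]) never changes in the loop body, the loop will never terminate. I guess the while should be an if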
- Use Perl's defined-or operator. Instead of
my $geneSymbol = "NA";
if ( defined $geneSymbolConversion{$geneID} ) {
    $geneSymbol = $geneSymbolConversion{$geneID};
}
you should write
my $gene_symbol = $conversion{$gene_id} // 'NA';
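The defined-or point can be seen in isolation with a small sketch. The hash contents and the symbol_for helper here are made up for illustration and are not taken from the question's data files; the operator itself needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

# Hypothetical sample data standing in for the FlyBase synonym table.
my %conversion = (
    FBgn0000001 => 'geneA',
    FBgn0000002 => '0',      # a defined but false value
);

# Return the symbol for a gene ID, or 'NA' when the ID is not in the hash.
# `//` tests definedness, so a false-but-defined value such as '0' is kept;
# `||` would have replaced it with 'NA'.
sub symbol_for {
    my ($gene_id) = @_;
    return $conversion{$gene_id} // 'NA';
}

for my $gene_id ( qw( FBgn0000001 FBgn0000002 FBgn9999999 ) ) {
    print join("\t", $gene_id, symbol_for($gene_id)), "\n";
}
```

Because the default is supplied at the point of lookup, the variable is always declared and always defined, which is what removes the "uninitialized value" warning when a gene ID is missing from the hash.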
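This is my attempt at writing something more idiomatic and functional. It is far from being a complex program, so I don't think it needs any comments at all: they take up more vertical space than they make up for in clarity.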
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;
my %conversion;

{
    open my $in_fh, '<', '/scratch/Drosophila/fb_synonym_fb_2014_05.tsv';
    while ( <$in_fh> ) {
        chomp;
        my ($gene_id, $gene_symbol) = split /\t/;
        $conversion{$gene_id} = $gene_symbol if $gene_id =~ /^FBgn\d+/;
    }
}

{
    open my $in_fh, '<', '/scratch/Drosophila/FlyRNAi_data_baseline_vs_EGF.txt';
    open my $out_fh, '>', 'FLYRNAi_data_baseline_vs_EGFSymbol.txt';
    while ( <$in_fh> ) {
        chomp;
        my ( $gene_id, $egf_baseline, $egf_stimulus ) = split /\t/;
        my $gene_symbol = $conversion{$gene_id} // 'NA';
        print $out_fh join("\t", $gene_id, $gene_symbol, $egf_baseline, $egf_stimulus), "\n";
    }
}
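The syntax error in the question comes from the line my $geneSymbolToGo{$geneSymbol} = $go; because my declares whole variables, not individual hash elements. The hash is declared once, and its elements are then assigned without my. A sketch of that third step in the same style follows. It assumes, as the question's code does, that the gene symbol is in column 3 and the GO term in column 4, and that comment lines start with !. The sample lines and the build_go_map name are made up for illustration; a real run would open the gene_association file rather than an in-memory string.

```perl
#!/usr/bin/perl
use strict;
use warnings 'all';

# Build a gene-symbol => GO-term hash from a filehandle.
# Assumes tab-separated lines with the symbol in column 3 and the GO term
# in column 4 (as in the question's code), and comment lines starting with '!'.
sub build_go_map {
    my ($fh) = @_;
    my %symbol_to_go;                        # declare the hash once with `my`
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^!/;               # skip header comments
        my @fields = split /\t/, $line;
        next if @fields < 4;                 # skip short or blank lines
        $symbol_to_go{ $fields[2] } = $fields[3];   # no `my` on a hash element
    }
    return \%symbol_to_go;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = join "\n",
    '!! comment line',
    join("\t", 'UniProtKB', 'A0A021WW37', 'CG17167', 'GO:0016021', 'GO_REF:0000038'),
    '';
open my $in_fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $go_for = build_go_map($in_fh);
close $in_fh;

my $go = $go_for->{CG17167} // 'NA';
print "$go\n";
```

Note that a gene symbol can appear on several lines with different GO terms. This sketch, like the question's code, keeps only the last one seen; storing an array of terms per symbol would be a separate design decision.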
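Please also include a sample of your input so that we can reproduce the problem.
Also posted to PerlMonks.
Line 34 looks like an infinite loop.
Why use diagnostics rather than use strict? diagnostics slows things down considerably and does not help with your problem. Suggestion: fix the indentation, then use ...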