Perl: printing the details of specific SNP alleles from an XML file
I am a new Perl user, and I have the following lines in a file:
<Ss ssId="76536062" handle="AFFY" batchId="52074"
locSnpId="AFFY_6_1M_SNP_A-8397107" subSnpClass="snp" orient="forward"
strand="bottom" molType="genomic" buildId="130" methodClass="hybridize"
validated="by-submitter">
<Sequence>
<Seq5>TCACCTCTGGGACTGA</Seq5>
<Observed>C/T</Observed>
<Seq3>AATTAGGAAGAGCTGG</Seq3>
</Sequence>
</Ss>
<Ss ssId="104807776" handle="KRIBB_YJKIM" batchId="60510"
locSnpId="KHS1200112" subSnpClass="snp" orient="forward" strand="bottom"
molType="genomic" buildId="130" methodClass="hybridize"
validated="by-submitter">
<Sequence>
<Seq5>
TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
</Seq5>
<Observed>C/T</Observed>
<Seq3>
AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
</Seq3>
</Sequence>
</Ss>
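I want to print the content of both Observed elements, i.e. C/T. In addition, I want to print 30 bp of Seq5 and Seq3.
Thanks in advance for any ideas that could help me.
Something like this should work: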
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new()->parse( \*DATA );
foreach my $sequence ( $twig->get_xpath('//Sequence') ) {
    foreach my $element ( $sequence->children ) {
        print $element->tag, " => ", $element->trimmed_text, "\n";
    }
}
__DATA__
<root>
<Ss ssId="76536062" handle="AFFY" batchId="52074"
locSnpId="AFFY_6_1M_SNP_A-8397107" subSnpClass="snp" orient="forward"
strand="bottom" molType="genomic" buildId="130" methodClass="hybridize"
validated="by-submitter">
<Sequence>
<Seq5>TCACCTCTGGGACTGA</Seq5>
<Observed>C/T</Observed>
<Seq3>AATTAGGAAGAGCTGG</Seq3>
</Sequence>
</Ss>
<Ss ssId="104807776" handle="KRIBB_YJKIM" batchId="60510"
locSnpId="KHS1200112" subSnpClass="snp" orient="forward" strand="bottom"
molType="genomic" buildId="130" methodClass="hybridize"
validated="by-submitter">
<Sequence>
<Seq5>
TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
</Seq5>
<Observed>C/T</Observed>
<Seq3>
AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
</Seq3>
</Sequence>
</Ss>
</root>
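The script above prints each flanking sequence in full, while the question asks for 30 bp of Seq5 and Seq3. A minimal sketch of that step follows. It assumes that "30 bp" means the 30 bases nearest the SNP, that is, the last 30 bases of Seq5 and the first 30 bases of Seq3; that reading is a guess, not something the question confirms. The helper names `last_bases`, `first_bases` and `snp_records` are made up for illustration, and the sample data is the question's two records with the Ss attributes shortened.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $FLANK = 30;    # assumed flank length in bp

# Last $n bases of a sequence (whole sequence if it is shorter than $n).
sub last_bases {
    my ( $seq, $n ) = @_;
    return length($seq) > $n ? substr( $seq, -$n ) : $seq;
}

# First $n bases of a sequence (substr returns less if it is shorter).
sub first_bases {
    my ( $seq, $n ) = @_;
    return substr( $seq, 0, $n );
}

# Parse an XML string and return one hash per <Sequence> element.
sub snp_records {
    my ($xml) = @_;
    my $twig = XML::Twig->new->parse($xml);
    my @records;
    foreach my $sequence ( $twig->get_xpath('//Sequence') ) {
        my %text;
        foreach my $tag (qw(Seq5 Observed Seq3)) {
            my $child = $sequence->first_child($tag);
            $text{$tag} = $child ? $child->text : '';
            $text{$tag} =~ s/\s+//g;    # drop line breaks and padding
        }
        push @records, {
            observed => $text{Observed},
            seq5     => last_bases( $text{Seq5}, $FLANK ),
            seq3     => first_bases( $text{Seq3}, $FLANK ),
        };
    }
    return @records;
}

# Sample input: the two records from the question, attributes shortened.
my $xml = <<'XML';
<root>
<Ss ssId="76536062" handle="AFFY">
<Sequence>
<Seq5>TCACCTCTGGGACTGA</Seq5>
<Observed>C/T</Observed>
<Seq3>AATTAGGAAGAGCTGG</Seq3>
</Sequence>
</Ss>
<Ss ssId="104807776" handle="KRIBB_YJKIM">
<Sequence>
<Seq5>
TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
</Seq5>
<Observed>C/T</Observed>
<Seq3>
AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
</Seq3>
</Sequence>
</Ss>
</root>
XML

# One tab-separated line per SNP: observed alleles, 5' flank, 3' flank.
foreach my $record ( snp_records($xml) ) {
    print join( "\t", @{$record}{qw(observed seq5 seq3)} ), "\n";
}
```

Sequences shorter than 30 bases, such as those in the first record, are printed unchanged. If the intended 30 bp are instead the first 30 bases of both flanks, calling `first_bases` for Seq5 as well would do it.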
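Try XML::Twig.
Read this gentle introduction to processing SNP data.
By 30 bp I mean 30 base pairs, e.g. CTGTGCCGTATA.
Please show us what you have tried and explain the problem you ran into.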