Perl: printing details of specific SNP alleles from an XML file


I am new to Perl, and I have the following lines in a file:

<Ss ssId="76536062" handle="AFFY" batchId="52074"
    locSnpId="AFFY_6_1M_SNP_A-8397107" subSnpClass="snp" orient="forward" 
    strand="bottom" molType="genomic" buildId="130" methodClass="hybridize" 
    validated="by-submitter">
    <Sequence>
        <Seq5>TCACCTCTGGGACTGA</Seq5>
        <Observed>C/T</Observed>
        <Seq3>AATTAGGAAGAGCTGG</Seq3>
    </Sequence>
</Ss>
<Ss ssId="104807776" handle="KRIBB_YJKIM" batchId="60510"
    locSnpId="KHS1200112" subSnpClass="snp" orient="forward" strand="bottom"
    molType="genomic" buildId="130" methodClass="hybridize"
    validated="by-submitter">
    <Sequence>
        <Seq5>
             TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
       </Seq5>
       <Observed>C/T</Observed>
       <Seq3>
          AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
       </Seq3>
   </Sequence>
</Ss>


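I want to print the content of both Observed elements, i.e. C/T. I also want to print 30 bp of Seq5 and Seq3.
Thanks in advance for any ideas that could help.

Something like this would work: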

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new()->parse( \*DATA );
foreach my $sequence ( $twig->get_xpath('//Sequence') ) {
    foreach my $element ( $sequence->children ) {
        print $element->tag, " => ", $element->trimmed_text, "\n";
    }
}

__DATA__
<root>
<Ss ssId="76536062" handle="AFFY" batchId="52074"
    locSnpId="AFFY_6_1M_SNP_A-8397107" subSnpClass="snp" orient="forward" 
    strand="bottom" molType="genomic" buildId="130" methodClass="hybridize" 
    validated="by-submitter">
    <Sequence>
        <Seq5>TCACCTCTGGGACTGA</Seq5>
        <Observed>C/T</Observed>
        <Seq3>AATTAGGAAGAGCTGG</Seq3>
    </Sequence>
</Ss>
<Ss ssId="104807776" handle="KRIBB_YJKIM" batchId="60510"
    locSnpId="KHS1200112" subSnpClass="snp" orient="forward" strand="bottom"
    molType="genomic" buildId="130" methodClass="hybridize"
    validated="by-submitter">
    <Sequence>
        <Seq5>
             TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
       </Seq5>
       <Observed>C/T</Observed>
       <Seq3>
          AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
       </Seq3>
   </Sequence>
</Ss>
</root>
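
The script above prints the full text of every child of Sequence; it does not cut anything down to 30 bp. Below is a minimal sketch of that extra step, assuming "30 bp" means the 30 bases closest to the SNP on each side (the end of Seq5 and the start of Seq3). The question does not say which end is wanted, so treat that as a guess. The helper names flank5 and flank3 are made up for this sketch, and the sample strings are copied from the second Ss record.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: remove all whitespace, then keep the last $n bases,
# i.e. the part of the 5' flank that sits right next to the SNP.
sub flank5 {
    my ( $seq, $n ) = @_;
    $seq =~ s/\s+//g;
    return length($seq) > $n ? substr( $seq, -$n ) : $seq;
}

# Hypothetical helper: remove all whitespace, then keep the first $n bases
# of the 3' flank. Shorter sequences come back unchanged.
sub flank3 {
    my ( $seq, $n ) = @_;
    $seq =~ s/\s+//g;
    return substr( $seq, 0, $n );
}

# Sample values taken from the second <Ss> record in the question.
my $seq5 = "
    TAGGAACAAGGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAATAGGTTGACCTGACAATGGGTCACCTCTGGGACTGA
";
my $seq3 = "
    AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTT
";

print "Seq5 => ",     flank5( $seq5, 30 ), "\n";
print "Observed => ", "C/T",               "\n";
print "Seq3 => ",     flank3( $seq3, 30 ), "\n";
```

To combine the two, call flank5 or flank3 on $element->trimmed_text inside the inner loop, choosing the helper by $element->tag, and print Observed unchanged. Sequences shorter than 30 bases, such as the 16-base flanks in the first record, are returned whole.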

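Try XML::Twig.

Read this gentle introduction to processing SNP data.

I mean 30 base pairs, i.e. CTGTGCCGTATA

Please show us what you have tried and explain the problem you ran into.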