How to add annotations to genes in SBML with Python?


I have a genome-scale stoichiometric metabolic model, iMM904.xml. When I open it in a text editor, I can see that some genes have annotations, for example:

<fbc:geneProduct fbc:id="G_YLR189C" fbc:label="YLR189C" metaid="G_YLR189C">
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#G_YLR189C">
      <bqbiol:isEncodedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
          <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
        </rdf:Bag>
      </bqbiol:isEncodedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>
</fbc:geneProduct>

However, when I try to access these annotations with CBMPy, I only get an empty dictionary.

Furthermore, how can I add annotations (such as last_modified_by) and actual notes?
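In CBMPy, there are three different options for adding annotations to an SBML file:
1) MIRIAM annotations,
2) arbitrary key-value pairs, and
3) human-readable notes.
This should cover all the points you mention in your question. I show how to use them on a gene entry, but the same commands also work for annotating species (metabolites) and reactions.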
1. MIRIAM annotations
To access the existing MIRIAM annotations (the ones shown in your question), you can use:
import cbmpy as cbm

mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')

# access gene directly by its locus tag which avoids dealing with the "G_" in the ID
gene = mod.getGeneByLabel('YLR189C')

gene.getMIRIAMannotations()

This gives:
{'encodes': (),
 'hasPart': (),
 'hasProperty': (),
 'hasTaxon': (),
 'hasVersion': (),
 'is': (),
 'isDerivedFrom': (),
 'isDescribedBy': (),
 'isEncodedBy': ('http://identifiers.org/ncbigene/850886',
  'http://identifiers.org/sgd/S000004179'),
 'isHomologTo': (),
 'isPartOf': (),
 'isPropertyOf': (),
 'isVersionOf': (),
 'occursIn': ()}

As you can see, it contains the entries you saw in the SBML file.
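To see where these entries come from, here is a minimal stdlib-only sketch (no CBMPy) that parses the RDF block from the question and groups the resource URIs by biology qualifier, similar in shape to the dictionary above. It assumes the annotation is available as a string; the function name miriam_from_annotation is hypothetical, and this is not how CBMPy implements getMIRIAMannotations().

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# The <annotation> element copied from the question.
ANNOTATION = """<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#G_YLR189C">
      <bqbiol:isEncodedBy>
        <rdf:Bag>
          <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886" />
          <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179" />
        </rdf:Bag>
      </bqbiol:isEncodedBy>
    </rdf:Description>
  </rdf:RDF>
</annotation>"""


def miriam_from_annotation(xml_text):
    """Hypothetical helper: map each bqbiol qualifier to a tuple of resource URIs."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for qualifier in desc:
            # Only biology qualifiers (bqbiol:*) are collected here.
            if not qualifier.tag.startswith(f"{{{BQBIOL_NS}}}"):
                continue
            name = qualifier.tag[len(BQBIOL_NS) + 2:]  # drop "{namespace}"
            uris = [li.get(f"{{{RDF_NS}}}resource")
                    for li in qualifier.iter(f"{{{RDF_NS}}}li")]
            result[name] = result.get(name, ()) + tuple(uris)
    return result


print(miriam_from_annotation(ANNOTATION))
```

Unlike CBMPy's dictionary, this sketch only lists qualifiers that actually occur in the file, so there are no empty entries.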
If you now want to add MIRIAM annotations, there are two ways:
A) Let CBMPy create the URL for you:
gene.addMIRIAMannotation('is', 'UniProt Knowledgebase', 'Q06321')

B) Enter your own URL:
# made up protein!
gene.addMIRIAMuri('is', 'http://identifiers.org/uniprot/P12345')

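If you now check gene.getMIRIAMannotations() again, the previously empty 'is' entry contains both http://identifiers.org/uniprot/Q06321 and http://identifiers.org/uniprot/P12345, so both of your entries have been added. (Again: the P12345 entry is only for demonstration; do not use it in a real model!)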
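Conceptually, option A turns a collection name plus an accession into the same kind of identifiers.org URL that option B takes directly. The sketch below illustrates that idea with a small, hypothetical name-to-prefix table (only the uniprot, ncbigene and sgd prefixes appear in this thread; the names "NCBI Gene" and "SGD" are assumptions). It is not CBMPy's internal code, and skipping duplicate URIs is an assumed, not documented, behaviour.

```python
# Hypothetical lookup table: human-readable collection name -> identifiers.org prefix.
PREFIXES = {
    "UniProt Knowledgebase": "uniprot",
    "NCBI Gene": "ncbigene",
    "SGD": "sgd",
}


def identifiers_uri(collection, accession):
    """Build an identifiers.org URI such as http://identifiers.org/uniprot/Q06321."""
    prefix = PREFIXES[collection]  # raises KeyError for unknown collection names
    return f"http://identifiers.org/{prefix}/{accession}"


def add_uri(annotations, qualifier, uri):
    """Append uri under qualifier in a {qualifier: tuple_of_uris} dict, skipping duplicates."""
    current = annotations.get(qualifier, ())
    if uri not in current:
        annotations[qualifier] = current + (uri,)
    return annotations


ann = {}
add_uri(ann, "is", identifiers_uri("UniProt Knowledgebase", "Q06321"))  # like option A
add_uri(ann, "is", "http://identifiers.org/uniprot/P12345")  # like option B; made-up protein
print(ann)
```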
If you don't know the correct database name, CBMPy will also help you. For example, if you try:
gene.addMIRIAMannotation('is', 'uniprot', 'Q06321')

it prints:
"uniprot" is not a valid entity were you looking for one of these:

    UNII
    UniGene
    UniParc
    UniPathway Compound
    UniPathway Reaction
    UniProt Isoform
    UniProt Knowledgebase
    UniSTS
    Unimod
    Unipathway
    Unit Ontology
    Unite
INFO: Invalid entity: "uniprot" MIRIAM entity NOT set

which includes the "UniProt Knowledgebase" entry we used above.
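The suggestions above all start with "uni" when case is ignored. A minimal sketch of that kind of lookup, assuming a short hard-coded name list taken from the message and a three-character, case-insensitive prefix rule (CBMPy's real entity list and matching rule may differ; check_entity is a hypothetical name):

```python
# Hypothetical subset of valid entity names, taken from the message above.
KNOWN_ENTITIES = [
    "UNII", "UniGene", "UniParc", "UniProt Isoform",
    "UniProt Knowledgebase", "Unit Ontology", "Unite",
]


def check_entity(name, known=KNOWN_ENTITIES, prefix_len=3):
    """Return (name, []) for an exact match, else (None, suggestions).

    Suggestions share the first prefix_len characters with name, ignoring case
    (an assumed rule that reproduces the "Uni..." list above).
    """
    if name in known:
        return name, []
    stem = name[:prefix_len].lower()
    return None, sorted(n for n in known if n.lower().startswith(stem))


entity, suggestions = check_entity("uniprot")
if entity is None:
    print(f'"uniprot" is not valid; did you mean one of: {", ".join(suggestions)}')
```

Because the exact name is required, the lookup stays unambiguous, and the prefix list only serves as a hint.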

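2. Adding arbitrary key-value pairs
Not everything can be annotated with the MIRIAM annotation scheme, but you can easily create your own key-value pairs. Using your example: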
gene.setAnnotation('last_modified_by', 'Vinz')

The keys and values are completely arbitrary:
gene.setAnnotation('arbitrary key', 'arbitrary value')

If you now call
gene.getAnnotations()

you get
{'arbitrary key': 'arbitrary value', 'last_modified_by': 'Vinz'}

If you want to access a specific key, you can use
gene.getAnnotation('last_modified_by')

which returns
'Vinz'
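In the exported file at the end of this answer, these pairs are stored as data elements with id and value attributes inside a listOfKeyValueData element in the namespace http://pysces.sourceforge.net/KeyValueData. The stdlib sketch below builds the same structure from a plain dict; it only mirrors that output format and is not CBMPy's writer (key_value_element is a hypothetical name).

```python
import xml.etree.ElementTree as ET

KVD_NS = "http://pysces.sourceforge.net/KeyValueData"
ET.register_namespace("", KVD_NS)  # serialize KVD_NS as the default namespace


def key_value_element(pairs):
    """Build <listOfKeyValueData> with one <data id=... value=.../> per pair."""
    root = ET.Element(f"{{{KVD_NS}}}listOfKeyValueData")
    for key, value in sorted(pairs.items()):
        # Attributes stay unqualified, as in the exported SBML.
        ET.SubElement(root, f"{{{KVD_NS}}}data", {"id": key, "value": str(value)})
    return root


pairs = {"last_modified_by": "Vinz", "arbitrary key": "arbitrary value"}
print(ET.tostring(key_value_element(pairs), encoding="unicode"))
```

Since the values end up in XML attributes, any value is stored as a string; ElementTree escapes quotes and markup characters for you.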

3. Adding notes
If you want to write actual comments, neither of the first two options is suitable, but you can use:
gene.setNotes('This is my favorite gene')

You can retrieve the note with
gene.getNotes()
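SBML requires the content of notes to be XHTML, i.e. elements in the http://www.w3.org/1999/xhtml namespace; the exported file below writes it with an html: prefix. As an illustration of that format (not of CBMPy's own serializer), here is a minimal stdlib sketch that wraps a plain-text comment in an XHTML body element; notes_element is a hypothetical name, and the SBML core namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("html", XHTML_NS)  # use the html: prefix seen in the exported file


def notes_element(text):
    """Wrap plain text in <notes><html:body>...</html:body></notes>."""
    notes = ET.Element("notes")  # SBML core namespace omitted in this sketch
    body = ET.SubElement(notes, f"{{{XHTML_NS}}}body")
    body.text = text  # ElementTree escapes <, > and & on output
    return notes


print(ET.tostring(notes_element("This is my favorite gene"), encoding="unicode"))
```

Setting the text through ElementTree rather than by string concatenation keeps a comment such as "a < b" from producing invalid XML.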

If you now export the model (make sure to use FBCv2!) and open it in a text editor, you will see that all the annotations have been added:
<fbc:geneProduct metaid="meta_G_YLR189C" fbc:id="G_YLR189C" fbc:label="YLR189C">
  <notes>
    <html:body>This is my favorite gene</html:body>
  </notes>
  <annotation>
    <listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
      <data id="arbitrary key" value="arbitrary value"/>
      <data id="last_modified_by" value="Vinz"/>
    </listOfKeyValueData>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#meta_G_YLR189C">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/uniprot/Q06321"/>
            <rdf:li rdf:resource="http://identifiers.org/uniprot/P12345"/>
          </rdf:Bag>
        </bqbiol:is>
        <bqbiol:isEncodedBy>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/ncbigene/850886"/>
            <rdf:li rdf:resource="http://identifiers.org/sgd/S000004179"/>
          </rdf:Bag>
        </bqbiol:isEncodedBy>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</fbc:geneProduct>
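To check an exported file without a text editor, a stdlib sketch like the following collects the notes text and the key-value pairs of every fbc:geneProduct. It assumes the SBML Level 3 Version 1 core and FBC version 2 namespace URIs shown below and that the html: prefix is bound to XHTML; the function name and the trimmed example document are hypothetical (in a real file the gene products sit inside model/fbc:listOfGeneProducts, which root.iter() also finds).

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; check the root element of your exported file.
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"
KVD_NS = "http://pysces.sourceforge.net/KeyValueData"


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_gene_products(xml_text):
    """Return {fbc:label: {"notes": str, "key_values": dict}} for each fbc:geneProduct."""
    root = ET.fromstring(xml_text)
    summary = {}
    for gp in root.iter(f"{{{FBC_NS}}}geneProduct"):
        notes, kv = "", {}
        for child in gp:
            if local(child.tag) == "notes":
                # Join all text inside <notes> and normalize whitespace.
                notes = " ".join("".join(child.itertext()).split())
            elif local(child.tag) == "annotation":
                for data in child.iter(f"{{{KVD_NS}}}data"):
                    kv[data.get("id")] = data.get("value")
        summary[gp.get(f"{{{FBC_NS}}}label")] = {"notes": notes, "key_values": kv}
    return summary


# Hypothetical, trimmed-down document based on the output above.
EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2"
      xmlns:html="http://www.w3.org/1999/xhtml">
  <fbc:geneProduct metaid="meta_G_YLR189C" fbc:id="G_YLR189C" fbc:label="YLR189C">
    <notes><html:body>This is my favorite gene</html:body></notes>
    <annotation>
      <listOfKeyValueData xmlns="http://pysces.sourceforge.net/KeyValueData">
        <data id="last_modified_by" value="Vinz"/>
      </listOfKeyValueData>
    </annotation>
  </fbc:geneProduct>
</sbml>"""

print(summarize_gene_products(EXAMPLE))
```

To run it on your own export, read the file into a string and pass it to summarize_gene_products.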

