Python DataFrame only prints the last row of XML parsed with BeautifulSoup


Hello, I am trying to extract some information from a PubMed XML dataset. Here is the first part of my code:

from bs4 import BeautifulSoup as bs
import pandas as pd

content = []
with open("phosphiltestfilepmc.xml", "r") as file:
    content = file.readlines()
    content = "".join(content)
    bs_content = bs(content, "lxml")
    available_contacts = 139
    start_list = 0
    input_tag = bs_content.find_all(attrs={'ref-type': 'corresp'})
I am using the find_all function to return every element whose "ref-type" attribute equals "corresp"; this gives me a ResultSet.
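
As a minimal sketch of what that call returns: the fragment below is a hypothetical two-author sample modeled on the XML in this question, and it uses Python's built-in "html.parser" instead of "lxml" only so that it has no extra dependency (an assumption of this sketch, not what the question uses).

```python
from bs4 import BeautifulSoup

# Hypothetical two-author fragment modeled on the XML in the question.
xml = """
<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Galdino</surname><given-names>P.M.</given-names></name>
    <xref ref-type="aff" rid="aff2">2</xref>
  </contrib>
  <contrib contrib-type="author">
    <name><surname>Xavier</surname><given-names>C.H.</given-names></name>
    <xref ref-type="aff" rid="aff1">1</xref>
    <xref ref-type="corresp" rid="cor1">*</xref>
  </contrib>
</contrib-group>
"""

# Assumption: html.parser is used here to avoid the lxml dependency.
soup = BeautifulSoup(xml, "html.parser")

# find_all matches elements (tags) by their attributes; the result is a
# ResultSet, which behaves like a list of Tag objects.
matches = soup.find_all(attrs={"ref-type": "corresp"})

print(type(matches).__name__)
print(len(matches))
print(matches[0].name, matches[0]["rid"])
```

Each item in the result is the matching xref tag itself, so its parent is the enclosing contrib element.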

From there, I loop over them and get the parent elements, like this:

    l = []
    a = []
    for i in range(start_list, available_contacts):
        d = {}
        b = {}
        try:
            d['firstname'] = input_tag[i].parent('given-names')
        except:
            None
        try:
            d['lastname'] = input_tag[i].parent('surname')
        except:
            None
        try:
            d['email'] = input_tag[i].parent.parent.parent.parent('corresp')[0]('email')
        except:
            d['email'] = 'j@g.com'
        l.append(d)
    print(l)
The result of print(l) is a list of dictionaries (this is a snippet):
[{'firstname':[Inn Ho],'lastname':[Tsai],'email':[bc201@gate.sinica.edu.tw]}]

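I am trying to get the text out of these dictionaries. As far as I can tell, get_text() does not work on a ResultSet.

My solution is to loop over them again, this time using text.strip(), as shown below: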

        for tag, tag2, tag3, in zip(d['firstname'], d['lastname'], d['email']):
            try:
                b['First Name'] = tag.text.strip()
            except:
                None
            try:
                b['Last Name'] = tag2.text.strip()
            except:
                None
            try:
                b['Email Address'] = tag3.text.strip()
            except:
                None
            a.append(b)
    print(a)
The output of a is a list of dictionaries (this is only a snippet):
[{'First Name':'JoséMaría','Last Name':'Gutiérrez','Email Address':'jgutierr@icp.ucr.ac.cr'}]
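
For reference, here is a small sketch of reading text from a ResultSet versus a single Tag. The one-author fragment is hypothetical, and "html.parser" is again an assumption made to keep the sketch dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical one-author fragment modeled on the XML in the question.
xml = """
<contrib contrib-type="author">
  <name><surname>Ianzer</surname><given-names> D. </given-names></name>
  <xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
"""
soup = BeautifulSoup(xml, "html.parser")
contrib = soup.find("contrib")

# Calling a tag, e.g. contrib("surname"), is shorthand for find_all and
# returns a ResultSet, so the text has to be read from each Tag inside it.
surnames = [tag.get_text(strip=True) for tag in contrib("surname")]

# find returns a single Tag (or None), and a Tag does support get_text.
given = contrib.find("given-names")
first_name = given.get_text(strip=True) if given is not None else None

print(surnames, first_name)
```

When only one match is expected, find avoids the second loop entirely.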

The problem appears when I try to build a DataFrame from a.

import pandas
df = pandas.DataFrame(a)
df
The output shows only the last entry (the last name) in the list a. Please help.
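
As a quick check, pandas.DataFrame builds one row per dictionary in the list, so comparing len(a) with len(df) shows whether the rows were lost before or during the DataFrame step. The two rows below are sample values taken from the snippets above, not the full data.

```python
import pandas as pd

# Sample rows shaped like the entries of `a` in the question.
a = [
    {"First Name": "Inn Ho", "Last Name": "Tsai",
     "Email Address": "bc201@gate.sinica.edu.tw"},
    {"First Name": "José María", "Last Name": "Gutiérrez",
     "Email Address": "jgutierr@icp.ucr.ac.cr"},
]

# Each dictionary becomes one row; the keys become the column names.
df = pd.DataFrame(a)

print(len(a), len(df))
print(df)
```

If the two lengths match and the DataFrame still has a single row, the list itself only holds one entry at the point where the DataFrame is created.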

Here is a snippet of the XML:

<?xml version="1.0" ?>
<!DOCTYPE pmc-articleset PUBLIC "-//NLM//DTD ARTICLE SET 2.0//EN" "https://dtd.nlm.nih.gov/ncbi/pmc/articleset/nlm-articleset-2.0.dtd">
<pmc-articleset><article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article">
  <?properties open_access?>
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">Braz J Med Biol Res</journal-id>
      <journal-id journal-id-type="iso-abbrev">Braz. J. Med. Biol. Res</journal-id>
      <journal-id journal-id-type="publisher-id">bjmbr</journal-id>
      <journal-title-group>
        <journal-title>Brazilian Journal of Medical and Biological Research</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0100-879X</issn>
      <issn pub-type="epub">1414-431X</issn>
      <publisher>
        <publisher-name>Associa&#xE7;&#xE3;o Brasileira de Divulga&#xE7;&#xE3;o Cient&#xED;fica</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="pmid">31721904</article-id>
      <article-id pub-id-type="pmc">6853074</article-id>
      <article-id pub-id-type="other">00606</article-id>
      <article-id pub-id-type="doi">10.1590/1414-431X20198441</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Behavioral effects of <italic>Bj</italic>-PRO-7a, a proline-rich oligopeptide from <italic>Bothrops jararaca</italic> venom</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0003-4646-5682</contrib-id>
          <name>
            <surname>Turones</surname>
            <given-names>L.C.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0002-2318-9809</contrib-id>
          <name>
            <surname>da Cruz</surname>
            <given-names>K.R.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0002-4061-8804</contrib-id>
          <name>
            <surname>Camargo-Silva</surname>
            <given-names>G.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0003-1799-1106</contrib-id>
          <name>
            <surname>Reis-Silva</surname>
            <given-names>L.L.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0002-4997-2658</contrib-id>
          <name>
            <surname>Graziani</surname>
            <given-names>D.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Ferreira</surname>
            <given-names>P.M.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0003-2836-5565</contrib-id>
          <name>
            <surname>Galdino</surname>
            <given-names>P.M.</given-names>
          </name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0003-0488-5400</contrib-id>
          <name>
            <surname>Pedrino</surname>
            <given-names>G.R.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0001-8738-5852</contrib-id>
          <name>
            <surname>Santos</surname>
            <given-names>R.</given-names>
          </name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0003-1996-0901</contrib-id>
          <name>
            <surname>Costa</surname>
            <given-names>E.A.</given-names>
          </name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0001-5709-9329</contrib-id>
          <name>
            <surname>Ianzer</surname>
            <given-names>D.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="corresp" rid="cor1">*</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">http://orcid.org/0000-0003-4006-8213</contrib-id>
          <name>
            <surname>Xavier</surname>
            <given-names>C.H.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="corresp" rid="cor1">*</xref>
        </contrib>
        <aff id="aff1">
<label>1</label>Laborat&#xF3;rio de Neurobiologia de Sistemas, Departamento de Ci&#xEA;ncias Fisiol&#xF3;gicas, Instituto de Ci&#xEA;ncias Biol&#xF3;gicas, Universidade Federal de Goi&#xE1;s, Goi&#xE2;nia, GO, Brasil</aff>
        <aff id="aff2">
<label>2</label>Laborat&#xF3;rio de Farmacologia de Produtos Naturais e Sint&#xE9;ticos, Departamento de Farmacologia, Instituto de Ci&#xEA;ncias Biol&#xF3;gicas, Universidade Federal de Goi&#xE1;s, Goi&#xE2;nia, GO, Brasil</aff>
        <aff id="aff3">
<label>3</label>Departamento de Fisiologia e Biof&#xED;sica, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brasil</aff>
      </contrib-group>
      <author-notes>
        <corresp id="cor1">Correspondence: C.H. Xavier: &lt;<email>carlosxavier@ufg.br</email>&gt;</corresp>
        <fn fn-type="equal" id="fn1">
          <p>*These authors contributed equally to his work.</p>
        </fn>
      </author-notes>
      <pub-date pub-type="epub">
        <day>07</day>
        <month>11</month>
        <year>2019</year>
      </pub-date>
      <pub-date pub-type="collection">
        <year>2019</year>
      </pub-date>
      <volume>52</volume>
      <issue>11</issue>
      <elocation-id>e8441</elocation-id>
      <history>
        <date date-type="received">
          <day>12</day>
          <month>2</month>
          <year>2019</year>
        </date>
        <date date-type="accepted">
          <day>30</day>
          <month>8</month>
          <year>2019</year>
        </date>
      </history>
      <permissions>
        <license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
        </license>
      </permissions>
      <abstract>
        <p>The heptapeptide <italic>Bj</italic>-PRO-7a, isolated and identified from <italic>Bothrops jararaca</italic> (<italic>Bj</italic>) venom, produces antihypertensive and other cardiovascular effects that are independent on angiotensin converting enzyme inhibition, possibly relying on cholinergic muscarinic receptors subtype 1 (M<sub>1</sub>R). However, whether <italic>Bj</italic>-PRO-7a acts upon the central nervous system and modifies behavior is yet to be determined. Therefore, the aims of this study were: i) to assess the effects of acute administration of <italic>Bj</italic>-PRO-7a upon behavior; ii) to reveal mechanisms involved in the effects of <italic>Bj</italic>-PRO-7a upon locomotion/exploration, anxiety, and depression-like behaviors. For this purpose, adult male Wistar (WT, wild type) and spontaneous hypertensive rats (SHR) received intraperitoneal injections of vehicle (0.9% NaCl), diazepam (2 mg/kg), imipramine (15 mg/kg), <italic>Bj</italic>-PRO-7a (71, 213 or 426 nmol/kg), pirenzepine (852 nmol/kg), &#x3B1;-methyl-DL-tyrosine (200 mg/kg), or chlorpromazine (2 mg/kg), and underwent elevated plus maze, open field, and forced swimming tests. The heptapeptide promoted anxiolytic and antidepressant-like effects and increased locomotion/exploration. These effects of <italic>Bj</italic>-PRO-7a seem to be dependent on M<sub>1</sub>R activation and dopaminergic receptors and rely on catecholaminergic pathways.</p>
      </abstract>
      <kwd-group>
        <kwd><italic>Bj</italic>-PRO-7a</kwd>
        <kwd>Snake venom</kwd>
        <kwd>Neuroactive compounds</kwd>
        <kwd>Anxiety</kwd>
        <kwd>Depression</kwd>
        <kwd>Behavior</kwd>
      </kwd-group>
      <counts>
        <fig-count count="9"/>
        <table-count count="0"/>
        <equation-count count="0"/>
        <ref-count count="35"/>
      </counts>
    </article-meta>
  </front>

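I hope I understood your question correctly: you want to extract the names from the <contrib> tags that contain an <xref ref-type="corresp"> (txt contains the XML snippet from the question):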
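
The answer's code is not preserved at this point. The following is a minimal sketch that is consistent with the printed result, not the answer author's original code. It assumes that txt is a trimmed copy of the XML from the question, that the email is found by following the xref's rid to the corresp element with the same id, and that the built-in "html.parser" is acceptable in place of "lxml".

```python
import pandas as pd
from bs4 import BeautifulSoup

# `txt` stands in for the XML from the question, trimmed to the relevant parts.
txt = """
<article><front><article-meta>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Costa</surname><given-names>E.A.</given-names></name>
      <xref ref-type="aff" rid="aff2">2</xref>
    </contrib>
    <contrib contrib-type="author">
      <name><surname>Ianzer</surname><given-names>D.</given-names></name>
      <xref ref-type="aff" rid="aff1">1</xref>
      <xref ref-type="corresp" rid="cor1">*</xref>
    </contrib>
    <contrib contrib-type="author">
      <name><surname>Xavier</surname><given-names>C.H.</given-names></name>
      <xref ref-type="aff" rid="aff1">1</xref>
      <xref ref-type="corresp" rid="cor1">*</xref>
    </contrib>
  </contrib-group>
  <author-notes>
    <corresp id="cor1">Correspondence: C.H. Xavier: &lt;<email>carlosxavier@ufg.br</email>&gt;</corresp>
  </author-notes>
</article-meta></front></article>
"""

# Assumption: html.parser is used here to avoid the lxml dependency.
soup = BeautifulSoup(txt, "html.parser")

rows = []
for xref in soup.find_all("xref", attrs={"ref-type": "corresp"}):
    contrib = xref.find_parent("contrib")
    # Follow the rid attribute to the <corresp> element with the same id.
    corresp = soup.find("corresp", id=xref["rid"])
    email = corresp.find("email") if corresp is not None else None
    # A new dict is created for every author, holding plain strings.
    rows.append({
        "First Name": contrib.find("given-names").get_text(strip=True),
        "Last Name": contrib.find("surname").get_text(strip=True),
        "Email Address": email.get_text(strip=True) if email is not None else None,
    })

df = pd.DataFrame(rows)
print(df)
```

Note that ids such as cor1 are likely to repeat across articles in a multi-article file, so in that case the corresp lookup should be limited to the enclosing article rather than the whole document.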

Prints:

  First Name Last Name        Email Address
0         D.    Ianzer  carlosxavier@ufg.br
1       C.H.    Xavier  carlosxavier@ufg.br


This is an interesting approach. Is it possible to extend it? For example, what if you want to loop over every tag and get the names and emails?
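
One possible extension, as a sketch under assumptions rather than a confirmed solution: loop over every contrib inside each article and leave the email empty for authors without a corresp reference. The sample XML and the text_of helper are hypothetical, and "html.parser" is used only to keep the sketch dependency-free.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample: one article with one regular and one corresponding author.
txt = """
<pmc-articleset>
<article>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Costa</surname><given-names>E.A.</given-names></name>
      <xref ref-type="aff" rid="aff2">2</xref>
    </contrib>
    <contrib contrib-type="author">
      <name><surname>Xavier</surname><given-names>C.H.</given-names></name>
      <xref ref-type="corresp" rid="cor1">*</xref>
    </contrib>
  </contrib-group>
  <author-notes>
    <corresp id="cor1">Correspondence: <email>carlosxavier@ufg.br</email></corresp>
  </author-notes>
</article>
</pmc-articleset>
"""


def text_of(parent, tag_name):
    """Return the stripped text of the first matching descendant, or None."""
    tag = parent.find(tag_name) if parent is not None else None
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(txt, "html.parser")

rows = []
for article in soup.find_all("article"):
    for contrib in article.find_all("contrib"):
        xref = contrib.find("xref", attrs={"ref-type": "corresp"})
        # Look up the corresp element inside the same article only,
        # because ids such as "cor1" may repeat across articles.
        corresp = None
        if xref is not None:
            corresp = article.find("corresp", id=xref["rid"])
        rows.append({
            "First Name": text_of(contrib, "given-names"),
            "Last Name": text_of(contrib, "surname"),
            "Email Address": text_of(corresp, "email"),
        })

df = pd.DataFrame(rows)
print(df)
```

Authors without a corresp reference get an empty email here. Whether they should be dropped instead depends on what the final table is meant to contain.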