Python: Extracting hyphen-separated key-value pairs with values spanning multiple lines

Tags: python, python-3.x, optimization, key-value, text-parsing

Input text file

LID - E164 [pii]
LID - 10.3390/antiox9020164 [doi]
AB  - Although prickly pear fruits have become an important part of the Canary diet,
      their native varieties are yet to be characterized in terms of betalains and
      phenolic compounds.
FAU - Gomez-Maqueo, Andrea
AU  - Gomez-Maqueo A
AUID- ORCID: 0000-0002-0579-1855

PG  - 1-13
LID - 10.1007/s00442-020-04624-w [doi]
AB  - Recent observational evidence suggests that nighttime temperatures are increasing
      faster than daytime temperatures, while in some regions precipitation events are 
      becoming less frequent and more intense.
CI  - (c) 2020 Production and hosting by Elsevier B.V. on behalf of Cairo University.
FAU - Farag, Mohamed A
AU  - Farag MA

PG  - 3044
LID - 10.3389/fmicb.2019.03044 [doi]
AB  - Microbial symbionts account for survival, development, fitness and evolution of
      eukaryotic hosts. These microorganisms together with their host form a biological
      unit known as holobiont.

AU  - Flores-Nunez VM
AD  - Departamento de Ingenieria Genetica, Centro de Investigacion y de Estudios
      Avanzados del Instituto Politecnico Nacional, Irapuato, Mexico.

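I am trying to extract the abstracts, marked by the key AB, from the text. I iterate over each line and check whether its key is the abstract key. If it is, I set a flag and append the subsequent continuation lines, separated by spaces. Is there a better way to do this?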

f = "sample.txt"

abstracts = []
req_line = ""
flag = False

with open(f) as myfile:
    for line in myfile:

        # append subsequent lines if flag is set
        if flag:
            if line.startswith("      "):
                req_line = req_line + " " + line.strip()
            else:
                abstracts.append(req_line)
                req_line = ""
                flag = False

        # find beginning of abstract (strip the trailing newline of the first line)
        if line.startswith("AB  - "):
            req_line = line.replace("AB  - ", "", 1).strip()
            flag = True

# flush an abstract that runs to the end of the file
if flag:
    abstracts.append(req_line)

Output:

[
"Although prickly pear fruits have become an important part of the Canary diet, their native varieties are yet to be characterized in terms of betalains and phenolic compounds.",
"Recent observational evidence suggests that nighttime temperatures are increasing faster than daytime temperatures, while in some regions precipitation events are becoming less frequent and more intense.",
"Microbial symbionts account for survival, development, fitness and evolution of eukaryotic hosts. These microorganisms together with their host form a biological unit known as holobiont."
]
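The flag-based loop described above can also be sketched as a small reusable function. This is only an illustration under the layout assumptions visible in the sample (a tag padded to four characters, then "- ", with continuation lines indented by six spaces); the name extract_field and its tag parameter are hypothetical and not part of the original question.

```python
def extract_field(lines, tag="AB"):
    """Collect every value of `tag`, joining continuation lines with spaces."""
    prefix = f"{tag:<4}- "      # "AB" -> "AB  - ", matching the sample layout
    values, current = [], None  # current is None while outside a tagged value
    for line in lines:
        # a continuation line extends the value being collected
        if current is not None and line.startswith("      "):
            current.append(line.strip())
            continue
        # any other line closes the value being collected
        if current is not None:
            values.append(" ".join(current))
            current = None
        if line.startswith(prefix):
            current = [line[len(prefix):].strip()]
    if current is not None:     # value that runs to the end of the input
        values.append(" ".join(current))
    return values


sample = """\
AB  - first line,
      second line.
FAU - Someone
AB  - only line"""

print(extract_field(sample.splitlines()))
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example extract_field(open("sample.txt")).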
Do this with a regex (assuming your input string s was read via open("file.txt").read()):

import re
matches = re.findall(r"AB\W*-\W*([^-]*(?=\n))", s)
output = [" ".join(map(str.strip, i.split("\n"))) for i in matches]

which gives

['Although prickly pear fruits have become an important part of the Canary diet, their native varieties are yet to be characterized in terms of betalains and phenolic compounds.',
 'Recent observational evidence suggests that nighttime temperatures are increasing faster than daytime temperatures, while in some regions precipitation events are becoming less frequent and more intense.',
 'Microbial symbionts account for survival, development, fitness and evolution of eukaryotic hosts. These microorganisms together with their host form a biological unit known as holobiont.']
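One caveat with the [^-]* character class: it stops at the first hyphen, so an abstract that itself contains a hyphen (for example "long-term") may be cut short or skipped. A possible variant, sketched here under the assumption that every abstract starts with "AB  - " at the beginning of a line and that continuation lines are indented by exactly six spaces, matches the continuation lines explicitly instead. The names ABSTRACT_RE and extract_abstracts are illustrative only.

```python
import re

# Assumed layout: "AB  - " opens the value at the start of a line and
# each continuation line begins with six spaces.
ABSTRACT_RE = re.compile(r"^AB  - (.*(?:\n {6}.*)*)", re.MULTILINE)


def extract_abstracts(text):
    """Return each abstract as a single line, with continuation lines joined by spaces."""
    return [
        " ".join(part.strip() for part in match.split("\n"))
        for match in ABSTRACT_RE.findall(text)
    ]


sample = (
    "AB  - Long-term, well-known effects are\n"
    "      described here.\n"
    "FAU - Gomez-Maqueo, Andrea\n"
    "\n"
    "AB  - Second abstract.\n"
)

print(extract_abstracts(sample))
```

Since the pattern is anchored with re.MULTILINE, it only matches AB at the start of a line and leaves hyphens inside the abstract untouched.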