Extracting lines from a txt file with Python


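I am downloading mtDNA records from NCBI and trying to extract lines from them using Python. The lines I want either start with, or contain, certain keywords such as "haplotype", "nationality" or "locality". I tried the following code: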

import re
infile = open('sequence.txt', 'r') #open in file 'infileName' to read
outfile = open('results.txt', 'a') #open out file 'outfileName' to write

for line in infile:
    if re.findall("(.*)haplogroup(.*)", line):
        outfile.write(line)
        outfile.write(infile.readline())

infile.close()
outfile.close()

The output here contains only the first line containing "haplogroup" and not, for example, the following indented line:

                 /haplogroup="T2b20"

I also tried the following:

keep_phrases = ["ACCESSION", "haplogroup"]

for line in infile:
    for phrase in keep_phrases:
        if phrase in line:
            outfile.write(line)
            outfile.write(infile.readline())

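But this doesn't give me all the lines containing ACCESSION and haplogroup.

line.startswith works, but I can't use it for lines where the word is in the middle of the line.

Can anyone give me a piece of example code that writes the following line to my output because it contains "locality":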

/note="origin_locality:Wales"

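I would also appreciate any other suggestions on how to extract lines that contain specific words.

Edit:

In this case, using Peter's code, the first three lines are written to the output file, but not the line containing 21.05 E". How can I make an exception for /note=" and copy all lines up to the second quotation mark, without copying the /note lines that contain /note="TAA or /note="codon?

Edit 2:
This is my current solution, which works for me: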
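keep_phrases = ["ACCESSION", "haplogroup", "locality"]
remove_phrases = ['codon', 'TAA']

stuff_to_write = []
multiline = False  # True while inside a quoted /note value that spans lines
with open('sequences.txt') as f:
    for line in f:
        if any(phrase in line for phrase in keep_phrases) or multiline:
            do_not_write = False
            # a quote on a continuation line closes the open value
            if multiline and line.count('"') >= 1:
                multiline = False
            if 'note' in line:
                # skip notes whose text mentions an unwanted phrase
                if any(phrase in line.split('note')[1] for phrase in remove_phrases):
                    do_not_write = True
                # fewer than two quotes: the value continues on the next line
                elif line.count('"') < 2:
                    multiline = True
            if not do_not_write:
                stuff_to_write.append(line)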

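This searches the file for the matching phrases and writes those lines to a new file, provided that whatever follows "note" does not match anything in remove_phrases.

It reads the input line by line, checks whether the line contains any of the words in keep_phrases, stores every matching line in a list, and then writes them all to a new file in one go. Unless you need to write each line to the new file as soon as a match is found, this should be faster, because everything is written at once.

If you don't want the match to be case sensitive, change any(phrase in line to any(phrase.lower() in line.lower().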
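The filtering described above can be sketched as a small generator. This is only an illustration, not the answerer's exact code: the function name filter_lines and the sample records (including the placeholder accession XX000000) are made up for the example, and only the keep phrases are matched case-insensitively.

```python
def filter_lines(lines, keep_phrases, remove_phrases=()):
    """Yield lines that contain any keep phrase, ignoring case.

    A line containing 'note' is skipped when the text after 'note'
    contains any of the remove phrases (case-sensitive, as in the answer).
    """
    keep = [phrase.lower() for phrase in keep_phrases]
    for line in lines:
        lowered = line.lower()
        if not any(phrase in lowered for phrase in keep):
            continue
        if 'note' in line:
            after_note = line.split('note', 1)[1]
            if any(phrase in after_note for phrase in remove_phrases):
                continue
        yield line


# Made-up sample lines in the style of a GenBank record.
sample = [
    'ACCESSION   XX000000\n',
    '                     /Haplogroup="T2b20"\n',
    '                     /note="codon recognized:AGY"\n',
    '                     /note="origin_locality:Wales"\n',
    '                     /organism="Homo sapiens"\n',
]

kept = list(filter_lines(sample,
                         ["accession", "haplogroup", "locality", "note"],
                         ["codon", "TAA"]))
print(''.join(kept), end='')
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list.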
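For starters, use with open() to access the file. It would also help if you uploaded a bit of the text file so people can come up with working code :)

Why is there an outfile.write(infile.readline()) line after outfile.write(line)? Are you trying to write out the next line after a match? Calling readline while looping over the file causes problems. I can't even run this in 2.7; I get a ValueError on that line.

@Peter Why should I use with open() to open the file? I thought the two examples I gave were enough, since the rest of the text file is much the same.

@EricAppelt I took outfile.write(infile.readline()) from an answer to a similar question, but your explanation makes sense. Thanks. :)

When you use with open(), it makes sure the file gets closed whatever happens. With your code, if execution fails for any reason, the file is never closed :P Also, because the file handling is indented under the with statement, you don't need file.close(), so it looks a bit neater.

This works! :) I need "note" as a word, but I want to exclude any line where "note" is followed by "codon" or "TAA", like these: /note="TAA stop codon is completed by the addition of 3' A and /note="codon recognized:AGY". I know I need some compound statement to cover these exceptions, but I don't know where or how to put it in without breaking the code.

/note is still giving me grief. I edited my original question and explained the problem. Can you help?

Hmm, this is a bit hard to follow haha. You want to keep whatever is inside the quotation marks after /note, as long as it isn't followed by those two phrases? I would probably do something like your second point, I think. I tried something similar once but it didn't work very well :)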
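As a rough sketch of the two points raised in the comments (use with open(), and avoid calling readline inside a loop over the same file), the next line can be taken from the same iterator with next(). The helper name copy_match_and_next and the /country sample line are invented for the example, and io.StringIO stands in for real files so the snippet runs on its own.

```python
import io


def copy_match_and_next(infile, outfile, keyword):
    """Write each line containing keyword, plus the line right after it."""
    lines = iter(infile)  # one shared iterator, so no line is read twice
    for line in lines:
        if keyword in line:
            outfile.write(line)
            following = next(lines, None)  # None when the match is the last line
            if following is not None:
                outfile.write(following)


# With real files this would look like:
# with open('sequence.txt') as infile, open('results.txt', 'w') as outfile:
#     copy_match_and_next(infile, outfile, 'haplogroup')

# In-memory stand-ins for the input and output files.
source = io.StringIO('a\n/haplogroup="T2b20"\n/country="UK"\nb\n')
result = io.StringIO()
copy_match_and_next(source, result, 'haplogroup')
print(result.getvalue(), end='')
```

Note that the line consumed by next() is not itself checked for the keyword, so two matching lines in a row are written once each and the line after the second one is not copied.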
keep_phrases = ["ACCESSION", "haplogroup", "locality"]
remove_phrases = ['codon', 'TAA']

stuff_to_write = []
with open('C:/a.txt') as f:
    for line in f:
        if any(phrase in line for phrase in keep_phrases):
            do_not_write = False
            if 'note' in line:
                # drop the line if the text after 'note' mentions an unwanted phrase
                if any(phrase in line.split('note')[1] for phrase in remove_phrases):
                    do_not_write = True
            if not do_not_write:
                stuff_to_write.append(line)

# each stored line already ends with a newline, so write them as they are
with open('C:/b.txt', 'w') as f:
    f.writelines(stuff_to_write)
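The question's edit asks about /note values that run over several lines. One possible approach, shown here only as a sketch and not as the method used in the thread, is to merge the physical lines of a quoted value into one logical line first and filter afterwards. The name join_quoted_values and the sample text are assumptions made for the example, and the merge strips the original indentation.

```python
def join_quoted_values(lines):
    """Merge physical lines so each quoted value sits on one logical line."""
    buffer = []
    for line in lines:
        buffer.append(line.strip())
        # An odd number of quotes so far means a quoted value is still open.
        if sum(part.count('"') for part in buffer) % 2 == 0:
            yield ' '.join(buffer)
            buffer = []
    if buffer:  # unterminated value at the end of the input
        yield ' '.join(buffer)


# Made-up sample with one note spread over two lines.
sample = [
    '/haplogroup="T2b20"',
    '/note="origin_locality:Wales; first part',
    'second part"',
    '/note="codon recognized:AGY"',
]

logical = list(join_quoted_values(sample))
kept = [line for line in logical if 'locality' in line]
print(kept)
```

Once the values are merged, the keep_phrases and remove_phrases checks from the code above can be applied to each logical line without any multiline flag.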