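Python: extracting text from files

I have several test files saved in a directory. I want to go through each file, search for some text ("Text 1" and "Text 2"), and print everything before this text to an output file. I have done this with a Python script. The next step is that I only want the first instance of "Text 1" and "Text 2" in each file. If I add break to the current script, nothing gets printed to the output file.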

Please guide me. I am a Python beginner.

import os
path = "D:\test"
in_files = os.listdir(path)
desc = open("desc.txt", "w")
print >> desc, "Mol_ID,   Text1,  Text2"
moldesc = ['Text1', 'Text2']
for f in in_files:
    file = os.path.join(path, f)
    text = open(file, "r")
    hit_count = 0
    hit_count1 = 0
    for line in text:
        if moldesc[0] in line:
            Text1 = line.split()[-1]
        if moldesc[1] in line:
            Text2 = line.split()[-1]
            print >> desc, f + "," + Text1 + "," + Text2
text.close()
print "Text extraction done !!!"

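There are several problems with your code:

Your `text.close()` should be at the same indentation level as the `for line in text` loop.

The `print >> desc` statement is misplaced: it should run only once both `Text1` and `Text2` are defined. You can set them to `None` before the `for line in text` loop and test that neither of them is `None`. (Alternatively, you can set `hit_count0 = 1` inside the `if moldesc[0]` test and `hit_count1 = 1` inside the `if moldesc[1]` test, then test `hit_count0 and hit_count1`.) When that condition holds, print the output and use `break` to exit the loop.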

(So, in simple code, see the `for f in in_files:` listing below.)

There is also a third problem:

You mention that you want the text that comes before `Text1`? Then you probably want `Text1 = line[:line.index(moldesc[0])]` instead of `Text1 = line.split()[-1]`.
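To make the difference between the two expressions concrete, here is a small sketch in Python 3 (the question's code is Python 2). The sample line and the helper name `text_before` are invented for illustration; only `line.index` and `line.split()[-1]` come from the discussion above.

```python
def text_before(line, marker):
    """Return the part of line that precedes marker, or None if marker is absent."""
    if marker not in line:
        return None  # str.index would raise ValueError here
    return line[:line.index(marker)]


sample = "energy = 42.0 Text1 trailing words\n"  # hypothetical input line

before = text_before(sample, "Text1")  # slice up to the marker
last_token = sample.split()[-1]        # what the original script extracted

print(repr(before))
print(repr(last_token))
```

The slice keeps everything on the line up to the marker, while `split()[-1]` keeps only the last whitespace-separated token, which may be unrelated to the marker.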

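I would go with `mmap` and possibly write the results file as CSV, something like the `import mmap` listing at the end of this page. It is untested and rough around the edges: it needs better error handling, it might be better off using `mm.find()` instead of a regex, some of the code is copied verbatim from the OP, and my laptop battery is about to die...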

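Do you want the first occurrence of both, or of either one?

I am having a hard time understanding your question. Can you provide a sample input and output?

Why not use find, xargs, grep and sed?

@njzk2 I did not use grep or similar commands because I want to do more with the results...

Problem solved!!! Simply closing the file fixed the problem....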

for f in in_files:
    file = os.path.join(path, f)
    # 'with' closes the file even when we break out of the loop early
    with open(file, "r") as text:
        hit_count = 0
        hit_count1 = 0
        for line in text:
            if moldesc[0] in line:
                Text1 = line.split()[-1]
                hit_count = 1
            if moldesc[1] in line:
                Text2 = line.split()[-1]
                hit_count1 = 1
            # write one row per file, as soon as both values have been seen
            if hit_count and hit_count1:
                print >> desc, f + "," + Text1 + "," + Text2
                break
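
The listing above uses the `hit_count` flags. The `None` alternative mentioned in the answer could look roughly like the following Python 3 sketch. The function name `first_values` and the sample lines are invented for illustration, and unlike the listing above it keeps the first hit for each key instead of overwriting it on later lines.

```python
def first_values(lines, keys=("Text1", "Text2")):
    """Return, for each key, the last token of the first line that contains it."""
    found = {key: None for key in keys}  # None means "not seen yet"
    for line in lines:
        for key in keys:
            # Only record the first hit for each key.
            if found[key] is None and key in line:
                found[key] = line.split()[-1]
        # Stop reading as soon as every key has a value.
        if all(value is not None for value in found.values()):
            break
    return found


# Hypothetical file contents, one string per line.
sample = ["header", "Text1 = 1.5", "Text2 = 7", "Text1 = 99", "Text2 = 100"]
print(first_values(sample))
```

Because an open file iterates line by line, the same function can be called as `first_values(open_file)` inside a `with open(...)` block, and the caller can skip files where any value is still `None`.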

import os
import re
import csv
import mmap
from collections import defaultdict

path = r"D:\test"  # note the 'r' prefix, which stops '\t' being read as a tab
in_files = os.listdir(path)

fout = open('desc.txt', 'w')
csvout = csv.writer(fout)
csvout.writerow(['Mol_ID', 'Text1', 'Text2'])

dd = defaultdict(list)

for filename in in_files:
    # mmap works on bytes, so open in binary mode and search with a bytes pattern
    fin = open(os.path.join(path, filename), 'rb')
    mm = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
    # Find stuff: each match is (text before the marker on that line, marker)
    matches = re.findall(br'(.*?)(Text[12])', mm)  # maybe use finditer depending on exact needs
    for text, matched in matches:
        dd[matched].append(text)
    # do something with dd - write output using csvout.writerow()...
    mm.close()
    fin.close()
fout.close()  # csv.writer objects have no close(); close the underlying file
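
The answer also suggests that `mm.find()` might serve better than a regex. A minimal Python 3 sketch of that idea follows. The helper name `text_before_first` and the demo file contents are assumptions made for illustration. Note that it returns everything in the file before the first marker, whereas the regex above captures only the text on the same line.

```python
import mmap
import os
import tempfile


def text_before_first(path, marker):
    """Return the bytes that precede the first occurrence of marker, or None."""
    with open(path, "rb") as fin:
        if os.fstat(fin.fileno()).st_size == 0:
            return None  # mmap cannot map an empty file
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(marker)  # -1 when the marker is absent
            return None if pos == -1 else mm[:pos]


# Demo on a throwaway file with invented contents.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "mol1.txt")
with open(demo_path, "wb") as fh:
    fh.write(b"header line\nvalue Text1 3.2\nText1 again\n")

print(text_before_first(demo_path, b"Text1"))
print(text_before_first(demo_path, b"Text2"))
```

Since `find` stops at the first occurrence, no `break` is needed, and the result is `bytes`, so it would have to be decoded before being written to a text-mode CSV file.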