Python: using with open() inside a loop comprehension to get a list of the text contents of all files in a directory


python pandas


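Is there a better way to use the with open(file) as f: f.read() mechanism inside a for loop (that is, a loop comprehension running over many files)?

I am trying to put this into a dataframe so that there is a mapping from each file to its contents.
Here is what I have, but it seems inefficient and not very pythonic or readable: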
import glob
import numpy as np
import pandas as pd

documents = pd.DataFrame(glob.glob('*.txt'), columns = ['files'])
documents['text'] = [np.nan]*len(documents)
for txtfile in documents['files'].tolist():
    if txtfile.startswith('GSE'):
        with open(txtfile) as f:
            documents['text'][documents['files']==txtfile] = f.read()

Output:
    files   text
0   GSE2640_GSM50721.txt    | RNA was extracted from lung tissue using a T...
1   GSE7002_GSM159771.txt   Array Type : Rat230_2 ; Amount to Core : 15 ; ...
2   GSE1560_GSM26799.txt    | C3H denotes C3H / HeJ mice whereas C57 denot...
3   GSE2171_GSM39147.txt    | HIV seropositive , samples used to test HIV ...

Your code looks quite readable.
Maybe you are looking for something like this (Python 3 only):
You can do:
# import libraries
import os

import pandas

# list filenames, assuming your path is './'
files = [name for name in os.listdir('./')
         if name.startswith('GSE') and name.endswith('.txt')]

# get contents of files
contents = []
for name in files:
    with open(name) as f:
        contents.append(f.read().strip())

# into a nice table
table = pandas.DataFrame(contents, index=files, columns=['text'])
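
For a form closer to a single comprehension, one option is pathlib, whose Path.read_text() opens the file, reads it, and closes it in one call. The following is only a minimal sketch, not the code from either answer; the helper name read_texts and the default pattern 'GSE*.txt' are assumptions made for illustration.

```python
from pathlib import Path


def read_texts(directory='.', pattern='GSE*.txt'):
    """Return a dict mapping each matching file name to its stripped text.

    The name `read_texts` and its defaults are illustrative assumptions,
    not part of the original post.
    """
    # Path.read_text() opens, reads, and closes each file itself,
    # so no explicit `with open(...)` block is needed here.
    return {
        path.name: path.read_text().strip()
        for path in sorted(Path(directory).glob(pattern))
    }
```

If a table is wanted, the resulting dict could then be passed to pandas, for example as pd.DataFrame(list(texts.items()), columns=['files', 'text']), where texts is the value returned by read_texts().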

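As far as readability goes, I don't see a problem here. I would add some documentation comments at the top of the file, class, or function, but what you are trying to achieve seems fairly clear; the code expresses the desired functionality in readable, human language. As for efficiency, I'm not sure whether there is a better way: I haven't done that research. However, professors and more experienced programmers have told me not to optimize prematurely.
Possible duplicate.
@DavidCulbreth I mostly wanted to see whether there is something really simple (as there usually is in python), like {file: file.readstr() for file in filelist}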
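Ah, that makes sense. Given that you have to use with open(...), glob(...), and DataFrame(), I don't think a one-liner is achievable while staying readable. Even if such a notation exists, the more spelled-out version is most likely more readable. Since you are going through four(?) different kinds of structures, I think the simple approach you already have is probably the most readable, and it is probably no faster or slower than the initial algorithm.
This is perfectly good python. What is unpythonic about it? It uses common python idioms, and I find it very readable.
You should always use a with statement to handle files. Not using a context manager is quite un-Pythonic. Also, assigning a lambda to a name goes against the PEP8 style guidelines. Just use a full function definition.
Thanks @juanpa.arrivillaga. What happens if you assign a lambda to a name? You also helped me understand the question better, and I changed my suggestion accordingly.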
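
To make the last two points concrete: PEP 8 recommends a def statement over binding a lambda to a name, partly because the resulting function then carries a real name in tracebacks instead of '<lambda>', and a with block ensures the file is closed even if reading fails. A minimal sketch follows; the function name read_text is hypothetical and not taken from the original answers.

```python
# Discouraged: a lambda bound to a name, with a file handle that is
# never explicitly closed.
# read_text = lambda path: open(path).read()


def read_text(path):
    """Return the stripped text of the file at `path`.

    `read_text` is an illustrative name, not from the original post.
    """
    # The context manager closes the file even if read() raises.
    with open(path) as f:
        return f.read().strip()
```

With the DataFrame from the question, this could be applied as documents['text'] = documents['files'].map(read_text), assuming the file list was already restricted to the wanted files (for example with glob.glob('GSE*.txt')).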
