Python: read CSV files in the order they appear on my drive


I have a folder of files named Edgelist_subgraphXXX.csv, where XXX is a number running from 0 up to the last file, for example:

Edgelist_subgraph0.csv
Edgelist_subgraph1.csv
Edgelist_subgraph124.csv
Edgelist_subgraph1156.csv
Edgelist_subgraph843.csv
I need to read these files in the correct order and append the matrix from each CSV to a list. This is what I am doing:

import glob
import pandas as pd

path = r'Edgelist_subgraphs' # use your path
all_files = glob.glob(path + "/*.csv")
all_files.sort()

list_of_edgeList_matrices = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    list_of_edgeList_matrices += [df]
But I noticed that the files are read in the wrong order. If I print the first few elements of all_files, I can see why:

['Edgelist_subgraphs/Edgelist_subgraph0.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1.csv',
 'Edgelist_subgraphs/Edgelist_subgraph10.csv',
 'Edgelist_subgraphs/Edgelist_subgraph100.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1000.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1001.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1002.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1003.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1004.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1005.csv']

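The order is lexicographic rather than numeric, so everything gets mixed up. Is there a quick and dirty way to sort these files correctly, either in Python or by quickly renaming them in bash, for example using 0001 at the end instead of 1?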

You should pass a key function to sort(), so that the list is sorted numerically rather than alphabetically.

Change
all_files.sort()
to
all_files.sort(key=lambda x: int(x[17:-4]))
17 is the len of Edgelist_subgraph, and -4 excludes the file extension.
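The fixed slice depends on the exact length of the prefix and on whether the directory is part of the string. As a minimal sketch of the same idea without hard-coded offsets, the key function can pull the trailing number out with a regular expression. The names numeric_key and sample_files are made up for this illustration and are not part of the original answer.

```python
import re
from os.path import basename


def numeric_key(path):
    """Return the number that sits right before '.csv' in the file name."""
    match = re.search(r"(\d+)\.csv$", basename(path))
    if match is None:
        raise ValueError(f"no trailing number in {path!r}")
    return int(match.group(1))


# Hypothetical sample paths, listed in the order a plain sort() produces.
sample_files = [
    "Edgelist_subgraphs/Edgelist_subgraph0.csv",
    "Edgelist_subgraphs/Edgelist_subgraph1.csv",
    "Edgelist_subgraphs/Edgelist_subgraph10.csv",
    "Edgelist_subgraphs/Edgelist_subgraph100.csv",
    "Edgelist_subgraphs/Edgelist_subgraph2.csv",
]

# Sort by the extracted integer instead of by the raw string.
sorted_files = sorted(sample_files, key=numeric_key)
print(sorted_files)
```

With the question's variables this would presumably be used as all_files.sort(key=numeric_key), with or without a directory in front of the file name.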
Example output:

['Edgelist_subgraphs/Edgelist_subgraph2144.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv', 'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv', 'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv', 'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv']
Or you can use os.path:

from os.path import basename

# basename() drops the directory part and keeps only the file name
print(basename('Edgelist_subgraphs/Edgelist_subgraph6307.csv'))
spam = ['Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph2144.csv',
        'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv',
        'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv',
        'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv',
        'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv']

# slice the number out of the file name only, so the directory length does not matter
spam.sort(key=lambda x: int(basename(x)[17:-4]))
print(spam)
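The question also asks about renaming the files with zero-padded numbers so that a plain sort() works. A rough sketch of that alternative in Python rather than bash is below. The helper zero_padded, the width of 4 and the sample list are assumptions made for this illustration, and the width has to be at least the number of digits in the largest file number.

```python
import re


def zero_padded(path, width=4):
    """Left-pad the number just before '.csv' with zeros.

    Paths that do not end in digits followed by '.csv' are returned unchanged.
    """
    return re.sub(
        r"(\d+)(\.csv)$",
        lambda m: m.group(1).zfill(width) + m.group(2),
        path,
    )


# Hypothetical sample paths taken from the file names in the question.
sample_files = [
    "Edgelist_subgraphs/Edgelist_subgraph1.csv",
    "Edgelist_subgraphs/Edgelist_subgraph10.csv",
    "Edgelist_subgraphs/Edgelist_subgraph124.csv",
    "Edgelist_subgraphs/Edgelist_subgraph1156.csv",
    "Edgelist_subgraphs/Edgelist_subgraph843.csv",
]

renamed = [zero_padded(p) for p in sample_files]

# With equal-width numbers, lexicographic order and numeric order agree.
print(sorted(renamed))
```

This only computes the new names. To rename the files on disk, one could call os.rename(old, zero_padded(old)) for each path, ideally after trying it on a copy of the folder.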

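For some reason your code gives me ValueError: invalid literal for int() with base 10: 's/Edgelist_subgraph6307'

That is because of the s/ at the start: your strings contain the path as well, not just the file name, so you need to exclude the path. My example used only the file name. If the path is constant, you can change 17 to 36, assuming I counted the length of the path correctly.

Oh, it works with all_files.sort(key=lambda x: int(x[36:-4])), path included. Thanks.

Suggestion: first extract all the file names into a list, put that list into a DataFrame, add an extra column with the number extracted by a regex, and sort on that column. Then you could read the data in that order?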
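To make the cause of the ValueError from the comments concrete, here is a small sketch that applies the same slice to a full path and to the base name. The variable names are chosen for this illustration only.

```python
from os.path import basename

path = "Edgelist_subgraphs/Edgelist_subgraph6307.csv"

# On the full path, index 17 falls inside the directory name,
# so the slice still contains letters and a slash.
print(repr(path[17:-4]))

try:
    int(path[17:-4])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Taking the base name first makes the slice independent of the directory.
number = int(basename(path)[17:-4])
print(number)
```

Using basename() avoids having to recount the offset (36 in the comment above) whenever the directory name changes.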