I have a folder with nearly 7,000 CSV files named Edgelist_subgraphXXX.csv, where XXX is a number running from 0 up to the last file, for example:
Edgelist_subgraph0.csv
Edgelist_subgraph1.csv
Edgelist_subgraph124.csv
Edgelist_subgraph1156.csv
Edgelist_subgraph843.csv
I need to read these files in the correct order and append the matrix in each CSV to a list. I am doing:
import glob
import pandas as pd

path = r'Edgelist_subgraphs' # use your path
all_files = glob.glob(path + "/*.csv")
all_files.sort()
list_of_edgeList_matrices = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    list_of_edgeList_matrices.append(df)
However, I noticed that the files are read in the wrong order. Printing the first few elements of all_files shows why:
['Edgelist_subgraphs/Edgelist_subgraph0.csv',
'Edgelist_subgraphs/Edgelist_subgraph1.csv',
'Edgelist_subgraphs/Edgelist_subgraph10.csv',
'Edgelist_subgraphs/Edgelist_subgraph100.csv',
'Edgelist_subgraphs/Edgelist_subgraph1000.csv',
'Edgelist_subgraphs/Edgelist_subgraph1001.csv',
'Edgelist_subgraphs/Edgelist_subgraph1002.csv',
'Edgelist_subgraphs/Edgelist_subgraph1003.csv',
'Edgelist_subgraphs/Edgelist_subgraph1004.csv',
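'Edgelist_subgraphs/Edgelist_subgraph1005.csv']
This is a complete mess. Is there a quick-and-dirty way to sort these files correctly, either in Python or by quickly renaming them in bash to something like 0001 instead of 1?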
Posted on 2021-01-10 19:12:30
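You should pass a key function to sort() so that the files are sorted numerically rather than alphabetically.
Change all_files.sort() to all_files.sort(key=lambda x: int(x[36:-4])). Here 36 is the length of the prefix 'Edgelist_subgraphs/Edgelist_subgraph' (19 characters for the folder name and slash, plus 17 for Edgelist_subgraph), and -4 excludes the .csv extension. Example: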
spam = ['Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph2144.csv',
'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv',
'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv',
'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv',
'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv']
spam.sort(key=lambda x:int(x[36:-4]))
print(spam)
Output:
['Edgelist_subgraphs/Edgelist_subgraph2144.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv', 'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv', 'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv', 'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv']
Alternatively, you can use some functions from os.path, so that the offset 17 (the length of Edgelist_subgraph) applies to the file name alone and no longer depends on the folder name:
from os.path import basename, splitext
print(basename('Edgelist_subgraphs/Edgelist_subgraph6307.csv'))
spam = ['Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph2144.csv',
'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv',
'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv',
'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv',
'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv']
spam.sort(key=lambda x:int(basename(x)[17:-4]))
print(spam)
https://stackoverflow.com/questions/65657303
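The slice offsets used in the answer (36 or 17) only work while the folder and file prefix keep exactly those lengths. As a possible variation, not part of the original answer, the number can be pulled out with a regular expression instead. This is a minimal sketch: the helper name subgraph_index and the sample list files are hypothetical, and it assumes every file name ends in digits right before the extension.

```python
import re
from pathlib import Path


def subgraph_index(path):
    """Return the trailing integer of the file name, ignoring folder and extension.

    Hypothetical helper: assumes names like 'Edgelist_subgraph843.csv'.
    """
    stem = Path(path).stem  # e.g. 'Edgelist_subgraph843'
    match = re.search(r"(\d+)$", stem)  # digits at the very end of the stem
    if match is None:
        raise ValueError(f"no trailing number in {path!r}")
    return int(match.group(1))


# Illustrative sample paths, deliberately out of order.
files = [
    "Edgelist_subgraphs/Edgelist_subgraph10.csv",
    "Edgelist_subgraphs/Edgelist_subgraph0.csv",
    "Edgelist_subgraphs/Edgelist_subgraph1000.csv",
    "Edgelist_subgraphs/Edgelist_subgraph2.csv",
]

# Sort numerically by the extracted index rather than alphabetically.
ordered = sorted(files, key=subgraph_index)
print(ordered)
```

With the question's code, the same key could be passed as all_files.sort(key=subgraph_index). Raising ValueError on an unexpected name is a design choice here, so that a stray CSV in the folder is noticed rather than silently sorted into the wrong place.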