
Reading CSV files in the order they appear on my drive

Stack Overflow user
Asked 2021-01-10 19:00:55
Answers: 1, Views: 276, Followers: 0, Votes: 0

I have a folder with almost 7000 CSV files named Edgelist_subgraphXXX.csv, where XXX is a number running from 0 up to the last file, for example:

Edgelist_subgraph0.csv
Edgelist_subgraph1.csv
Edgelist_subgraph124.csv
Edgelist_subgraph1156.csv
Edgelist_subgraph843.csv

I need to read these files in the correct order and append the matrix from each CSV to a list. This is what I'm doing:

import glob
import pandas as pd

path = r'Edgelist_subgraphs' # use your path
all_files = glob.glob(path + "/*.csv")
all_files.sort()

list_of_edgeList_matrices = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    list_of_edgeList_matrices += [df]

However, I noticed that the files are read in the wrong order. Printing the first few elements of all_files shows why:

['Edgelist_subgraphs/Edgelist_subgraph0.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1.csv',
 'Edgelist_subgraphs/Edgelist_subgraph10.csv',
 'Edgelist_subgraphs/Edgelist_subgraph100.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1000.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1001.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1002.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1003.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1004.csv',
 'Edgelist_subgraphs/Edgelist_subgraph1005.csv']

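It's a complete mess. Is there a quick and dirty way to sort these files correctly, either in Python or by quickly renaming them in bash with zero-padded numbers such as 0001 instead of 1?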

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-01-10 19:12:30

You should pass a key function to sort() so that the files are sorted by numeric value rather than alphabetically.

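Change all_files.sort() to all_files.sort(key=lambda x: int(x[36:-4])). Here 36 is the length of the prefix Edgelist_subgraphs/Edgelist_subgraph (19 characters for the directory part plus 17, the len of Edgelist_subgraph), and -4 strips the .csv extension. If the paths in your list carry a different directory prefix, the start offset changes accordingly; for a bare file name it is 17. Example: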

spam = ['Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph2144.csv',
        'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv',
        'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv', 
        'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv', 
        'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv'] 

spam.sort(key=lambda x:int(x[36:-4]))
print(spam)

Output:

['Edgelist_subgraphs/Edgelist_subgraph2144.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv', 'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv', 'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv', 'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv']
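
The hard-coded offset above breaks as soon as the directory name or file prefix changes. As a minimal sketch of a more tolerant key, the number can be pulled out with a regular expression instead. The helper name subgraph_index is hypothetical (it is not part of the original answer), and the sketch assumes every file name ends in digits followed by .csv:

```python
import re
from os.path import basename


def subgraph_index(path):
    """Return the integer that sits directly before '.csv' in the file name."""
    match = re.search(r"(\d+)\.csv$", basename(path))
    if match is None:
        # Fail loudly rather than silently mis-sorting an unexpected name.
        raise ValueError(f"no numeric suffix in {path!r}")
    return int(match.group(1))


files = [
    "Edgelist_subgraphs/Edgelist_subgraph10.csv",
    "Edgelist_subgraphs/Edgelist_subgraph1.csv",
    "Edgelist_subgraphs/Edgelist_subgraph1000.csv",
    "Edgelist_subgraphs/Edgelist_subgraph0.csv",
    "Edgelist_subgraphs/Edgelist_subgraph2.csv",
]

# Sort by the extracted number instead of comparing the strings.
files.sort(key=subgraph_index)
print([basename(f) for f in files])
```

With this key the five sample paths come out in the order 0, 1, 2, 10, 1000, and the same call works unchanged on the list returned by glob.glob.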

Alternatively, you can use functions from os.path, so that the key no longer depends on the directory prefix:

from os.path import basename, splitext
print(basename('Edgelist_subgraphs/Edgelist_subgraph6307.csv'))
spam = ['Edgelist_subgraphs/Edgelist_subgraph6307.csv', 'Edgelist_subgraphs/Edgelist_subgraph2144.csv',
        'Edgelist_subgraphs/Edgelist_subgraph3396.csv', 'Edgelist_subgraphs/Edgelist_subgraph6475.csv',
        'Edgelist_subgraphs/Edgelist_subgraph3157.csv', 'Edgelist_subgraphs/Edgelist_subgraph3345.csv', 
        'Edgelist_subgraphs/Edgelist_subgraph5739.csv', 'Edgelist_subgraphs/Edgelist_subgraph3957.csv', 
        'Edgelist_subgraphs/Edgelist_subgraph3938.csv', 'Edgelist_subgraphs/Edgelist_subgraph2349.csv'] 

# basename drops the directory, splitext drops '.csv', [17:] drops 'Edgelist_subgraph'
spam.sort(key=lambda x: int(splitext(basename(x))[0][17:]))
print(spam)
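
The question also asks about renaming the files with zero-padded numbers so that a plain sort() already gives the right order. The following is only a sketch of that idea in Python rather than bash. The function name pad_filenames and the width of 4 digits are assumptions (4 digits cover the roughly 7000 files mentioned in the question), and it assumes no padded name already exists in the folder. The demonstration runs in a temporary directory so no real data is touched:

```python
import re
import tempfile
from pathlib import Path


def pad_filenames(folder, width=4):
    """Rename Edgelist_subgraphN.csv to a zero-padded form, e.g. N=1 -> 0001."""
    pattern = re.compile(r"^(Edgelist_subgraph)(\d+)(\.csv)$")
    # Snapshot the listing first so renames do not disturb the iteration.
    for path in list(Path(folder).iterdir()):
        match = pattern.match(path.name)
        if match is None:
            continue  # leave unrelated files alone
        prefix, number, suffix = match.groups()
        new_name = f"{prefix}{int(number):0{width}d}{suffix}"
        if new_name != path.name:
            path.rename(path.with_name(new_name))


# Demonstration on empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for n in (0, 1, 124, 1156, 843):
        (Path(tmp) / f"Edgelist_subgraph{n}.csv").touch()
    pad_filenames(tmp)
    renamed = sorted(p.name for p in Path(tmp).iterdir())
    print(renamed)
```

After renaming, the default lexicographic sort matches the numeric order, so the original all_files.sort() call would work without a key. The trade-off is that the files on disk are changed, which the key-based approaches above avoid.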
Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/65657303
