
Q: Processing speed - editing a large 2GB text file in Python

Stack Overflow user

Asked on 2013-10-20 18:03:39

Answers: 4 · Views: 1.5K · Followers: 0 · Votes: 2

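So I have a problem. I'm working with .txt files made up of groups of 4 lines. I'm working in Python 3.

I wrote code that takes lines 2 and 4 of the text file, keeps only the first 20 characters of each of those two lines (leaving lines 1 and 3 unedited), and creates a new file containing the edited lines 2 and 4 together with the unedited lines 1 and 3. The same pattern repeats for every group of four lines, because every text file I work with has a line count that is a multiple of 4.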

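This works for small files (about 100 lines), but the files I need to edit have 50 million+ lines, and processing them takes 4+ hours.

Here is my code. Can anyone suggest how to speed up my program? Thanks!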

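import io
import os
import sys

newData = ""  # the whole output is accumulated in this one string
i=0
run=0
j=0
k=1
m=2
n=3
seqFile = open('temp100.txt', 'r')
seqData = seqFile.readlines()  # loads every line of the file into a list in memory
while i < 14371315:  # hard-coded number of 4-line records
    sLine1 = seqData[j]
    editLine2 = seqData[k]
    sLine3 = seqData[m]
    editLine4 = seqData[n]
    tempLine1 = editLine2[0:20]
    tempLine2 = editLine4[0:20]
    # same result as tempLine1 / tempLine2
    newLine1 = editLine2.replace(editLine2, tempLine1)
    newLine2 = editLine4.replace(editLine4, tempLine2)
    # builds a new, ever-longer string on every iteration
    newData = newData + sLine1 + newLine1 + '\n' + sLine3 + newLine2
    # uses line 2's length to decide whether line 4 lost its newline
    if len(seqData[k]) > 20:
        newData += '\n'
    i=i+1
    run=run+1
    j=j+4
    k=k+4
    m=m+4
    n=n+4
    print(run)  # prints a progress counter for every record

seqFile.close()

new = open("new_100temp.txt", "w")
sys.stdout = new  # redirects print() to the output file
print(newData)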

Tags: performance, text, python

4 answers

Stack Overflow user

Accepted answer

Posted on 2013-10-20 18:25:49

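It would probably be much faster to read only 4 lines at a time and process them (untested):

with open('temp100.txt') as in_file, open('new_100temp.txt', 'w') as out_file:
    for line1, line2, line3, line4 in grouper(in_file, 4):
        # keep lines 1 and 3 as-is; cut lines 2 and 4 to 20 characters,
        # stripping the newline first so it can be re-added afterwards
        line2 = line2.rstrip('\n')[:20] + '\n'
        line4 = line4.rstrip('\n')[:20] + '\n'
        out_file.writelines([line1, line2, line3, line4])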

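Here, grouper(it, n) is a function that yields the items of the iterable it in groups of n. It is given as a recipe in the documentation of the itertools module (see this answer on SO). Iterating over a file this way is similar to calling readlines() on the file and then iterating over the resulting list by hand, but only a few lines are read into memory at a time.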
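Because `grouper` is a documentation recipe rather than something importable from `itertools`, it has to be defined in your own code. A minimal sketch following the classic form of that recipe (later versions of the documentation extend it with extra options); `io.StringIO` stands in for the real input file here as an assumption for the demo.

```python
import io
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, e.g. grouper('ABCDEFG', 3, 'x') -> ABC DEF Gxx."""
    # n references to the *same* iterator: zip_longest takes one item from each
    # in turn, so every tuple holds n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# A file object is itself an iterator over its lines, so each loop step reads only 4 lines.
sample = io.StringIO('a1\na2\na3\na4\nb1\nb2\nb3\nb4\n')
for line1, line2, line3, line4 in grouper(sample, 4):
    print(line1.strip(), line2.strip(), line3.strip(), line4.strip())
```

With the default `fillvalue=None`, a file whose line count is not a multiple of 4 would produce a final tuple padded with `None`, and the truncation step would then fail on it; the question states the files always contain a multiple of 4 lines. On Python 3.12 and later, `itertools.batched(in_file, 4)` offers similar grouping out of the box, except that the last batch is simply shorter instead of padded.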

Votes: 2

Stack Overflow user

Posted on 2013-10-20 18:20:15

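You are holding both files (input and output) in memory. If the files are large, the system starts paging, and that costs a lot of time. Try something like this (pseudocode):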

Open input file for read
Open output file for write
Initialize counter to 1
While not EOF in input file
    Read input line
    If counter is odd
        Write line to output file
    Else
        Write first 20 characters of line, followed by a newline, to output file
    Increment counter
Close files
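The pseudocode above translates fairly directly into Python. A minimal sketch, assuming the same rule (odd-numbered lines copied as-is, even-numbered lines cut to 20 characters); `truncate_even_lines` and `process_file` are hypothetical names. Keeping the line logic in a generator makes it testable without touching the disk.

```python
from typing import Iterable, Iterator


def truncate_even_lines(lines: Iterable[str], width: int = 20) -> Iterator[str]:
    """Yield odd-numbered lines unchanged and even-numbered lines cut to `width` chars."""
    for counter, line in enumerate(lines, start=1):
        if counter % 2 == 1:
            yield line
        else:
            # Strip the newline before slicing so the cut never removes it
            yield line.rstrip('\n')[:width] + '\n'


def process_file(in_path: str, out_path: str, width: int = 20) -> None:
    # Both files are streamed; no full copy of either file is kept in memory
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        outfile.writelines(truncate_even_lines(infile, width))
```

Calling `process_file('temp100.txt', 'new_100temp.txt')` would stream the question's input file into a new output file. `writelines` accepts any iterable of strings, so the generator is consumed one line at a time rather than collected into a list first.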

Votes: 2

Stack Overflow user

Posted on 2013-10-20 18:22:21

The biggest problem here seems to be reading the whole file at once:

seqData = seqFile.readlines()

Instead, open the source file and the output file first. Then iterate over the first file and manipulate the lines as needed:

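with open('input.txt', 'r') as infile, open('output.txt', 'w') as outfile:
    # enumerate counts from 0, so odd values of i are lines 2, 4, 6, ... of the file
    for i, line in enumerate(infile):
        if i % 2 == 0:
            newline = line
        else:
            # cut to 20 characters but keep the line break,
            # otherwise the truncated line would merge with the next one
            newline = line.rstrip('\n')[:20] + '\n'
        outfile.write(newline)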

Votes: 2

Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/19480902
