
Is there a way to read a text file in reverse order, line by line?

Stack Overflow user
Asked 2019-02-28 18:25:30
Answers: 3, Views: 334, Followers: 0, Votes: 0

I want to read the text file shown below in reverse order, line by line. I don't want to use readlines() or read().

a.txt

2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:

Expected result:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr

My attempt:

with open('a.txt') as lines:
    for line in reversed(lines):
        print(line)

3 Answers

Stack Overflow user

Answered 2019-02-28 19:07:45

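Here is a way to do it without reading the whole file into memory at once. It does require reading the entire file first, but it only stores the position at which each line starts. Once those positions are known, the seek() method can be used to access the lines in any order.

Here is an example using your input file: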

# Preprocess - read whole file and note where lines start.
# (Needs to be done in binary mode.)
with open('text_file.txt', 'rb') as file:
    offsets = [0]  # First line is always at offset 0.
    for line in file:
        offsets.append(file.tell())  # Append where *next* line would start.

# Now reread lines in file in reverse order.
with open('text_file.txt', 'rb') as file:
    for index in reversed(range(len(offsets)-1)):
        file.seek(offsets[index])
        size = offsets[index+1] - offsets[index]  # Difference with next.
        # Read bytes, convert them to a string, and remove whitespace at end.
        line = file.read(size).decode().rstrip()
        print(line)

Output:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
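
The two-pass offset technique above can also be packaged as a reusable generator. This is only a sketch under a few assumptions: the function name `iter_lines_reversed` and its `encoding` parameter are made up for illustration, and the file is assumed to use an encoding such as UTF-8 in which lines end with a `\n` byte.

```python
import os
import tempfile

def iter_lines_reversed(path, encoding='utf-8'):
    """Yield the lines of `path` from last to first, without line endings."""
    with open(path, 'rb') as file:
        # First pass: record the byte offset at which each line starts.
        offsets = [0]
        for _ in file:
            offsets.append(file.tell())
        # Second pass: visit (start, end) offset pairs, last line first.
        for start, end in zip(reversed(offsets[:-1]), reversed(offsets[1:])):
            file.seek(start)
            yield file.read(end - start).decode(encoding).rstrip('\r\n')

# Small demonstration on a temporary file (hypothetical contents).
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'first\nsecond\nthird\n')
for line in iter_lines_reversed(tmp.name):
    print(line)
os.remove(tmp.name)
```

Only the list of integer offsets is kept in memory, so the cost grows with the number of lines rather than with the size of the file.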

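Update

Here is a version that does the same thing but uses Python's mmap module to memory-map the file, which should give better performance by taking advantage of the virtual-memory capabilities of the operating system and hardware.

This is because, as PyMOTW-3 says:

Memory-mapping typically improves I/O performance because it does not involve a separate system call for each access and it does not require copying data between buffers; the memory is accessed directly by both the kernel and the user application.

Code: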

import mmap

with open('text_file.txt', 'rb') as file:
    with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:

        # First preprocess the file and note where lines start.
        # (Needs to be done in binary mode.)
        offsets = [0]  # First line is always at offset 0.
        for line in iter(mm_file.readline, b""):
            offsets.append(mm_file.tell())  # Append where *next* line would start.

        # Now process the lines in file in reverse order.
        for index in reversed(range(len(offsets)-1)):
            mm_file.seek(offsets[index])
            size = offsets[index+1] - offsets[index]  # Difference with next.
            # Read bytes, convert them to a string, and remove whitespace at end.
            line = mm_file.read(size).decode().rstrip()
            print(line)
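
Because a memory-mapped file supports `rfind()` and slicing, the preprocessing pass can in principle be skipped by searching backward for newlines directly. The following is a sketch of that variation, not part of the answer above; the function name `mmap_lines_reversed` is hypothetical, and it assumes `\n` line endings and a UTF-8 compatible encoding.

```python
import mmap
import os
import tempfile

def mmap_lines_reversed(path, encoding='utf-8'):
    """Yield the lines of `path` from last to first using mmap.rfind()."""
    with open(path, 'rb') as file:
        if file.seek(0, os.SEEK_END) == 0:
            return  # An empty file cannot be memory-mapped.
        with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # Skip one trailing newline so it does not yield an extra empty line.
            if mm[end - 1:end] == b'\n':
                end -= 1
            while end >= 0:
                # Position of the newline before the current line, or -1.
                newline = mm.rfind(b'\n', 0, end)
                yield mm[newline + 1:end].decode(encoding).rstrip('\r')
                end = newline  # Becomes -1 once the first line was yielded.

# Small demonstration on a temporary file (hypothetical contents).
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'first\nsecond\nthird\n')
for line in mmap_lines_reversed(tmp.name):
    print(line)
os.remove(tmp.name)
```

This trades the list of offsets for one backward search per line, so no index has to be built before the first line is produced.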

Votes: 4

Stack Overflow user

Answered 2019-02-28 18:31:37

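There is no good way to do this. A file is, by definition, a sequential organization of some basic data type. For a text file, that type is the character. You are trying to impose a different organization on the file: strings separated by newlines.

So you have to do the work of reading the file, recasting it into the format you want, and then traversing that organization in reverse order. For example, if you need to do this many times, read the file as lines, store the lines as database records, and then iterate over those records however you see fit.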
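
As a rough illustration of the database idea, the lines could be stored in SQLite with an explicit line number and then queried in descending order. The table name `log_lines`, its columns, and both helper functions are assumptions made up for this sketch.

```python
import sqlite3

def load_lines(connection, lines):
    """Store each line together with its line number (hypothetical schema)."""
    connection.execute(
        'CREATE TABLE IF NOT EXISTS log_lines '
        '(line_no INTEGER PRIMARY KEY, text TEXT)')
    connection.executemany(
        'INSERT INTO log_lines (line_no, text) VALUES (?, ?)',
        ((number, line.rstrip('\n'))
         for number, line in enumerate(lines, start=1)))
    connection.commit()

def lines_reversed(connection):
    """Yield the stored lines from last to first."""
    cursor = connection.execute(
        'SELECT text FROM log_lines ORDER BY line_no DESC')
    for (text,) in cursor:
        yield text

# An in-memory database and a list stand in for a real database and file.
connection = sqlite3.connect(':memory:')
load_lines(connection, ['first\n', 'second\n', 'third\n'])
print(list(lines_reversed(connection)))  # ['third', 'second', 'first']
```

Since `load_lines` accepts any iterable of lines, an open file object can be passed in directly, and the file is still read forward only once.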

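The file interface reads in one direction only. You can seek() to another position, but the standard I/O operations only work toward increasing positions.

For your solution to work, you would need to read in the entire file: you cannot reverse the file object's implicit iterator.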
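
A short experiment shows what happens when `reversed()` is applied to a file object. Here `io.StringIO` stands in for an open text file, which is an assumption made to keep the sketch self-contained.

```python
import io

file = io.StringIO('first\nsecond\nthird\n')  # Stands in for an open text file.

try:
    reversed(file)
except TypeError as error:
    # File objects are forward-only iterators, so reversed() rejects them.
    print('reversed() failed:', error)

# Forward iteration works, so the lines can be collected and then reversed,
# at the cost of holding every line in memory at once.
for line in reversed(list(file)):
    print(line, end='')
```

Collecting the lines with `list(file)` has the same memory cost as `readlines()`, which is exactly what the question wants to avoid for large files.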

Votes: 2

Stack Overflow user

Answered 2019-02-28 21:30:09

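While @martineau's solution gets the job done without loading the entire file into memory, it still wastefully reads the entire file twice.

An arguably more efficient one-pass approach is to read from the end of the file into a buffer in reasonably large chunks and search backward from the end of the buffer for the next newline (ignoring a trailing newline in the last character). If none is found, seek back and keep reading chunks, prepending each one to the buffer, until a newline is found. Use a larger chunk size for more efficient reads, as long as it stays within the memory limit: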

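class ReversedTextReader:
    """Iterate over the lines of a seekable text file from last to first."""

    def __init__(self, file, chunk_size=50):
        self.file = file
        file.seek(0, 2)  # Jump to the end of the file.
        self.position = file.tell()  # Start of the part read so far.
        self.chunk_size = chunk_size
        self.buffer = ''  # Text already read but not yet returned.

    def __iter__(self):
        return self

    def __next__(self):
        if not self.position and not self.buffer:
            raise StopIteration
        # Search everything except the final character, which may be the
        # trailing newline of the line about to be returned.
        search_end = len(self.buffer) - 1
        while True:
            line_start = self.buffer.rfind('\n', 0, max(search_end, 0))
            if line_start != -1:
                break
            if not self.position:
                # Start of file reached: the remaining buffer is the first line.
                line, self.buffer = self.buffer, ''
                return line
            chunk_size = min(self.chunk_size, self.position)
            self.position -= chunk_size
            self.file.seek(self.position)
            chunk = self.file.read(chunk_size)
            self.buffer = chunk + self.buffer
            search_end += len(chunk)  # Keep excluding only the last character.
        line_start += 1  # The line begins just after the newline.
        line = self.buffer[line_start:]
        self.buffer = self.buffer[:line_start]
        return line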

So that:

from io import StringIO

f = StringIO('''2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
''')

for line in ReversedTextReader(f):
    print(line, end='')

Output:

2018/03/26-15:08:51.066968    1     7FE9BDC91700     std:ZMD:
2018/03/26-10:08:51.066967    0     7FE9BDC91700     Exit Status = 0x0
2018/03/26-00:08:50.981908 1389     7FE9BDC2B707     user 7fb31ecfa700
2018/03/25-24:08:50.980519  16K     7FE9BD1AF707     user: number is 93823004
2018/03/25-20:08:50.486601 1.5M     7FE9D3D41706     qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K     7FE9D2D51706     ahelooa afoaona woom
2018/03/25-00:08:48.638553  508     7FF4A8F3D704     snononsonfvnosnovoosr
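
The demonstration above runs on a StringIO object. For a file on disk, the io documentation only supports seeking a text-mode file to positions previously returned by tell(), so one option is to apply the same chunked idea in binary mode and decode each complete line. The sketch below makes several assumptions: the function name `reversed_lines` and its parameters are hypothetical, lines end in `\n`, and the encoding is UTF-8 or another encoding in which the newline byte never appears inside a multi-byte character.

```python
import os
import tempfile

def reversed_lines(path, chunk_size=4096, encoding='utf-8'):
    """Yield the lines of `path` from last to first, reading from the end."""
    with open(path, 'rb') as file:
        position = file.seek(0, os.SEEK_END)
        if position == 0:
            return  # Empty file: nothing to yield.
        # Ignore one trailing newline so it does not yield an extra empty line.
        file.seek(position - 1)
        if file.read(1) == b'\n':
            position -= 1
        buffer = b''
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            file.seek(position)
            pieces = (file.read(read_size) + buffer).split(b'\n')
            # The first piece may be an incomplete line; keep it for later.
            buffer = pieces[0]
            for piece in reversed(pieces[1:]):
                yield piece.decode(encoding)
        yield buffer.decode(encoding)  # What remains is the first line.

# Small demonstration on a temporary file (hypothetical contents).
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'first\nsecond\nthird\n')
for line in reversed_lines(tmp.name, chunk_size=8):
    print(line)
os.remove(tmp.name)
```

Lines are yielded without their trailing newline, and only complete lines are decoded, so a chunk boundary that falls inside a multi-byte character does not cause a decoding error.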

Votes: 0

The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/54932001
