I want to read the text file given below in reverse order, line by line. I don't want to use readlines() or read().
a.txt
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
Expected result:
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
My solution:
with open('a.txt') as lines:
    for line in reversed(lines):  # Raises TypeError: file objects are not reversible.
        print(line)
Posted on 2019-02-28 19:07:45
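Here's a way to do it without reading the whole file into memory all at once. It does require reading through the whole file first, but it only stores where each line starts. Once that is known, it can use the seek() method to access the lines in any order.
Here's an example using your input file: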
# Preprocess - read whole file and note where lines start.
# (Needs to be done in binary mode.)
with open('text_file.txt', 'rb') as file:
    offsets = [0]  # First line is always at offset 0.
    for line in file:
        offsets.append(file.tell())  # Append where *next* line would start.

# Now reread lines in file in reverse order.
with open('text_file.txt', 'rb') as file:
    for index in reversed(range(len(offsets)-1)):
        file.seek(offsets[index])
        size = offsets[index+1] - offsets[index]  # Difference with next.
        # Read bytes, convert them to a string, and remove whitespace at end.
        line = file.read(size).decode().rstrip()
        print(line)
Output:
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
Update
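Here's a version that does the same thing but uses Python's mmap module to memory-map the file, which should provide better performance by taking advantage of the virtual-memory capabilities of the OS and hardware.
This is because, as PyMOTW-3 says:
"Memory-mapping typically improves I/O performance because it does not involve a separate system call for each access and it does not require copying data between buffers -- the memory is accessed directly by both the kernel and the user application."
Code: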
import mmap

with open('text_file.txt', 'rb') as file:
    with mmap.mmap(file.fileno(), length=0, access=mmap.ACCESS_READ) as mm_file:
        # First preprocess the file and note where lines start.
        # (Needs to be done in binary mode.)
        offsets = [0]  # First line is always at offset 0.
        for line in iter(mm_file.readline, b""):
            offsets.append(mm_file.tell())  # Append where *next* line would start.

        # Now process the lines in file in reverse order.
        for index in reversed(range(len(offsets)-1)):
            mm_file.seek(offsets[index])
            size = offsets[index+1] - offsets[index]  # Difference with next.
            # Read bytes, convert them to a string, and remove whitespace at end.
            line = mm_file.read(size).decode().rstrip()
            print(line)
Posted on 2019-02-28 18:31:37
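No, there isn't a better way to do this. By definition, a file is a sequential organization of some basic data type. A text file's type is character. You are trying to impose a different organization on the file: strings separated by newlines.
Thus, you have to do the work of reading the file, recasting it into your desired format, and then taking that organization in reverse order. For instance, if you need this multiple times, read the file as lines, store the lines as database records, and then iterate through the records as you see fit.
The file interface reads in only one direction. You can seek() to another location, but the standard I/O operations work only in the direction of increasing position.
For your solution to work, you'll need to read in the entire file: you cannot reverse the implicit iterator of the file descriptor.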
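A minimal sketch of the read-everything-then-reverse approach described here might look like the following. The helper name lines_reversed_simple and the throwaway demo file are illustrative assumptions, not part of the original post. Note that building a list of all lines is effectively what readlines() does, so this conflicts with the constraint stated in the question and only suits files that fit in memory.

```python
import os
import tempfile


def lines_reversed_simple(path):
    """Return the lines of a text file, last line first (hypothetical helper).

    The whole file is held in memory at once.
    """
    with open(path, encoding='utf-8') as f:
        # A file object is an iterator, not a sequence, so reversed(f) raises
        # TypeError; materialise the lines into a list first.
        return [line.rstrip('\n') for line in reversed(list(f))]


# Small self-contained demo using a throwaway file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('first\nsecond\nthird\n')

demo_lines = lines_reversed_simple(tmp.name)
os.remove(tmp.name)
print(demo_lines)
```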
Posted on 2019-02-28 21:30:09
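While @martineau's solution gets the job done without loading the entire file into memory, it nevertheless wastefully reads the entire file twice.
An arguably more efficient, one-pass approach is to read from the end of the file into a buffer in reasonably large chunks and look for the next newline character from the end of the buffer (ignoring the trailing newline in the last character). If none is found, seek backwards, read another chunk, and prepend it to the buffer, repeating until a newline character is found. Use a larger chunk size for more efficient reads, as long as it stays within the memory limit: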
class ReversedTextReader:
    def __init__(self, file, chunk_size=50):
        self.file = file
        file.seek(0, 2)  # Jump to the end of the file.
        self.position = file.tell()
        self.chunk_size = chunk_size
        self.buffer = ''

    def __iter__(self):
        return self

    def __next__(self):
        if not self.position and not self.buffer:
            raise StopIteration
        while True:
            # Find the newline that ends the previous line, skipping the
            # buffer's own trailing newline (its last character).
            line_start = self.buffer.rfind('\n', 0, len(self.buffer) - 1)
            if line_start != -1:
                break
            if not self.position:
                # Start of file reached: what is left is the first line.
                line = self.buffer
                self.buffer = ''
                return line
            # Read the chunk just before the current position and prepend it.
            chunk_size = min(self.chunk_size, self.position)
            self.position -= chunk_size
            self.file.seek(self.position)
            self.buffer = self.file.read(chunk_size) + self.buffer
        line_start += 1
        line = self.buffer[line_start:]
        self.buffer = self.buffer[:line_start]
        return line
so that:
from io import StringIO

f = StringIO('''2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
''')
for line in ReversedTextReader(f):
    print(line, end='')
outputs:
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
https://stackoverflow.com/questions/54932001
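One caveat about ReversedTextReader: the demonstration uses StringIO, where seeking to a computed position is safe. For a real file opened in text mode, Python only defines seek() for offsets previously returned by tell() (plus zero and the end of the file), so doing arithmetic on positions is not reliable there. A hedged sketch of the same read-backwards-in-chunks idea in binary mode follows. The function name reverse_lines, its parameters, and the demo file are assumptions made for illustration, and it assumes '\n' line endings and an encoding such as UTF-8 in which the newline byte never appears inside another character.

```python
import os
import tempfile


def reverse_lines(path, chunk_size=4096, encoding='utf-8'):
    """Yield the lines of a file from last to first, without line endings.

    Hypothetical helper: assumes '\n' line endings and an encoding (such as
    UTF-8) in which the byte b'\n' never occurs inside another character.
    """
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        size = position = f.tell()
        tail = b''  # Start of a line whose beginning has not been read yet.
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            data = f.read(read_size) + tail
            if position + read_size == size and data.endswith(b'\n'):
                # Drop the file's single trailing newline so that it does not
                # produce a spurious empty last line.
                data = data[:-1]
            pieces = data.split(b'\n')
            # pieces[0] may be incomplete; keep it for the next (earlier) chunk.
            tail = pieces[0]
            for piece in reversed(pieces[1:]):
                # Only complete lines are decoded, so a multi-byte character
                # split across two chunks is never decoded in halves.
                yield piece.decode(encoding)
        if size > 0:
            yield tail.decode(encoding)  # The first line of the file.


# Demo with a tiny chunk size to force several backward reads.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write('alpha\nbeta\ngamma \u00fcn\u00efcode\n'.encode('utf-8'))

result = list(reverse_lines(tmp.name, chunk_size=4))
os.remove(tmp.name)
print(result)
```

Because each byte is read once and only the current partial line is buffered, memory use stays bounded by the chunk size plus the longest line.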