
Cannot iterate over cStringIO

Stack Overflow user
Asked 2018-02-08 05:19:20

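In my script I write lines to a file, but some of those lines may be duplicates. So I created a temporary cStringIO file-like object, which I call my "intermediate file". I write the lines to the intermediate file first, remove the duplicates, and then write to the real file.

To do that, I wrote a simple for loop that iterates over each line in the intermediate file and removes any duplicates.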

def remove_duplicates(f_temp, dir_out):  # f_temp is the cStringIO object.
    """Function to remove duplicates from the intermediate file and write to physical file."""
    lines_seen = set()  # Define a set to hold lines already seen.
    f_out = define_outputs(dir_out)  # Create the real output file by calling function "define_outputs". Note: This function is not shown in my pasted code.

    cStringIO.OutputType.getvalue(f_temp)  # From: https://stackoverflow.com/a/40553378/8117081

    for line in f_temp:  # Iterate through the cStringIO file-like object.
        line = compute_md5(line)  # Function to compute the MD5 hash of each line. Note: This function is not shown in my pasted code.
        if line not in lines_seen:  # Not a duplicate line (based on MD5 hash, which is supposed to save memory).
            f_out.write(line)
            lines_seen.add(line)
    f_out.close()

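My problem is that the for loop never executes. I can verify this by setting a breakpoint in my debugger: that line of code is skipped and the function exits. I even read this answer from this thread and inserted the code cStringIO.OutputType.getvalue(f_temp), but that did not solve my problem.

I don't understand why I can't read and iterate over my file-like object.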


1 Answer

Stack Overflow user

Accepted answer

Answered 2018-02-08 05:29:10

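The answer you referred to is a bit incomplete. It shows how to get the cStringIO buffer as a string, but you then have to do something with that string. You could do it like this: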

def remove_duplicates(f_temp, dir_out):  # f_temp is the cStringIO object.
    """Function to remove duplicates from the intermediate file and write to physical file."""
    lines_seen = set()  # Define a set to hold lines already seen.
    f_out = define_outputs(dir_out)  # Create the real output file by calling function "define_outputs". Note: This function is not shown in my pasted code.

    # contents = cStringIO.OutputType.getvalue(f_temp)  # From: https://stackoverflow.com/a/40553378/8117081
    contents = f_temp.getvalue()     # simpler approach
    contents = contents.strip('\n')  # remove final newline to avoid adding an extra row
    lines = contents.split('\n')     # convert to iterable

    for line in lines:  # Iterate through the list of lines.
        line = compute_md5(line)  # Function to compute the MD5 hash of each line. Note: This function is not shown in my pasted code.
        if line not in lines_seen:  # Not a duplicate line (based on MD5 hash, which is supposed to save memory).
            f_out.write(line + '\n')
            lines_seen.add(line)
    f_out.close()
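
To illustrate why the trailing newline is stripped before splitting, here is a minimal sketch. It assumes Python 3, where io.StringIO stands in for cStringIO.StringIO (an assumption; the code in this thread targets Python 2). getvalue() returns the whole buffer no matter where the current position is, and str.splitlines() is one possible alternative to strip-then-split that leaves any leading blank lines alone.

```python
import io

buf = io.StringIO()                 # Python 3 stand-in for cStringIO.StringIO()
buf.write('string 1\n')
buf.write('string 2\n')

contents = buf.getvalue()           # entire buffer, independent of the position

naive = contents.split('\n')        # ends with '' because the text ends in '\n'
stripped = contents.strip('\n').split('\n')   # the approach used in the function above
via_splitlines = contents.splitlines()        # alternative: no empty trailing entry
```

Note that strip('\n') also removes newlines at the start of the buffer, so leading blank lines would be lost; rstrip('\n') or splitlines() avoids that if it matters.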

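But it is probably better to use ordinary IO operations on the f_temp "file handle". After the writes, the file position sits at the end of the buffer, so iterating yields nothing until you seek back to the start, like this: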

def remove_duplicates(f_temp, dir_out):  # f_temp is the cStringIO object.
    """Function to remove duplicates from the intermediate file and write to physical file."""
    lines_seen = set()  # Define a set to hold lines already seen.
    f_out = define_outputs(dir_out)  # Create the real output file by calling function "define_outputs". Note: This function is not shown in my pasted code.

    # move f_temp's pointer back to the start of the file, to allow reading
    f_temp.seek(0)

    for line in f_temp:  # Iterate through the cStringIO file-like object.
        line = compute_md5(line)  # Function to compute the MD5 hash of each line. Note: This function is not shown in my pasted code.
        if line not in lines_seen:  # Not a duplicate line (based on MD5 hash, which is supposed to save memory).
            f_out.write(line)
            lines_seen.add(line)
    f_out.close()
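
The effect of the file position can be seen in isolation with a short sketch. It assumes Python 3 and uses io.StringIO in place of cStringIO.StringIO (an assumption; the behaviour shown here is the same for both): reading starts from the current position, which is at the end of the buffer after a series of writes.

```python
import io

buf = io.StringIO()        # Python 3 stand-in for cStringIO.StringIO()
buf.write('string 1\n')
buf.write('string 2\n')

before_seek = list(buf)    # position is at the end, so nothing is read

buf.seek(0)                # rewind to the start of the buffer
after_seek = list(buf)     # now every line is returned
```

This is the same reason the for loop in the question was skipped: the loop itself was fine, but there was nothing left to read from the current position.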

Here is a test (it works with either version):

import cStringIO

def define_outputs(dir_out):
    return open('/tmp/test.txt', 'w')

def compute_md5(line):
    return line

f = cStringIO.StringIO()
f.write('string 1\n')
f.write('string 2\n')
f.write('string 1\n')
f.write('string 2\n')
f.write('string 3\n')

remove_duplicates(f, 'tmp')
with open('/tmp/test.txt', 'r') as f:
    print(str([row for row in f]))
# ['string 1\n', 'string 2\n', 'string 3\n']
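
The test above stubs compute_md5 so that it returns the line unchanged, which means hashing is never exercised. If compute_md5 returned a real digest, the functions as written would store and write the digest instead of the line text. The following is a hedged sketch of one way to keep the two apart. It assumes Python 3 (io.StringIO instead of cStringIO), a hypothetical compute_md5 built on hashlib.md5 (the asker's real helper is not shown), and a simplified signature that takes an already open output object rather than dir_out.

```python
import hashlib
import io


def compute_md5(line):
    """Hypothetical helper: hex MD5 digest of one line of text."""
    return hashlib.md5(line.encode('utf-8')).hexdigest()


def remove_duplicates(f_temp, f_out):
    """Copy unique lines from f_temp to f_out, remembering only digests."""
    seen = set()
    f_temp.seek(0)                  # rewind before reading
    for line in f_temp:
        digest = compute_md5(line)
        if digest not in seen:
            f_out.write(line)       # write the original line, not its digest
            seen.add(digest)


src = io.StringIO()
for text in ('string 1\n', 'string 2\n', 'string 1\n', 'string 3\n'):
    src.write(text)

dst = io.StringIO()                 # stands in for the real output file
remove_duplicates(src, dst)
result = dst.getvalue()
```

Keeping digests instead of whole lines only saves memory when the lines are longer than the digest, which is 32 characters in hex form.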
Original content from Stack Overflow.
Source: https://stackoverflow.com/questions/48673418
