
Rearranging values in a file in Python
Stack Overflow user
Asked 2022-11-08 09:51:58
Answers: 2, Views: 44, Followers: 0, Votes: 0

I have a very large file with roughly 900K values. It consists of repeated blocks of values like this:

/begin throw
    COLOR red
     DESCRIPTION
     "cashmere sofa throw"
      10
      10
      156876
     DIMENSION
      140
      200
    STORE_ADDRESS 59110
/end throw

The values keep changing, but I need each block to look like this:

    /begin throw
     STORE_ADDRESS 59110
        COLOR red
         DESCRIPTION "cashmere sofa throw" 10 10 156876
         DIMENSION 140 200
    /end throw

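Currently, my approach is to remove the newlines and replace them with spaces. The store address is constant throughout the file, so I thought of removing it at its index and inserting it before the description: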

text_file = open(filename, 'r')
filedata = text_file.readlines();

for num,line in enumerate(filedata,0):
    if '/begin' in line:
        for index in range(num, len(filedata)):
            if "store_address 59110 " in filedata[index]:
                    filedata.remove(filedata[index])
                    filedata.insert(filedata[index-7])
                    break
                  
            if "DESCRIPTION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","").replace("\n", " ")
                    filedata[index+4] = filedata[index+4].replace(" ","").replace("\n", " ")
                    filedata[index+5] = filedata[index+5].replace(" ","").replace("\n", " ")
                    filedata[index+6] = filedata[index+6].replace(" ","").replace("\n", " ")
                    filedata[index+7] = filedata[index+7].replace(" ","").replace("\n", " ")
                    filedata[index+8] = filedata[index+8].replace(" ","")
                except IndexError:
                    print("Error Index DESCRIPTION:", index, num)
                
            if "DIMENSION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","")
                except IndexError:
                    print("Error Index DIMENSION:", index, num)

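After that, I write filedata to another file.

This approach takes far too long to run (almost an hour and a half) because, as mentioned, the file is large. Is there a faster way to solve this?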

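2 Answers

Stack Overflow user

Accepted answer

Posted 2022-11-08 10:07:09

You can read the file one structure at a time, so you don't have to hold the whole content in memory and manipulate it there. By structure I mean all the values between /begin throw and /end throw, inclusive. This should be much faster.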

def rearrange_structure_and_write_into_file(structure, output_file):
    # TODO: rearrange the elements in structure before writing them out.
    # Placeholder so the script runs: writes the block unchanged.
    output_file.write(structure)

current_structure = ""
with open(filename, 'r') as original_file:
    with open(output_filename, 'w') as output_file:
        for line in original_file:
            current_structure += line
            if "/end throw" in line:
                rearrange_structure_and_write_into_file(current_structure, output_file)
                current_structure = ""
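
One possible way to fill in the TODO above is sketched below. It is only an illustration, not the answerer's code: the KEYWORDS tuple and the helper rearrange_structure are assumptions based on the sample block in the question, and the sketch drops the original indentation for simplicity. It expects each structure to start with the /begin throw line and end with the /end throw line.

```python
# Field names taken from the sample block in the question (assumption:
# real files contain no other keywords).
KEYWORDS = ("STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION")


def rearrange_structure(structure):
    """Return one /begin throw ... /end throw block in the target layout."""
    lines = [line.strip() for line in structure.splitlines() if line.strip()]
    begin, body, end = lines[0], lines[1:-1], lines[-1]

    store_address = []  # STORE_ADDRESS lines, moved to the top of the block
    fields = []         # every other field, one output line per field
    joining = False     # True while collecting values of a bare keyword line
    for line in body:
        keyword = line.split()[0]
        if keyword == "STORE_ADDRESS":
            store_address.append(line)
            joining = False
        elif keyword in KEYWORDS:
            fields.append(line)
            # A bare keyword such as "DESCRIPTION" has its values on later lines.
            joining = (line == keyword)
        elif joining:
            fields[-1] += " " + line
        else:
            fields.append(line)  # unknown line: keep it unchanged
    return "\n".join([begin, *store_address, *fields, end]) + "\n"


def rearrange_structure_and_write_into_file(structure, output_file):
    output_file.write(rearrange_structure(structure))
```

Because each block is handled independently and the file is read only once, the total work grows linearly with the file size.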
Votes: 0

Stack Overflow user

Posted 2022-11-08 10:44:16

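Inserting into and removing from a long list is likely to make this code slower than it needs to be, and it also makes the code error-prone and hard to reason about. If there is an entry without a STORE_ADDRESS, the code will not work correctly: it will keep searching through the remaining entries until it finds a store address.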
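
To make the second point concrete, here is a small sketch that counts how many lines a scan of that shape visits. The helper names nested_scan_steps and make_entries are made up for this illustration, the entries are tiny hypothetical ones, and the loop only mimics the structure of the question's code rather than reproducing it.

```python
def nested_scan_steps(lines):
    """Count inner-loop iterations of a scan that restarts at every /begin."""
    steps = 0
    for num, line in enumerate(lines):
        if "/begin" in line:
            for index in range(num, len(lines)):
                steps += 1
                if "STORE_ADDRESS" in lines[index]:
                    break  # stop at the first store address found
    return steps


def make_entries(count, with_store_address):
    """Build `count` tiny hypothetical entries as a flat list of lines."""
    entry = ["/begin throw", "COLOR red"]
    if with_store_address:
        entry.append("STORE_ADDRESS 59110")
    entry.append("/end throw")
    return entry * count


# Every entry has a store address: the scan stops within its own entry.
print(nested_scan_steps(make_entries(100, True)))
# No entry has one: every /begin scans to the end of the file.
print(nested_scan_steps(make_entries(100, False)))
```

When the search never succeeds, the number of visited lines grows roughly with the square of the number of entries, which is the kind of growth that becomes very noticeable on a file with hundreds of thousands of lines.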

A better approach is to break the code down into functions that parse each entry and output it:

KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]
    
def parse_lines(lines):
    """ Parse throw data from lines in the old format """
    current_section = None
    r = {}
    for line in lines:
        words = line.strip().split(" ")
        if words[0] in KEYWORDS:
            if words[1:]:
                r[words[0]] = words[1]
            else:
                current_section = r[words[0]] = []
        else:
            current_section.append(line.strip())
    return r

def output_throw(throw):
    """ Output a throw entry as lines of text in the new format """ 
    yield "/begin throw"
    for keyword in KEYWORDS:
        if keyword in throw:
            value = throw[keyword]
            if isinstance(value, list):
                value = " ".join(value)
            yield f"{keyword} {value}"
    yield "/end throw"
   
with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    for line in in_file:
        line = line.strip()
        if line == "/begin throw":
            entry = []
        elif line == "/end throw":
            throw = parse_lines(entry)
            for out_line in output_throw(throw):
                out_file.write(out_line + "\n")
        else:
            entry.append(line)

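Alternatively, if you really need to maximise performance by cutting out all unnecessary operations, you can read and write in a single long conditional, like this: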

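KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]

with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []          # output lines of the current block, minus STORE_ADDRESS
    in_section = False  # True while collecting the values of a bare keyword line

    def write(line):
        out_file.write(line + "\n")

    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines; line.split()[0] would raise IndexError
        first = line.split()[0]
        if line == "/begin throw":
            in_section = False
            write(line)
            entry = []
        elif line == "/end throw":
            in_section = False
            for line_ in entry:
                write(line_)
            write(line)
        elif first == "STORE_ADDRESS":
            # Written immediately, so it lands right after /begin throw.
            in_section = False
            write(line)
        elif line in KEYWORDS:
            # Bare keyword: its values follow on the next lines.
            in_section = True
            entry.append(line)
        elif first in KEYWORDS:
            # Keyword and value on the same line, e.g. "COLOR red".
            in_section = False
            entry.append(line)
        elif in_section:
            entry[-1] += " " + line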
Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/74358702
