I have a very large file with roughly 900K values. It consists of repeated entries like this:
/begin throw
COLOR red
DESCRIPTION
"cashmere sofa throw"
10
10
156876
DIMENSION
140
200
STORE_ADDRESS 59110
/end throw
The values keep changing, but I need each entry to look like this:
/begin throw
STORE_ADDRESS 59110
COLOR red
DESCRIPTION "cashmere sofa throw" 10 10 156876
DIMENSION 140 200
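/end throw
Currently, my approach is to remove the newlines and put spaces in their place. The store address is constant throughout the file, so I thought of removing it at its index and inserting it before the description: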
text_file = open(filename, 'r')
filedata = text_file.readlines();
for num,line in enumerate(filedata,0):
    if '/begin' in line:
        for index in range(num, len(filedata)):
            if "store_address 59110 " in filedata[index]:
                filedata.remove(filedata[index])
                filedata.insert(filedata[index-7])
                break
            if "DESCRIPTION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","").replace("\n", " ")
                    filedata[index+4] = filedata[index+4].replace(" ","").replace("\n", " ")
                    filedata[index+5] = filedata[index+5].replace(" ","").replace("\n", " ")
                    filedata[index+6] = filedata[index+6].replace(" ","").replace("\n", " ")
                    filedata[index+7] = filedata[index+7].replace(" ","").replace("\n", " ")
                    filedata[index+8] = filedata[index+8].replace(" ","")
                except IndexError:
                    print("Error Index DESCRIPTION:", index, num)
            if "DIMENSION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","")
                except IndexError:
                    print("Error Index DIMENSION:", index, num)
After that, I write filedata to another file.
This approach takes far too long to run (almost an hour and a half) because, as mentioned, the file is large. Is there a faster way to solve this?
Answered on 2022-11-08 10:07:09
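You can read the file one structure at a time, so that you don't have to hold the entire contents in memory and manipulate them there. By a structure I mean all the values between /begin throw and /end throw, including those two lines. This should be much faster.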
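As a rough sketch of what the rearranging step for a single structure could look like: the helper name `rearrange_structure` and the `KEYWORD_ORDER` tuple are hypothetical, and the keyword set is an assumption taken from the sample entry in the question. It treats a line that starts with a known keyword as the beginning of a field and attaches every following line to that field.

```python
# Hypothetical helper; the keyword set and output order are assumptions
# based on the sample entry shown in the question.
KEYWORD_ORDER = ("STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION")


def rearrange_structure(structure):
    """Return one '/begin throw' ... '/end throw' block in the target layout."""
    fields = {}     # keyword -> list of values collected for that keyword
    current = None  # keyword whose continuation lines are being collected
    for raw in structure.splitlines():
        line = raw.strip()
        if not line or line in ("/begin throw", "/end throw"):
            continue  # delimiters and blank lines carry no field data
        first, _, rest = line.partition(" ")
        if first in KEYWORD_ORDER:
            # A keyword starts a new field; an inline value is kept if present.
            current = first
            fields[current] = [rest] if rest else []
        elif current is not None:
            # Any other line is a value belonging to the most recent keyword.
            fields[current].append(line)
    out = ["/begin throw"]
    for keyword in KEYWORD_ORDER:
        if keyword in fields:
            out.append(" ".join([keyword] + fields[keyword]))
    out.append("/end throw")
    return "\n".join(out) + "\n"
```

Because the function returns a string rather than writing it, the reading loop can pass the result straight to the output file's `write` method, and the rearranging logic can be checked on a single entry without touching the 900K-value file.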
def rearrange_structure_and_write_into_file(structure, output_file):
    # TODO: rearrange the elements in structure and write the result into output_file
    output_file.write(structure)  # placeholder: writes the structure unchanged

current_structure = ""
with open(filename, 'r') as original_file:
    with open(output_filename, 'w') as output_file:
        for line in original_file:
            current_structure += line
            if "/end throw" in line:
                rearrange_structure_and_write_into_file(current_structure, output_file)
                current_structure = ""
Answered on 2022-11-08 10:44:16
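Inserting into and removing from a long list probably makes this code slower than it needs to be, because each insert or remove on a Python list shifts every element after it. It also makes the code error-prone and hard to reason about. If there is an entry without a store_address, the code will not work correctly: it will keep searching through the remaining entries until it finds a store address.
A better approach is to break the code up into functions that parse each entry and output it: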
KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]

def parse_lines(lines):
    """ Parse throw data from lines in the old format """
    current_section = None
    r = {}
    for line in lines:
        words = line.strip().split(" ")
        if words[0] in KEYWORDS:
            if words[1:]:
                # Keyword with its value on the same line, e.g. "COLOR red"
                r[words[0]] = " ".join(words[1:])
            else:
                # Bare keyword: its values follow on the next lines
                current_section = r[words[0]] = []
        elif current_section is not None:
            current_section.append(line.strip())
    return r

def output_throw(throw):
    """ Output a throw entry as lines of text in the new format """
    yield "/begin throw"
    for keyword in KEYWORDS:
        if keyword in throw:
            value = throw[keyword]
            if isinstance(value, list):
                value = " ".join(value)
            yield f"{keyword} {value}"
    yield "/end throw"

with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    for line in in_file:
        line = line.strip()
        if line == "/begin throw":
            entry = []
        elif line == "/end throw":
            throw = parse_lines(entry)
            for out_line in output_throw(throw):
                out_file.write(out_line + "\n")
        else:
            entry.append(line)
Alternatively, if you really need to maximise performance by removing every unnecessary operation, you can read and write in a single long conditional, like this:
with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []          # buffered output lines for the current throw
    in_section = False  # True while collecting the values of a bare keyword

    def write(line):
        out_file.write(line + "\n")

    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines; line.split()[0] would fail on them
        first = line.split()[0]
        if line == "/begin throw":
            in_section = False
            write(line)
            entry = []
        elif line == "/end throw":
            in_section = False
            for line_ in entry:
                write(line_)
            write(line)
        elif first == "STORE_ADDRESS":
            # Written immediately, so it lands right after "/begin throw"
            in_section = False
            write(line)
        elif line in KEYWORDS:
            in_section = True
            entry.append(line)
        elif first in KEYWORDS:
            in_section = False
            entry.append(line)
        elif in_section:
            entry[-1] += " " + line
https://stackoverflow.com/questions/74358702