
How do I turn every n lines into a row with n columns?

Stack Overflow user
Asked on 2022-04-07 14:11:27
Answers: 3 · Views: 83 · Followers: 0 · Score: 2

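I have a .txt file with 683,500 lines. Every 7 lines describe a different person:

  1. ID
  2. Name
  3. Work position
  4. Date 1 (year-month)
  5. Date 2 (year-month)
  6. Gross payment
  7. Service time

I want to read the .txt file and produce output like the following (it can be JSON, CSV, TXT, or even a database):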

ID    Name     Work position   Date 1   Date 2    Gross payment     Service time
ID    Name     Work position   Date 1   Date 2    Gross payment     Service time
ID    Name     Work position   Date 1   Date 2    Gross payment     Service time
ID    Name     Work position   Date 1   Date 2    Gross payment     Service time

Sample from the txt file:

00000000886
MANUEL DE JESUS SUBERVI PEÑA
MAESTRO MEDIA GENERAL
2006-08
2021-09
30,556.04
15.7
00000000086
MANUEL DE JESUS SUBERVI PEÑA
MAESTRO MEDIA GENERAL
2006-01
2021-09
30,556.04
15.7
00100000086
MANUEL DE JESUS SUBERVI PEÑA
MAESTRO MEDIA GENERAL
2006-01
2021-09
30,556.04
15.7

import csv

#opening file

file = open (r"C:\Users\Redford\Documents\Proyecto automatizacion\data1.txt") #open file
counter = 0
total_lines = len(file.readlines()) #count lines
#print('Total lines:', x)

#reading from file

content = file.read()
colist  = content.split ()
print(colist)


#read data from data1.txt and write in data2.txt

lines = open (r"C:\Users\Redford\Documents\Proyecto automatizacion\data1.txt")
arr = []
with open('data2.txt', 'w') as f:
    for line in lines:
        #arr.append(line)
        f.write (line)

I am new to programming and do not know how to turn my logic into code.


3 Answers

Stack Overflow user

Posted on 2022-04-07 14:23:24

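Your code never collects several lines so that they can be written out as one row.

Take this approach:

  • Read the file line by line
  • Collect each line, without its trailing \n, into a list
  • When the list reaches a length of 7, write it to the csv and clear the list
  • Repeat until done

Create the data file: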

with open ("t.txt","w") as f:
    f.write("""00000000886\nMANUEL DE JESUS SUBERVI PEÑA\nMAESTRO MEDIA GENERAL\n2006-08\n2021-09\n30,556.04\n15.7
00000000086\nMANUEL DE JESUS SUBERVI PEÑA\nMAESTRO MEDIA GENERAL\n2006-01\n2021-09\n30,556.04\n15.7
00100000086\nMANUEL DE JESUS SUBERVI PEÑA\nMAESTRO MEDIA GENERAL\n2006-01\n2021-09\n30,556.04\n15.7""")

Program:

import csv

with open("t.csv","w",newline="") as wr, open("t.txt") as r:
    # create a csv writer
    writer = csv.writer(wr)

    # uncomment if you want a header over your data
    # h =  ["ID","Name","Work position","Date 1","Date 2",
    #       "Gross payment","Service time"]
    # writer.writerow(h)

    person = []
    for line in r: # could use enumerate as well, this works ok
        # collect line data minus the \n into list
        person.append(line.strip())

        # this person is finished, write, clear list
        if len(person) == 7:
            # leveraged the csv module writer, look it up if you need
            # to customize it further regarding quoting etc
            writer.writerow(person)
            person = [] # reset list for next person

    # something went wrong, your file is inconsistent, write remainder
    if person:
        writer.writerow(person)

print(open("t.csv").read())

Output:

00000000886,MANUEL DE JESUS SUBERVI PEÑA,MAESTRO MEDIA GENERAL,2006-08,2021-09,"30,556.04",15.7
00000000086,MANUEL DE JESUS SUBERVI PEÑA,MAESTRO MEDIA GENERAL,2006-01,2021-09,"30,556.04",15.7
00100000086,MANUEL DE JESUS SUBERVI PEÑA,MAESTRO MEDIA GENERAL,2006-01,2021-09,"30,556.04",15.7

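Further reading: the csv module's writer.

The "Gross payment" value has to be quoted because it contains a ',', which is the CSV delimiter; the module does this automatically.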
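To make the quoting point concrete, here is a minimal sketch using one record from the question. It writes to an in-memory buffer rather than a file, which is an assumption made purely for illustration, and the variable names are hypothetical.

```python
import csv
import io

# One record from the question; the payment field contains the delimiter ','.
row = ["00000000886", "MANUEL DE JESUS SUBERVI PEÑA", "MAESTRO MEDIA GENERAL",
       "2006-08", "2021-09", "30,556.04", "15.7"]

# Write the record with the default csv.writer settings.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_text = buffer.getvalue()

# Splitting on ',' by hand breaks the payment into two pieces (8 items)...
naive = csv_text.strip().split(",")

# ...whereas csv.reader honours the quotes and restores the 7 original fields.
parsed = next(csv.reader(io.StringIO(csv_text)))
```

This is why it is usually safer to read the resulting file back with the csv module (or another CSV-aware tool) than to split lines on commas manually.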

Score: 2

Stack Overflow user

Posted on 2022-04-07 15:21:07

On top of @PatrickArtner's excellent answer, I would like to propose an itertools-based solution:

import csv
import itertools


def file_grouper_itertools(
        in_filepath="t.txt",
        out_filepath="t.csv",
        size=7):
    with open(in_filepath, 'r') as in_file,\
            open(out_filepath, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        args = [iter(in_file)] * size
        for block in itertools.zip_longest(*args, fillvalue=' '):
            # equivalent, for the given input, to:
            # block = [x.rstrip('\n') for x in block]
            block = ''.join(block).rstrip('\n').split('\n')
            writer.writerow(block)

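The idea here is to loop over the file in blocks of the desired size. For larger group sizes this gets faster, because the main loop runs fewer iterations.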
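The block-wise looping relies on passing the same iterator several times to `itertools.zip_longest`. A small sketch on an in-memory list may make that clearer; the helper name `group_lines` and the sample values are hypothetical and only serve as an illustration.

```python
import itertools


def group_lines(lines, size, fillvalue=""):
    """Return an iterator of tuples holding `size` consecutive items."""
    # The list holds the SAME iterator `size` times, so zip_longest pulls
    # one item after another from it and each tuple gets consecutive items.
    args = [iter(lines)] * size
    return itertools.zip_longest(*args, fillvalue=fillvalue)


sample = ["1", "Ann", "Clerk", "2", "Bob", "Cook", "3"]
groups = list(group_lines(sample, size=3))
# A short final group is padded with the fill value:
# [("1", "Ann", "Clerk"), ("2", "Bob", "Cook"), ("3", "", "")]
```

Because a file object is itself an iterator over its lines, the same trick works directly on an open file, as in the function above.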

Some micro-benchmarks suggest that, compared with the manual loop (adapted into a function), this approach benefits your use case:

import csv


def file_grouper_manual(
        in_filepath="t.txt",
        out_filepath="t.csv",
        size=7):
    with open(in_filepath, 'r') as in_file,\
            open(out_filepath, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        block = []
        for line in in_file:
            block.append(line.rstrip('\n'))
            if len(block) == size:
                writer.writerow(block)
                block = []
        if block:
            writer.writerow(block)

Benchmarks:

n = 100_000
k = 7
with open("t.txt", "w") as f:
    for i in range(n):
        # end every record with a newline so consecutive records do not fuse
        f.write("\n".join(["0123456"] * k) + "\n")


%timeit file_grouper_manual()
# 1 loop, best of 5: 325 ms per loop
%timeit file_grouper_itertools()
# 1 loop, best of 5: 230 ms per loop

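Alternatively, you could use Pandas, which is very convenient but requires all of the input to fit into available memory (this should not be a problem in your case, but it could be for larger inputs):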

import numpy as np
import pandas as pd


def file_grouper_pandas(in_filepath="t.txt", out_filepath="t.csv", size=7):
    # use a separate name for the file object instead of shadowing the path
    with open(in_filepath) as in_file:
        data = [x.rstrip('\n') for x in in_file.readlines()]
    df = pd.DataFrame(np.array(data).reshape((-1, size)), columns=list(range(size)))
    # consistent with the other solutions
    df.to_csv(out_filepath, header=False, index=False)


%timeit file_grouper_pandas()
# 1 loop, best of 5: 666 ms per loop
Score: 2

Stack Overflow user

Posted on 2022-04-07 14:46:19

If you work a lot with tables and data, NumPy and Pandas are very useful libraries.

import numpy as np
import pandas as pd

columns = ['ID', 'Name' , 'Work position', 'Date 1 (year - month)', 'Date 2 (year - month)',
           'Gross payment', 'Service time']

with open('oldfile.txt', 'r') as stream:
    # read file into a list of lines
    lines = stream.readlines()
    # remove newline character from each element of the list.
    lines = [line.strip('\n') for line in lines]
    # Figure out how many rows there will be in the table
    # (integer division, so the section count is an int rather than a float)
    number_of_people = len(lines) // 7
    # Split data into rows
    data = np.array_split(lines, number_of_people)

# Convert data to pandas dataframe
df = pd.DataFrame(data, columns = columns)

Once the data is in a Pandas DataFrame, you can easily write it out in any of the formats you listed. For example, to write CSV:

df.to_csv('newfile.csv')

Or for JSON:

df.to_json('newfile.json')
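
If Pandas is not available, the JSON case can also be sketched with the standard library alone. The following is only an illustration under stated assumptions: the helper `lines_to_records` and the shortened column names are hypothetical, and the lines are assumed to be already read and stripped of newlines.

```python
import json

# Column names taken from the question's desired output.
COLUMNS = ["ID", "Name", "Work position", "Date 1", "Date 2",
           "Gross payment", "Service time"]


def lines_to_records(lines, columns=COLUMNS):
    """Turn a flat list of lines into one dict per person."""
    size = len(columns)
    return [dict(zip(columns, lines[i:i + size]))
            for i in range(0, len(lines), size)]


lines = ["00000000886", "MANUEL DE JESUS SUBERVI PEÑA", "MAESTRO MEDIA GENERAL",
         "2006-08", "2021-09", "30,556.04", "15.7",
         "00000000086", "MANUEL DE JESUS SUBERVI PEÑA", "MAESTRO MEDIA GENERAL",
         "2006-01", "2021-09", "30,556.04", "15.7"]

records = lines_to_records(lines)
# ensure_ascii=False keeps characters such as Ñ readable in the output.
json_text = json.dumps(records, ensure_ascii=False, indent=2)
```

Each person becomes one JSON object keyed by column name, and `json_text` can then be written to a file opened with an explicit encoding such as UTF-8.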
Score: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/71783782
