
Question: How do I turn every n lines into one row of n columns?

Stack Overflow user

Asked 2022-04-07 14:11:27

3 answers, 83 views, 0 followers, 2 votes

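I have a .txt file with 683,500 lines, in which every 7 lines describe a different person:

ID
Name
Work position
Date 1 (year-month)
Date 2 (year-month)
Gross payment
Service time

I want to read the .txt and output one row per person (the output can be JSON, CSV, TXT, or even a database):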

ID    Name     Work position   Date 1   Date 2    Gross payment     Service time
ID    Name     Work position   Date 1   Date 2    Gross payment     Service time
ID    Name     Work position   Date 1   Date 2    Gross payment     Service time
ID    Name     Work position   Date 1   Date 2    Gross payment     Service time

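Example from the txt:

00000000886
MANUEL DE JESUS SUBERVI PEÑA
MAESTRO MEDIA GENERAL
2006-08
2021-09
30,556.04
15.7
00000000086
MANUEL DE JESUS SUBERVI PEÑA
MAESTRO MEDIA GENERAL
2006-01
2021-09
30,556.04
15.7
00100000086
MANUEL DE JESUS SUBERVI PEÑA
MAESTRO MEDIA GENERAL
2006-01
2021-09
30,556.04
15.7

This is what I have tried so far: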

import csv

#opening file

file = open (r"C:\Users\Redford\Documents\Proyecto automatizacion\data1.txt") #open file
counter = 0
total_lines = len(file.readlines()) #count lines
#print('Total lines:', x)

#reading from file

content = file.read()
colist  = content.split ()
print(colist)


#read data from data1.txt and write in data2.txt

lines = open (r"C:\Users\Redford\Documents\Proyecto automatizacion\data1.txt")
arr = []
with open('data2.txt', 'w') as f:
    for line in lines:
        #arr.append(line)
        f.write (line)

I am new to programming and do not know how to turn my logic into code.

jupyter-notebook

json

python-3.x

pandas

csv

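3 Answers

Stack Overflow user

Answered 2022-04-07 14:23:24

Your code does not collect multiple lines in order to write them out as a single row.

Use this approach:

Read the file line by line
Collect each line, without its trailing \n, into a list
When the list reaches a length of 7, write it to the csv and clear the list
Repeat until done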
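The steps above can also be sketched as a generator that does not depend on the output format. The version below emits JSON, one of the formats the question mentions. This is only an illustration: the helper name `records`, the `FIELDS` list (taken from the header row in the question) and the in-memory `source` are assumptions, not part of the original answer.

```python
import io
import json

# Field names copied from the header row shown in the question.
FIELDS = ["ID", "Name", "Work position", "Date 1", "Date 2",
          "Gross payment", "Service time"]


def records(lines, size=len(FIELDS)):
    """Yield one list of `size` stripped lines per person (hypothetical helper)."""
    person = []
    for line in lines:
        person.append(line.rstrip("\n"))
        if len(person) == size:
            yield person
            person = []
    if person:  # leftover lines: the input was not a multiple of `size`
        yield person


# io.StringIO stands in for the opened text file.
source = io.StringIO(
    "00000000886\nMANUEL DE JESUS SUBERVI PEÑA\nMAESTRO MEDIA GENERAL\n"
    "2006-08\n2021-09\n30,556.04\n15.7\n"
)
people = [dict(zip(FIELDS, rec)) for rec in records(source)]
print(json.dumps(people, indent=2))
```

Because `records` accepts any iterable of lines, an open file object can be passed in directly, so the whole file never has to be held in memory.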

Create the data file:

with open ("t.txt","w") as f:
    f.write("""00000000886\nMANUEL DE JESUS SUBERVI PEÑA\nMAESTRO MEDIA GENERAL\n2006-08\n2021-09\n30,556.04\n15.7
00000000086\nMANUEL DE JESUS SUBERVI PEÑA\nMAESTRO MEDIA GENERAL\n2006-01\n2021-09\n30,556.04\n15.7
00100000086\nMANUEL DE JESUS SUBERVI PEÑA\nMAESTRO MEDIA GENERAL\n2006-01\n2021-09\n30,556.04\n15.7""")

Program:

import csv

with open("t.csv","w",newline="") as wr, open("t.txt") as r:
    # create a csv writer
    writer = csv.writer(wr)

    # uncomment if you want a header over your data
    # h =  ["ID","Name","Work position","Date 1","Date 2",
    #       "Gross payment","Service time"]
    # writer.writerow(h)

    person = []
    for line in r: # could use enumerate as well, this works ok
        # collect line data minus the \n into list
        person.append(line.strip())

        # this person is finished, write, clear list
        if len(person) == 7:
            # leveraged the csv module writer, look it up if you need
            # to customize it further regarding quoting etc
            writer.writerow(person)
            person = [] # reset list for next person

    # something went wrong, your file is inconsistent, write remainder
    if person:
        writer.writerow(person)

print(open("t.csv").read())

Output:

00000000886,MANUEL DE JESUS SUBERVI PEÑA,MAESTRO MEDIA GENERAL,2006-08,2021-09,"30,556.04",15.7
00000000086,MANUEL DE JESUS SUBERVI PEÑA,MAESTRO MEDIA GENERAL,2006-01,2021-09,"30,556.04",15.7
00100000086,MANUEL DE JESUS SUBERVI PEÑA,MAESTRO MEDIA GENERAL,2006-01,2021-09,"30,556.04",15.7

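Read up: csv module - writer

The "Gross payment" value has to be quoted because it contains a ',', which is the csv delimiter. The module does this automatically.

2 votes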
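The automatic quoting can be checked in isolation. This is a minimal sketch, with an in-memory `io.StringIO` buffer standing in for the output file (an assumption made for the example) and a shortened record:

```python
import csv
import io

# In-memory buffer standing in for the output file (illustration only).
buffer = io.StringIO()
writer = csv.writer(buffer)  # defaults: delimiter "," and QUOTE_MINIMAL

# Shortened record; only "30,556.04" contains the delimiter.
row = ["00000000886", "2006-08", "2021-09", "30,556.04", "15.7"]
writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)  # 00000000886,2006-08,2021-09,"30,556.04",15.7

# Reading the text back with csv.reader recovers the original fields.
parsed = next(csv.reader(io.StringIO(csv_text)))
print(parsed == row)  # True
```

Only the field that contains the delimiter is quoted. If a different policy is needed, `csv.writer` accepts a `quoting` argument such as `csv.QUOTE_ALL`.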

Stack Overflow user

Answered 2022-04-07 15:21:07

On top of @PatrickArtner's excellent answer, I would like to propose an itertools-based solution:

import csv
import itertools


def file_grouper_itertools(
        in_filepath="t.txt",
        out_filepath="t.csv",
        size=7):
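    # newline='' lets the csv module control line endings
    # (avoids blank rows between records on Windows)
    with open(in_filepath, 'r') as in_file,\
            open(out_filepath, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        args = [iter(in_file)] * size
        # an empty fillvalue keeps a short final block free of padding fields
        for block in itertools.zip_longest(*args, fillvalue=''):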
            # equivalent, for the given input, to:
            # block = [x.rstrip('\n') for x in block]
            block = ''.join(block).rstrip('\n').split('\n')
            writer.writerow(block)

The idea here is to loop over blocks of the desired size. This gets faster for larger group sizes, because the main loop runs fewer iterations.
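The `[iter(...)] * size` idiom is easier to see on a small list. In this minimal sketch the helper name `grouper` and the sample data are chosen only for illustration:

```python
import itertools


def grouper(iterable, size, fillvalue=None):
    """Return an iterator over tuples of `size` consecutive items."""
    # The list holds the SAME iterator object `size` times, so each tuple
    # built by zip_longest pulls `size` consecutive items from it.
    args = [iter(iterable)] * size
    return itertools.zip_longest(*args, fillvalue=fillvalue)


lines = ["a", "b", "c", "d", "e", "f", "g", "h"]
blocks = list(grouper(lines, 3, fillvalue=""))
print(blocks)
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', '')]
```

The last tuple is padded with `fillvalue` when the input length is not a multiple of `size`.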

Some micro-benchmarks indicate that, compared with the manual loop (adapted here into a function), this approach benefits your use case:

import csv


def file_grouper_manual(
        in_filepath="t.txt",
        out_filepath="t.csv",
        size=7):
    # newline='' lets the csv module control line endings
    with open(in_filepath, 'r') as in_file,\
            open(out_filepath, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        block = []
        for line in in_file:
            block.append(line.rstrip('\n'))
            if len(block) == size:
                writer.writerow(block)
                block = []
        if block:
            writer.writerow(block)

Benchmark:

n = 100_000
k = 7
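with open("t.txt", "w") as f:
    for i in range(n):
        # end every record with a newline so consecutive records do not
        # fuse and the total line count stays a multiple of k
        f.write("\n".join(["0123456"] * k) + "\n")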


%timeit file_grouper_manual()
# 1 loop, best of 5: 325 ms per loop
%timeit file_grouper_itertools()
# 1 loop, best of 5: 230 ms per loop

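Alternatively, you can use Pandas, which is very convenient but requires the whole input to fit in available memory (this should not be a problem in your case, but could be for larger inputs):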

import numpy as np
import pandas as pd


def file_grouper_pandas(in_filepath="t.txt", out_filepath="t.csv", size=7):
    # use a separate name for the file object instead of shadowing the path
    with open(in_filepath) as in_file:
        data = [x.rstrip('\n') for x in in_file.readlines()]
    df = pd.DataFrame(np.array(data).reshape((-1, size)), columns=list(range(size)))
    # consistent with the other solutions
    df.to_csv(out_filepath, header=False, index=False)  


%timeit file_grouper_pandas()
# 1 loop, best of 5: 666 ms per loop

2 votes

Stack Overflow user

Answered 2022-04-07 14:46:19

If you do a lot of work with tables and data, NumPy and Pandas are very useful libraries.

import numpy as np
import pandas as pd

columns = ['ID', 'Name' , 'Work position', 'Date 1 (year - month)', 'Date 2 (year - month)',
           'Gross payment', 'Service time']

with open('oldfile.txt', 'r') as stream:
    # read file into a list of lines
    lines = stream.readlines()
    # remove newline character from each element of the list.
    lines = [line.strip('\n') for line in lines]
    # Figure out how many rows there will be in the table
    # (integer division, so the section count is an int)
    number_of_people = len(lines) // 7
    # Split data into rows
    data = np.array_split(lines, number_of_people)

# Convert data to pandas dataframe
df = pd.DataFrame(data, columns = columns)

Once you have converted the data to a Pandas DataFrame, you can easily output it to any of the formats you listed. For example, to output to csv you can do:

df.to_csv('newfile.csv')

Or, for json:

df.to_json('newfile.json')
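The question also lists a database as an acceptable target. The sketch below does not use the DataFrame above. It relies on the standard-library `sqlite3` module, and the table name `people`, the column names, and the helper `rows_from_lines` are all assumptions made for illustration:

```python
import sqlite3

# Column names are assumptions for illustration; adapt them to your schema.
COLUMNS = ["id", "name", "work_position", "date_1", "date_2",
           "gross_payment", "service_time"]


def rows_from_lines(lines, size=7):
    """Group stripped lines into lists of `size` fields each (hypothetical helper)."""
    fields = [line.rstrip("\n") for line in lines]
    return [fields[i:i + size] for i in range(0, len(fields), size)]


sample = [
    "00000000886\n", "MANUEL DE JESUS SUBERVI PEÑA\n",
    "MAESTRO MEDIA GENERAL\n", "2006-08\n", "2021-09\n",
    "30,556.04\n", "15.7\n",
]

conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
conn.execute(
    "CREATE TABLE people ({})".format(", ".join(f"{c} TEXT" for c in COLUMNS))
)
conn.executemany(
    "INSERT INTO people VALUES ({})".format(", ".join("?" * len(COLUMNS))),
    rows_from_lines(sample),
)
conn.commit()

stored = conn.execute("SELECT id, gross_payment FROM people").fetchall()
print(stored)  # [('00000000886', '30,556.04')]
```

All columns are stored as TEXT so that the leading zeros of the ID survive. An incomplete final record would make `executemany` fail, which doubles as a consistency check on the input.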

1 vote

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/71783782
