Python text processing
Stack Overflow user
Asked 2019-01-03 18:56:39
Answers: 1 · Views: 58 · Followers: 0 · Votes: 0

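I have a data file containing a large amount of data. I am only interested in two codes per user, and these need to be updated. I have the new codes in a separate file. I just want to compare the two files and put the new codes into the existing file.

Old file: (txt2)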

.
..
..
alpha Donec vulputate lorem tortor, nec fermentum nibh bibendum vel.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Praesent dictum luctus massa, non euismod lacus.
${alpha_john}: 'Lorem ipsum dolor sit amet, consectetur'
${beta_john}: 'iuhertgh jndsfbjpijwrg'
${alpha_mac}: 'acerat a lorem eget, ultricies'
${beta_mac}: 'elit nibh, eu condimentum orci viverra q'
${alpha_joe}: 'gravida lorem, ut congue diam.'
${beta_joe}: 'orttitor in condimentum nec, venenatis eu urna'
${alpha_mark}: ''
${beta_mark}: ''
${alpha_ross}: 'suscipit vitae felis non suscipit.'
${beta_ross}: 'non vulputate convallis, ligula diam sagittis urna, in venenatis'
${alpha_don}: 'Pellentesque feugiat diam est, at rhoncus orci porttitor'
${beta_don}: 'Sed elementum elit nibh'
${alpha_harry}: 'Proin tempor lacus arcu.'
${beta_harry}: 'posuere sollicitudin mi, et vulputate nisl fringilla non'
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel.
Phasellus dictum justo sit amet ligula varius aliquet auctor et metus.
..
..
.

Codes file: (txt1)

${alpha_john}: 'XXXXXHHHHHHHXXXXXX'
${beta_john}: 'XFFFFFFFFFGGGGGGGGDDDDDD'
${alpha_mac}: 'DDDDDDKKKKKKKKK'
${beta_mac}: 'KKKKKKKKKKKYYYYYYYYYYYYD'
${alpha_joe}: 'TTTTTVVVVVVVVVVVKK'
${beta_joe}: 'OOOOOOOSSSSSSSSSSPPPPPP'
${alpha_ross}: 'SSSSSHHHHHHHHTTTTTTTT'
${beta_ross}: 'PPPPPWWWWWHHHHHHHHHH'
${alpha_harry}: 'IIIIIIEEEEEEETTTTTTTTTT'
${beta_harry}: 'YYYYYYYYEEEEEEEEEEMMMMMMMMMM'

My code:

#!/usr/bin/env python

import os, sys, re, time
import argparse
import logging
import time

cat /dev/null > /home/user/scripts/temp/txt3

file1=open("/home/user/scripts/temp/txt1",'r+')
file2=open("/home/user/scripts/temp/txt2", 'r+')
file3=open("/home/user/scripts/temp/txt3", 'r+')

for line1 in file1:
    keyword=line1[line1.find("{")+1:line1.find("}")]
    for line2 in file2:
        if keyword in line2:
            file3.write(line1)
        else:
            file3.write(line2)
file1.close()
file2.close()
file3.close()

Output:

alpha Donec vulputate lorem tortor, nec fermentum nibh bibendum vel.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Praesent dictum luctus massa, non euismod lacus.
${alpha_john}: 'XXXXXHHHHHHHXXXXXX'
${beta_john}: 'iuhertgh jndsfbjpijwrg'
${alpha_mac}: 'acerat a lorem eget, ultricies'
${beta_mac}: 'elit nibh, eu condimentum orci viverra q'
${alpha_joe}: 'gravida lorem, ut congue diam.'
${beta_joe}: 'orttitor in condimentum nec, venenatis eu urna'
${alpha_mark}: ''
${beta_mark}: ''
${alpha_ross}: 'suscipit vitae felis non suscipit.'
${beta_ross}: 'non vulputate convallis, ligula diam sagittis urna, in venenatis'
${alpha_don}: 'Pellentesque feugiat diam est, at rhoncus orci porttitor'
${beta_don}: 'Sed elementum elit nibh'
${alpha_harry}: 'Proin tempor lacus arcu.'
${beta_harry}: 'posuere sollicitudin mi, et vulputate nisl fringilla non'
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos.
Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel.
Phasellus dictum justo sit amet ligula varius aliquet auctor et metus.

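This code writes only one line from txt1 to the new file, ${alpha_john}: 'XXXXXHHHHHHHXXXXXX'; all the other lines are still the ones from the old file (txt2).

How can I overwrite all the lines that appear in txt1?

Please let me know if any additional information is needed.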

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-01-03 19:08:30

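Your loops try to iterate over file2 len(file1) times, which is surely not what you want to do. In practice the file2 iterator is exhausted after the first pass, which is why only the first code gets replaced. Instead, build a replacement dictionary from file1, like this: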

import re

# Regex to find the key inside ${...} at the start of a line (e.g. alpha_john).
# You can use str.find or str.split to extract the key like you did if you're
# not comfortable with regular expressions.
user_regex = re.compile(r'^\$\{([a-zA-Z0-9_]+)\}: ')

# Give the files more descriptive names
codes_file = "/home/user/scripts/temp/txt1"
old_file = "/home/user/scripts/temp/txt2"
new_file = "/home/user/scripts/temp/txt3"

codes = {}
with open(codes_file) as f:  # use with to safely open and close files
    for line in f:
        match = user_regex.search(line)
        if match:
            codes[match.group(1)] = line

# Now the codes are in RAM for easy lookup

with open(old_file) as old, open(new_file, 'w') as new:
    for line in old:
        match = user_regex.search(line)
        if match and match.group(1) in codes:
            # This key has a new code: write the replacement line
            new.write(codes[match.group(1)])
        else:
            # No new code for this line: copy it through unchanged
            new.write(line)
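
To see the difference between the two approaches without touching any files, here is a minimal sketch that runs both on small in-memory samples. The sample strings, the io.StringIO stand-ins for the real files, and the function names nested_loop_merge and dict_merge are all assumptions made for illustration; they are not part of the original question or answer.

```python
import io
import re

# Hypothetical in-memory stand-ins for txt1 (new codes) and txt2 (old file).
codes_text = "${alpha_john}: 'NEW1'\n${beta_john}: 'NEW2'\n"
old_text = (
    "header\n"
    "${alpha_john}: 'old1'\n"
    "${beta_john}: 'old2'\n"
    "${alpha_mark}: ''\n"
)

USER_REGEX = re.compile(r'^\$\{([a-zA-Z0-9_]+)\}: ')


def nested_loop_merge(codes_text, old_text):
    """Mimic the nested loops from the question; return the lines written."""
    file1 = io.StringIO(codes_text)
    file2 = io.StringIO(old_text)
    written = []
    for line1 in file1:
        keyword = line1[line1.find("{") + 1:line1.find("}")]
        # file2 is fully consumed during the first outer iteration,
        # so this inner loop does nothing for every later keyword.
        for line2 in file2:
            written.append(line1 if keyword in line2 else line2)
    return written


def dict_merge(codes_text, old_text):
    """Build a key -> line dictionary first, then make one pass over the old text."""
    codes = {}
    for line in io.StringIO(codes_text):
        match = USER_REGEX.search(line)
        if match:
            codes[match.group(1)] = line
    merged = []
    for line in io.StringIO(old_text):
        match = USER_REGEX.search(line)
        if match and match.group(1) in codes:
            merged.append(codes[match.group(1)])
        else:
            merged.append(line)
    return merged


nested_result = nested_loop_merge(codes_text, old_text)
dict_result = dict_merge(codes_text, old_text)
print("".join(nested_result))
print("".join(dict_result))
```

With these samples, the nested-loop version updates only the alpha_john line, while the dictionary version updates both john lines and leaves the header and the alpha_mark line as they were.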
Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/54020946
