Performing a regex search and saving the results to CSV

Code Review user
Asked on 2016-10-28 10:30:16
1 answer · 7.5K views · 0 followers · 2 votes

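I am a Python beginner, and I have written a simple Python program that does the following:

  • Searches the lines of a file (each line contains a message) for a pattern
  • Extracts information from the matching lines and saves it to disk
  • Removes the messages that match the regex (regular expression)
  • Saves the remaining messages to another file

The regexes are not available up front, so I pick a message -> create a regex for it -> remove the messages it matches, and then repeat the same steps on the remaining messages.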

# coding: utf-8

# In[50]:

import re
import csv


# ### Run this part only once in the starting. From here 

# In[2]:

# ### Change the directory to working folder and give the right filename (hdfcbk), 
# ### if unsure what to do go to your folder and right click and copy the filen here, it will look like /home/XYZ/.../Your_folder_name/hdfcbk
smsFile = open('hdfcbk', 'r')
data = smsFile.read()
data = data.split('\n')
main_data = data
regex_list = []
regl = []


# In[3]:
def regex_search(pattern, file_name):
   remove_arr = []
   res = []
   remain_sms = []
   for sms in file_name:
       j= re.match(pattern,sms)
       if j != None:
           res.append(j.groupdict())
           remove_arr.append(sms)
       else:
           remain_sms.append(sms)
   return res, remove_arr, remain_sms


# In[4]:

def write_to_csv(result,csv_name):
    keys = result[0][0].keys()
    with open(csv_name, 'wb') as output_file:
        dict_writer = csv.DictWriter(output_file, keys, dialect='excel')
        dict_writer.writeheader()
        dict_writer.writerows(result[0])


# In[12]: 

# ### To here, now the repetitive run start

# ### Update this pattern file

# In[1]:

pat1 = 'INR (?P<Amount>(.*)) deposited to A\/c No (?P<AccountNo>(.*)) towards (?P<Towards>(.*)) Val (?P<Date>(.*)). Clr Bal is INR (?P<Balance>(.*)) subject to clearing.'


# In[8]:

A = regex_search(pat1,main_data)


# ### Updating main_data to remaining messages

# In[11]:

main_data = A[2]


# ### Writing remaining sms to a file, you don't need to change the file name as it will be updated everything as you run the script. Just look at the remaining sms and make new regex

# In[21]:

with open('remaining_sms.txt', 'w') as fp:
    fp.write('\n'.join('%s' % x for x in main_data))


# ### Update the csv file

# In[ ]:

write_to_csv(A, 'hdfc_test_3.csv')


# ### Keeping all the regexes in one list, update the index number in [i,pat1]

# In[52]:

regl.append([1,pat1])


# ### Wrting the regex index to csv, run this part in the end, or if you're unsure that you will make the mistake run this part and keep changing the output file name

# In[53]:

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(regl)

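I have commented everything in the code. The catch is that I need to hand this task over to people who know nothing about coding, which is why there are so many comments.

Could you review my code and suggest what else I can do to improve it, so that people can run it without any trouble?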


1 Answer

Code Review user

Accepted answer

Posted on 2016-10-28 12:06:06

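10 tips, issues, etc.

  1. Don't start comments with # ###
  2. Put the code that sits outside of functions into a main() function. That way, this code does not run inadvertently if the module is imported. You can use this standard framework:

         def main():
             # put the code you want to run when the file is executed here

         if __name__ == "__main__":
             main()

  3. Spell the words in your comments correctly, and wrap comments onto a new line if they get too long. (After several years of coding, I have found that the vast majority of comments don't need more than one line.)
  4. Use with open("file.txt", 'r') as f: whenever you work with files.
  5. Always end a file with a newline.
  6. When comparing with None, use is and is not instead of == and !=.
  7. Explicitly capture the three return values of the regex_search function in well-named variables.
  8. Per the PEP 8 guidelines, don't use capital letters in variable names.
  9. When writing to a file inside a with open() statement, I find print easier to use than write: print(some_text, file=some_file, flush=True)
  10. Use str.format() instead of the outdated string formatting operator (%s).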
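
Tips 6, 9 and 10 can be illustrated with a small sketch. It assumes Python 3 (where print() accepts file=); the pattern, the sample messages, and the name split_messages are hypothetical and not taken from the original script, and an in-memory buffer stands in for a real output file.

```python
import io
import re

# Hypothetical pattern and sample messages, used only for illustration.
PATTERN = r'INR (?P<amount>[\d,.]+) deposited'
messages = ['INR 500.00 deposited to your account', 'Your OTP is 1234']


def split_messages(pattern, lines):
    """Return (matched_fields, unmatched_lines) for the given lines."""
    matched, unmatched = [], []
    for line in lines:
        match = re.match(pattern, line)
        if match is not None:  # tip 6: compare with None using "is not"
            matched.append(match.groupdict())
        else:
            unmatched.append(line)
    return matched, unmatched


matched, unmatched = split_messages(PATTERN, messages)

# Tips 9 and 10: print() with file= and str.format().
# io.StringIO stands in for a file opened with "with open(...)".
buffer = io.StringIO()
for line in unmatched:
    print('{}'.format(line), file=buffer)
```

Because print() appends a newline after every line, the output also ends with a newline, which lines up with tip 5.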
# coding: utf-8

import re
import csv


def regex_search(pattern, file_name):
    remove_arr = []
    res = []
    remain_sms = []
    for sms in file_name:
        j = re.match(pattern, sms)
        if j is not None:
            res.append(j.groupdict())
            remove_arr.append(sms)
        else:
            remain_sms.append(sms)
    return res, remove_arr, remain_sms


def write_to_csv(result, csv_name):
    keys = result[0][0].keys()
    with open(csv_name, 'wb') as output_file:
        dict_writer = csv.DictWriter(output_file, keys, dialect='excel')
        dict_writer.writeheader()
        dict_writer.writerows(result[0])


def main():
    # Run this part only once in the starting. From here

    # change the directory to working folder and give the right filename (hdfcbk),
    # if unsure what to do go to your folder and right click and copy the file here,
    # it will look like /home/XYZ/.../Your_folder_name/hdfcbk
    with open('hdfcbk', 'r') as smsFile:
        data = smsFile.read()
    data = data.split('\n')
    main_data = data
    regl = []

    pat1 = 'INR (?P<Amount>(.*)) deposited to A\/c No (?P<AccountNo>(.*)) towards (?P<Towards>(.*)) Val (?P<Date>(.*)). Clr Bal is INR (?P<Balance>(.*)) subject to clearing.'

    # TODO - Use much more descriptive names...no idea what's going on here without searching for a while
    a, b, c = regex_search(pat1, main_data)

    # Updating main_data to remaining messages
    main_data = c

    # Writing remaining sms to a file, you don't need to change the file name as it will be updated
    # everything as you run the script. Just look at the remaining sms and make new regex.
    with open('remaining_sms.txt', 'w') as fp:
        fp.write('\n'.join('{}'.format(x) for x in main_data))

    # Update the csv file
    write_to_csv([a, b, c], 'hdfc_test_3.csv')

    # Keeping all the regexes in one list, update the index number in [i, pat1]
    regl.append([1, pat1])

    # Writing the regex index to csv, run this part in the end, or if you're unsure that you will
    # make the mistake run this part and keep changing the output file name.
    with open("output.csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows(regl)


if __name__ == "__main__":
    main()
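
The listing above opens its CSV files with 'wb', which suits Python 2. As a hedged sketch only, here is how the two helper functions might look under Python 3, where the csv module expects a text-mode file opened with newline='', together with the descriptive names that tip 7 and the TODO ask for. The names parsed_rows, matched_sms and remaining_sms, the shortened pattern, and the sample messages are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import os
import re
import tempfile


def regex_search(pattern, messages):
    """Split messages into parsed fields, matched lines, and remaining lines."""
    parsed, matched, remaining = [], [], []
    for sms in messages:
        match = re.match(pattern, sms)
        if match is not None:
            parsed.append(match.groupdict())
            matched.append(sms)
        else:
            remaining.append(sms)
    return parsed, matched, remaining


def write_to_csv(rows, csv_name):
    """Write a non-empty list of dicts; the first row's keys become the header."""
    # Python 3: text mode with newline='' instead of the Python 2 style 'wb'.
    with open(csv_name, 'w', newline='') as output_file:
        writer = csv.DictWriter(output_file, fieldnames=list(rows[0].keys()),
                                dialect='excel')
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical sample data and a shortened pattern, for demonstration only.
sample_sms = ['INR 500.00 deposited to A/c No XX1234', 'Your OTP is 1234']
sample_pattern = r'INR (?P<Amount>.*) deposited to A/c No (?P<AccountNo>\w+)'

parsed_rows, matched_sms, remaining_sms = regex_search(sample_pattern, sample_sms)

# Write to a temporary directory and read the file back to check the result.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'parsed.csv')
    write_to_csv(parsed_rows, csv_path)
    with open(csv_path, newline='') as csv_file:
        rows_read_back = list(csv.DictReader(csv_file))
```

Passing only parsed_rows to write_to_csv removes the need for the result[0][0] indexing in the listing above, since the function no longer receives the whole tuple of return values.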
Votes: 1

Original page content provided by Code Review; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://codereview.stackexchange.com/questions/145511
