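I am a Python beginner and I wrote a simple Python program that does the following:
No regex is available up front, so I pick a message -> create a regex for it -> remove the messages it matches, and repeat the same for the remaining messages.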
# coding: utf-8
# In[50]:
import re
import csv
# ### Run this part only once in the starting. From here
# In[2]:
# ### Change the directory to working folder and give the right filename (hdfcbk),
# ### if unsure what to do go to your folder and right click and copy the filen here, it will look like /home/XYZ/.../Your_folder_name/hdfcbk
smsFile = open('hdfcbk', 'r')
data = smsFile.read()
data = data.split('\n')
main_data = data
regex_list = []
regl = []
# In[3]:
def regex_search(pattern, file_name):
    remove_arr = []
    res = []
    remain_sms = []
    for sms in file_name:
        j= re.match(pattern,sms)
        if j != None:
            res.append(j.groupdict())
            remove_arr.append(sms)
        else:
            remain_sms.append(sms)
    return res, remove_arr, remain_sms
# In[4]:
def write_to_csv(result,csv_name):
    keys = result[0][0].keys()
    with open(csv_name, 'wb') as output_file:
        dict_writer = csv.DictWriter(output_file, keys, dialect='excel')
        dict_writer.writeheader()
        dict_writer.writerows(result[0])
# In[12]:
# ### To here, now the repetitive run start
# ### Update this pattern file
# In[1]:
pat1 = 'INR (?P<Amount>(.*)) deposited to A\/c No (?P<AccountNo>(.*)) towards (?P<Towards>(.*)) Val (?P<Date>(.*)). Clr Bal is INR (?P<Balance>(.*)) subject to clearing.'
# In[8]:
A = regex_search(pat1,main_data)
# ### Updating main_data to remaining messages
# In[11]:
main_data = A[2]
# ### Writing remaining sms to a file, you don't need to change the file name as it will be updated everything as you run the script. Just look at the remaining sms and make new regex
# In[21]:
with open('remaining_sms.txt', 'w') as fp:
    fp.write('\n'.join('%s' % x for x in main_data))
# ### Update the csv file
# In[ ]:
write_to_csv(A, 'hdfc_test_3.csv')
# ### Keeping all the regexes in one list, update the index number in [i,pat1]
# In[52]:
regl.append([1,pat1])
# ### Wrting the regex index to csv, run this part in the end, or if you're unsure that you will make the mistake run this part and keep changing the output file name
# In[53]:
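with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(regl)

I have commented everything in the code. The problem now is that I need to send this task to some people who know nothing about coding. That is why I wrote so many comments.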
Could you please review my code and suggest what else I can do to improve it, so that people can run it without any hassle?
Posted on 2016-10-28 12:06:06
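- Drop the # ### prefix at the start of the comments.
- Open files with: with open("file.txt", 'r') as f:
- When comparing with None, use is and is not instead of == and !=.
- Use descriptive names for the three return values of the regex_search method.
- When writing to a file inside a with open() statement, I find it easier to use print than write: print(some_text, file=some_file, flush=True)

# coding: utf-8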
import re
import csv
def regex_search(pattern, file_name):
    remove_arr = []
    res = []
    remain_sms = []
    for sms in file_name:
        j = re.match(pattern, sms)
        if j is not None:
            res.append(j.groupdict())
            remove_arr.append(sms)
        else:
            remain_sms.append(sms)
    return res, remove_arr, remain_sms
def write_to_csv(result, csv_name):
    keys = result[0][0].keys()
    with open(csv_name, 'wb') as output_file:
        dict_writer = csv.DictWriter(output_file, keys, dialect='excel')
        dict_writer.writeheader()
        dict_writer.writerows(result[0])
def main():
    # Run this part only once in the starting. From here
    # change the directory to working folder and give the right filename (hdfcbk),
    # if unsure what to do go to your folder and right click and copy the file here,
    # it will look like /home/XYZ/.../Your_folder_name/hdfcbk
    with open('hdfcbk', 'r') as smsFile:
        data = smsFile.read()
    data = data.split('\n')
    main_data = data
    regl = []
    pat1 = 'INR (?P<Amount>(.*)) deposited to A\/c No (?P<AccountNo>(.*)) towards (?P<Towards>(.*)) Val (?P<Date>(.*)). Clr Bal is INR (?P<Balance>(.*)) subject to clearing.'
    # TODO - Use much more descriptive names...no idea what's going on here without searching for a while
    a, b, c = regex_search(pat1, main_data)
    # Updating main_data to remaining messages
    main_data = c
    # Writing remaining sms to a file, you don't need to change the file name as it will be updated
    # everything as you run the script. Just look at the remaining sms and make new regex.
    with open('remaining_sms.txt', 'w') as fp:
        fp.write('\n'.join('{}'.format(x) for x in main_data))
    # Update the csv file
    write_to_csv([a, b, c], 'hdfc_test_3.csv')
    # Keeping all the regexes in one list, update the index number in [i, pat1]
    regl.append([1, pat1])
    # Writing the regex index to csv, run this part in the end, or if you're unsure that you will
    # make the mistake run this part and keep changing the output file name.
    with open("output.csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows(regl)
if __name__ == "__main__":
    main()

https://codereview.stackexchange.com/questions/145511
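
The suggestions about descriptive return names and about print can be illustrated with a small Python 3 sketch. This is not the code from the answer: SearchResult, split_by_pattern, write_rows_to_csv, save_remaining and the sample messages are hypothetical names and data made up for illustration. It also assumes Python 3, where the csv module expects a text-mode file opened with newline='' rather than the 'wb' mode used in the listings on this page.

```python
import csv
import re
from collections import namedtuple

# Hypothetical container that gives the three return values descriptive names.
SearchResult = namedtuple("SearchResult", ["parsed", "matched", "remaining"])


def split_by_pattern(pattern, messages):
    """Split messages into parsed fields, matched messages and the rest."""
    parsed, matched, remaining = [], [], []
    for sms in messages:
        match = re.match(pattern, sms)
        if match is not None:
            parsed.append(match.groupdict())
            matched.append(sms)
        else:
            remaining.append(sms)
    return SearchResult(parsed, matched, remaining)


def write_rows_to_csv(rows, csv_name):
    """Write a list of dicts to csv_name; do nothing if there are no rows."""
    if not rows:
        return
    # Assumption: Python 3, so text mode with newline='' replaces 'wb'.
    with open(csv_name, "w", newline="") as output_file:
        writer = csv.DictWriter(output_file, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def save_remaining(messages, file_name):
    """Write one message per line using print() instead of write()."""
    with open(file_name, "w") as fp:
        for sms in messages:
            print(sms, file=fp)


# Made-up sample messages, only for illustration.
sample_messages = [
    "INR 500.00 deposited to A/c No XX1234 towards Salary",
    "Your OTP is 123456",
]
# A raw string keeps any backslashes in the pattern intact for the regex engine.
sample_pattern = (
    r"INR (?P<Amount>.*) deposited to A/c No (?P<AccountNo>.*)"
    r" towards (?P<Towards>.*)"
)

result = split_by_pattern(sample_pattern, sample_messages)
```

With a named result, the calling code can read result.remaining instead of indexing with [2], and write_rows_to_csv(result.parsed, ...) no longer fails with an IndexError when a pattern matches nothing.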