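I have the following .txt file:
TITLE Genetic variation in the complete MgPa operon and its repetitive chromosomal elements in clinical strains of Mycoplasma genitalium
JOURNAL PLoS ONE 5 (12), E15660 (2010)
PUBMED 21187921
REMARK Publication Status: Online-Only
REFERENCE 3 (bases 1 to 1480)
PUBMED 21997874
REFERENCE 4 (bases 1 to 1480)
REFERENCE 5 (bases 1 to 1480)
AUTHORS Ma,L., Jensen,J.S., Jia,Q., Mancuso,M.A., Myers,L.J. and Martin,D.H.
TITLE Direct Submission
ORIGIN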
1 agtaagaatg ttactgctta cacccccttc gccaccccca tcaccgattc taaaagtgat 61 ctggttagtt tggcacaact tgattcttct tatcaaatcg ctgaccaaac catccataac 121 accaacttgt ttgtgttgtt caagtccaag gatgtgaagc ttacatatag ttcaagtggc 181 tcaaataacc agattagttt tgattcaact agtcaaggtg aaaaaccatc ctatgtggtc 241 gagtttacta actctaccaa cattggcatc aagtgaagcg tggtgaaaaa gtatcagtta 301 gatctaccaa atgttaccaa tgagatgaac caagtgttgc aagaattgat cctagaacaa 361 ccccttacca agtatacctt aaacagtagt ttggctaaac aaaagggcaa aagccagata 421 gaggtacatc ttggttcaaa ttcaaatcag tgacaatcga tgcgtaatca acatgaccta 481 aacaacaatc ccagccccaa tgcttcaact gggtttaaac tcactaccgg caacgcatat 541 agaaaattaa atgagtcctg accaatttat caaccaattg atgggaccaa gcagggcaaa 601 gggaaggata gtagtgggtg gagttcaaca gaagcaacaa cggcaaaaaa tgatgcgccc 661 agtgtttctg gaagtggaac atcagacacc gcttcaaaat tcaaaagtta cctcaacacc 721 aagcaagcgt tagagagcat cggcatcttg tttgatgggg atggaatgag gaatgtggtt 781 acccagctct attatgcttc tactagcaag ctagcagtca ccaacaacca cattgtcgtg 841 atgggtaaca gctttctacc cagcatgtgg tactgggtgg tggagcggag tgcaacaact 901 gattcatcat caaaacccac ctggtttgct aataccaatt taaactgagg ggaagataaa 961 caaaaacaat ttgttgagaa ccagttgggg tataaggaaa ctaccagtac caattcccac 1021aacttccatt ccaaatcttt 1141 tagtagtacc gtagtagtag 1081 actagcttag gcatatctga gacaccttca 1261 ggtccgatca gacaccttca 1261 gacaccttca caagatcaat gcatatctga 1261 gtagtagtag gatctagtga gacaccttca gacaccttca 1261 ggtccgatca gacaccttca gacaccttca
I want to create a new output that looks like this:
agtaagaatg ttactgctta cacccccttc gccaccccca tcaccgattc taaaagtgat ctggttagtt tggcacaact tgattcttct tatcaaatcg ctgaccaaac catccataac accaacttgt ttgtgttgtt caagtccaag gatgtgaagc ttacatatag ttcaagtggc tcaaataacc agattagttt tgattcaact agtcaaggtg aaaaaccatc ctatgtggtc gagtttacta actctaccaa cattggcatc aagtgaagcg tggtgaaaaa gtatcagtta gatctaccaa atgttaccaa tgagatgaac caagtgttgc aagaattgat cctagaacaa ccccttacca agtatacctt aaacagtagt ttggctaaac aaaagggcaa aagccagata gaggtacatc ttggttcaaa ttcaaatcag tgacaatcga tgcgtaatca acatgaccta aacaacaatc ccagccccaa tgcttcaact gggtttaaac tcactaccgg caacgcatat agaaaattaa atgagtcctg accaatttat caaccaattg atgggaccaa gcagggcaaa gggaaggata gtagtgggtg gagttcaaca gaagcaacaa cggcaaaaaa tgatgcgccc agtgtttctg gaagtggaac atcagacacc gcttcaaaat tcaaaagtta cctcaacacc aagcaagcgt tagagagcat cggcatcttg tttgatgggg atggaatgag gaatgtggtt acccagctct attatgcttc tactagcaag ctagcagtca ccaacaacca cattgtcgtg atgggtaaca gctttctacc cagcatgtgg tactgggtgg tggagcggag tgcaacaact gattcatcat caaaacccac ctggtttgct aataccaatt taaactgagg ggaagataaa caaaaacaat ttgttgagaa ccagttgggg tataaggaaa ctaccagtac caattcccac aacttccatt ccaaatcttt cacccaacct gcatatctga tcagtggcat tgacagtgtc aatgatcaaa tcatcttcag tggctttaaa gcggggagtg tggggtatga tagtagtagt agtagtagta gtagtagtag tagtagtacc aaagaccaag cacttgcttg atcaacaacaactagcttag atagtaaaac gatctagtga ccaacgacac atgggagttt ttcaatccaa gacaccttca tggttcatca aaactgctta aaagatcaaa aaagatcaaa tccttattcg cttgaatagt tatggggatg
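How can I tell the console that I want a new file containing only the part from ORIGIN to the end?
Answered 2022-02-23 12:08:39
If what you want is to create a .txt file starting from a certain line, this may work for you: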
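flag = False
with open("src.txt", "r") as src:
    with open("dst.txt", "w") as dst:
        for line in src:
            # Copy every line that comes after the ORIGIN line
            if flag:
                dst.write(line)
            if "ORIGIN" in line:
                flag = True
This iterates over the lines of the source file. Once a line contains the word "ORIGIN", every following line of the src file is written to the dst file; the ORIGIN line itself is not copied.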
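The loop above copies the lines verbatim, so the position numbers (1, 61, 121, ...) stay in the output. As a minimal sketch, assuming the whole file fits in memory and that the first occurrence of "ORIGIN" in the text is the section marker, the following also strips the numbers and the `//` terminator that ends a GenBank record, and joins the sequence blocks into one line as in the desired output. The function names `extract_origin_sequence` and `write_origin_sequence` are made up for illustration.

```python
import re


def extract_origin_sequence(text: str) -> str:
    """Return the sequence blocks that follow ORIGIN, joined by single spaces."""
    # Split at the first "ORIGIN" (assumption: it is the section marker,
    # not a word that happens to appear earlier in the header).
    _, marker, tail = text.partition("ORIGIN")
    if not marker:
        raise ValueError("ORIGIN not found")
    # Remove position numbers and the '//' record terminator.
    cleaned = re.sub(r"[\d/]", "", tail)
    # Collapse all runs of whitespace (including newlines) to single spaces.
    return " ".join(cleaned.split())


def write_origin_sequence(src_path: str, dst_path: str) -> None:
    """Read src_path, extract the sequence, and write it to dst_path as one line."""
    with open(src_path, encoding="utf-8") as src:
        sequence = extract_origin_sequence(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(sequence + "\n")
```

Calling `write_origin_sequence("src.txt", "dst.txt")` would then produce the new file. Reading the file in one go keeps the code short; for very large records, a line-by-line loop like the one above may be preferable.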
Answered 2022-02-23 12:37:26
It seems you also want to remove any digits and slashes. So you could do this:
import re
import sys

with open('infile.txt', encoding='utf-8') as infile:
    try:
        # Skip lines up to and including the ORIGIN line
        while not next(infile).startswith('ORIGIN'):
            pass
        with open('outfile.txt', 'w', encoding='utf-8') as outfile:
            for line in infile:
                # Remove digits and slashes, then strip leading whitespace
                outfile.write(re.sub(r'[\d/]', '', line).lstrip())
    except StopIteration:
        print('ORIGIN not found', file=sys.stderr)
https://stackoverflow.com/questions/71236517