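I have a text file that contains a numbered list of drugs and chemical structures.
Is there a way to remove the numbers in front of the substance names?
Here is the code I have so far: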
new_file = open("string_cleaned.txt", "w")
for line in open("string.txt", "r"):
    x = txt.lsplit(", ", 1)[1]
    new_file.write(x)
new_file.close()
Goal:
From:
1 Substance 1
2 Substance 2
To:
Substance 1
Substance 2
Posted on 2020-12-11 07:40:39
This is not a bulletproof solution, but it should work if your data looks like your example. Let me know if it needs more tweaking.
import string

alphabet = string.ascii_lowercase + string.ascii_uppercase

YourFile = open("yourFile.txt", "r")
listOfLines = YourFile.readlines()
YourFile.close()

for lineIndex in range(len(listOfLines)):
    for char in listOfLines[lineIndex]:
        if char in alphabet:
            # Keep everything from the first letter onward
            editedLine = char + listOfLines[lineIndex].split(char, 1)[1]
            # (optional) If you need the index numbers beside your items, uncomment:
            # editedLine = str(lineIndex + 1) + " " + editedLine
            listOfLines[lineIndex] = editedLine
            break

anotherFile = open("anotherFile.txt", "w")
anotherFile.writelines(listOfLines)
anotherFile.close()
So, after editing, here is the solution:
YourFile = open("yourFile.txt", "r")
listOfLines = YourFile.readlines()
YourFile.close()
for index in range(len(listOfLines)):
    # Strip leading digits, then the spaces that separated them from the name
    listOfLines[index] = listOfLines[index].lstrip("0123456789")
    listOfLines[index] = listOfLines[index].lstrip(" ")
    print(listOfLines[index])
anotherFile = open("anotherFile.txt", "w")
anotherFile.writelines(listOfLines)
anotherFile.close()
Posted on 2020-12-11 06:13:55
Edit: a more specific solution.
import re

result = ""
with open("string.txt") as infile:
    for line in infile:
        # Remove the number at the start of the line and the whitespace after it
        result += re.sub(r"^\d+\s+", "", line)
with open("string_cleaned.txt", "w") as file:
    file.write(result)
https://stackoverflow.com/questions/65246185
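A minimal sketch that combines the ideas from the answers above: it strips digits only at the very start of each line, so numbers inside a name such as `Substance 1` survive, and it lets `pathlib` open and close the files. The names `LEADING_NUMBER`, `strip_leading_number` and `clean_file` are hypothetical, and the sketch assumes UTF-8 text in which each index is followed by spaces or tabs.

```python
import re
from pathlib import Path

# Hypothetical pattern: one or more digits at the start of a line, plus any
# spaces or tabs after them (not newlines, so a bare "12\n" keeps its "\n").
LEADING_NUMBER = re.compile(r"^\d+[ \t]*")


def strip_leading_number(line: str) -> str:
    """Remove a leading index such as '12 ' from a single line.

    Lines that do not start with a digit are returned unchanged, and digits
    elsewhere in the line (e.g. 'Substance 1') are kept.
    """
    return LEADING_NUMBER.sub("", line, count=1)


def clean_file(src: str, dst: str) -> None:
    """Write a copy of src to dst with the leading number removed from each line."""
    text = Path(src).read_text(encoding="utf-8")
    cleaned = [strip_leading_number(line) for line in text.splitlines(keepends=True)]
    Path(dst).write_text("".join(cleaned), encoding="utf-8")


if __name__ == "__main__":
    # Demo on in-memory lines shaped like the example in the question
    sample = ["1 Substance 1\n", "2 Substance 2\n", "Substance 3\n"]
    for line in sample:
        print(repr(strip_leading_number(line)))
```

To clean the file from the question, call `clean_file("string.txt", "string_cleaned.txt")`. Anchoring the pattern with `^` is what keeps the trailing `1` in `Substance 1` intact.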