我需要写一个正则表达式,在一些病人对药物的评论中用'.'代替','。他们应该在提到副作用后用逗号,但是他们中的一些人使用点。例如:
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it."我编写了一个正则表达式代码来检测一个单词(如头痛)或两个单词(例如,恶梦)周围的两个点:
发现一个被两个点包围的词:
text= re.sub (r'(\.)(\s*\w+\s*\.)',r',\2 ', text )探测到两个被两个点包围的单词:
text = re.sub (r'(\.)(\s*\w+\s\w+\s*\.)',r',\2 ', text11 )这是输出:
the drug side-effects are: night mare, nausea, night sweat. bad dream, dizziness, severe headache. I suffered, she suffered. she told I should change it.但应该是:
the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it.我的代码没有在dot之后替换night sweat to ','。我加上,if a sentence starts with a subject pronoun (such as I and she) I do not want to change dot to comma after it, even if it has two words (such as, I suffered)。我不知道如何将此条件添加到代码中。
有什么建议吗?谢谢!
发布于 2016-10-09 02:01:18
您可以使用以下模式:
\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)这匹配一个点,然后捕获一个或两个单词,其中第一个不是你提到的代词(你将需要扩大名单最有可能)。后面必须是既不是单词字符也不是空格的字符(例如,.、!、:、,)或字符串的结尾。
然后,您必须用,\1替换它。
巨蟒
import re
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it."
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I)
print(text)输出
the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it.这可能不是绝对安全的,您可能需要扩展某些边缘情况下的模式。
https://stackoverflow.com/questions/39938210
复制相似问题