Question: Split text without removing the delimiter

Stack Overflow user

Asked 2018-01-10 10:26:36

Answers: 1 | Views: 41 | Followers: 0 | Votes: 2

Suppose there is a text like this:

s = '\n\nPART I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level\n\nPART II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles\n\nPART III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\n\nTO GET THE PEOPLE\n\nTO BUILD AND EVOLVE YOUR \nWork Principles: Putting It All Together\n\n'

Split it on the delimiter PART:

In [14]: parts = re.split(r'\n\nPART',s)
In [15]: parts
Out[15]:
['',
 ' I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level',
 ' II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles',
 ' III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\n\nTO GET THE PEOPLE\n\nTO BUILD AND EVOLVE YOUR \nWork Principles: Putting It All Together\n\n']

Then add the prefix PART back to each item in the list:

In [16]: ['PART '+ i for i in parts if i]
Out[16]:
['PART  I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level',
 'PART  II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles',
 'PART  III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\n\nTO GET THE PEOPLE\n\nTO BUILD AND EVOLVE YOUR \nWork Principles: Putting It All Together\n\n']
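
Note the doubled space in `'PART  I'` above: each piece returned by `re.split` already begins with the space that followed the delimiter, so prefixing `'PART'` without a trailing space should restore the original text. A minimal sketch on a shortened stand-in for `s` (the `sample` string is made up for illustration):

```python
import re

# Shortened, hypothetical stand-in for the table of contents in `s`.
sample = "\n\nPART I, A\n\n1 x\n\nPART II, B\n\n1 z\n\n"

# Splitting consumes the delimiter '\n\nPART' but leaves the space after it.
parts = re.split(r"\n\nPART", sample)

# Prefix 'PART' (no trailing space) so the original spacing is preserved;
# `if p` drops the empty string produced by the leading delimiter.
restored = ["PART" + p for p in parts if p]
print(restored)
```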

I would like to do this in a single step:

In [17]: parts = re.findall(r'\n\nPART.+', s)
In [18]: parts
Out[18]:
['\n\nPART I, WHERE I’M COMING FROM',
 '\n\nPART II, LIFE PRINCIPLES',
 '\n\nPART III, WORK PRINCIPLES']
# The dot stops at \n, so I tried to get past the line breaks by repeating the group with a quantifier
In [20]: parts = re.findall(r'\n\n(?:PART.+)+', s)
In [21]: parts
Out[21]:
['\n\nPART I, WHERE I’M COMING FROM',
 '\n\nPART II, LIFE PRINCIPLES',
 '\n\nPART III, WORK PRINCIPLES']
# Unfortunately, it prints the same output.

How can I accomplish this?
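
A note on why both `findall` attempts return only the heading lines: by default `.` does not match a newline, and repeating the group with `+` does not help because every repetition would have to start with `PART` again. One possible way to stay with `findall` is the `re.DOTALL` flag combined with a lazy quantifier and a lookahead. This is a sketch on a shortened, made-up `sample` string, not the asker's code:

```python
import re

# Shortened, hypothetical stand-in for the table of contents in `s`.
sample = "\n\nPART I, A\n\n1 x\n2 y\n\nPART II, B\n\n1 z\n\n"

# Without DOTALL, '.' stops at the first newline, so only headings match.
headings = re.findall(r"\n\nPART.+", sample)

# With DOTALL, '.' also matches '\n'. The lazy '.+?' expands only until
# the lookahead sees the next '\n\nPART' or the end of the string (\Z).
sections = re.findall(r"\n\nPART.+?(?=\n\nPART|\Z)", sample, flags=re.DOTALL)

print(headings)
print(sections)
```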

python

regex

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-01-10 10:32:53

Try the third-party regex module and split on a positive lookahead, which keeps the delimiter:

import regex
print(regex.split(r"(?=\n\nPART)", s, flags=regex.VERSION1))
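
On Python 3.7 and later, the standard `re` module also splits on empty matches, so the same lookahead should work without the third-party package. A minimal sketch, assuming a shortened stand-in for `s`; the helper name `split_keep_delimiter` and the whitespace stripping are illustrative choices, not part of the original answer:

```python
import re

def split_keep_delimiter(text, marker="PART"):
    """Split `text` before every blank line followed by `marker`.

    The lookahead matches an empty string, so nothing is consumed and
    each piece keeps its own marker. Requires Python 3.7+.
    """
    pattern = r"(?=\n\n" + re.escape(marker) + ")"
    pieces = re.split(pattern, text)
    # Drop empty pieces and trim the surrounding newlines.
    return [p.strip() for p in pieces if p.strip()]

# Shortened, hypothetical stand-in for the table of contents in `s`.
sample = "\n\nPART I, A\n\n1 x\n2 y\n\nPART II, B\n\n1 z\n\n"
result = split_keep_delimiter(sample)
print(result)
```

If the leading and trailing newlines matter, the `strip()` calls can be removed; the first element is then an empty string because the text starts with the delimiter.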

Votes: 2

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/48179414
