
Remove tabs, newlines, and extra spaces from string output, but keep a single space so the words don't run together

Stack Overflow user
Asked 2020-07-25 04:03:26
Answers: 1, Views: 43, Followers: 0, Votes: 0

I have a list, list_3, with only one element, a string:

[['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n'], ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tMacKenzie T Stout,\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2020\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tYes\t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tPre-Seed\t\t\t\t\t\t\t\n\n']]

I want to use a regular expression to strip \n, \t, and \r from the output and return the text in an easy-to-read format.

Here is what I tried:

import re

list_33 = []
for item in list_3:
    # Delete every run of whitespace; this also removes the spaces between words
    list_33.append(re.sub(r'\s+', '', item))
print(list_33)

Output:

['HeadquartersorRegionalOfficeMainHeadquarters', 'FoundersThomasLonVan', 'FounderDiversityN/A', 'YearFounded2016', '#ofEmployees1-10', 'SeekingFunding?No', 'FundingPhaseN/A']

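This is almost what I need, but I would like a single space between the words and a colon after the first block of text in each string from list_3, i.e.: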

['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']

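Do you have any ideas on how to combine the two regex operations into one?

Thanks

PS. I know I don't need a for loop for a list with only one element, but in the future the list will have more elements, and I am trying to generalize the code structure while working with a single input.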
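
The attempt above joins the words because \s+ matches a single space as well as the longer runs of newlines and tabs, so replacing every match with '' also deletes the spaces inside each phrase. A minimal sketch of the difference, using a shortened sample string (the sample and the variable names are illustrative, not taken from the question):

```python
import re

# Shortened stand-in for one element of list_3 (illustrative sample)
sample = '\n\n\n Year Founded\n\n\n\n\n\t\t\t2016\t\t\n\n'

# Replacing every whitespace run with '' also removes the space in "Year Founded"
joined = re.sub(r'\s+', '', sample)

# Replacing each run with a single space and trimming the ends keeps words apart
spaced = re.sub(r'\s+', ' ', sample).strip()

print(joined)  # YearFounded2016
print(spaced)  # Year Founded 2016
```

This keeps the words readable but still gives no colon between the label and the value, which is the remaining part of the question.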


1 Answer

Stack Overflow user

Accepted answer

Posted 2020-07-25 04:13:35

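You can go through each string in the list and replace every run of two or more whitespace characters with ': ', then strip the leftover ':' and space characters from both ends: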

>>> import re
>>> lst = ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n']
>>> [re.sub(r'\s\s+', ': ', word).strip(': ') for word in lst]
['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']
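
For lists with more elements, which is what the PS in the question is aiming at, the same substitution could be wrapped in a small helper. The sketch below assumes the data is a list of lists of strings, as in the list_3 printed in the question; clean_field, clean_records, and the shortened records sample are hypothetical names and data, not part of the original answer.

```python
import re

# Two or more consecutive whitespace characters: the gap between label and value
GAP = re.compile(r'\s{2,}')

def clean_field(text):
    """Collapse the long whitespace runs in one raw string into 'Label: Value'."""
    # Every long run becomes ': ', including the leading and trailing runs,
    # so the stray ':' and ' ' characters are stripped from both ends afterwards.
    return GAP.sub(': ', text).strip(': ')

def clean_records(records):
    """Apply clean_field to every string in a list of lists."""
    return [[clean_field(field) for field in record] for record in records]

# Shortened sample in the same shape as list_3 (illustrative data)
records = [
    ['\n\n\n Year Founded\n\n\n\n\n\t\t\t2016\t\t\n\n',
     '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\tNo \t\t\n\n'],
    ['\n\n\n Year Founded\n\n\n\n\n\t\t\t2020\t\t\n\n'],
]
print(clean_records(records))
# [['Year Founded: 2016', 'Seeking Funding?: No'], ['Year Founded: 2020']]
```

This relies on the label and the value each using only single spaces internally: a value that itself contains two or more consecutive whitespace characters would also receive a ': '.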
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/63080702
