Python: strip tabs, newlines, and extra spaces from string output, but keep single spaces so words are not joined together

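I have a list, list_3. It originally held a single entry; after an update to the question it holds two entries, each of which is itself a list of strings: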

[['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n'], ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tMacKenzie T Stout,\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2020\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tYes\t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tPre-Seed\t\t\t\t\t\t\t\n\n']]
I want to use a regular expression to strip \n, \t, and \r from the output and return the text in an easy-to-read format.

This is what I tried:

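import re

list_33 = []
for i in list_3:
    string = ''.join(i)  # join the current item, not the whole list_3
    # Replacing every whitespace run with '' is what glues the words together
    list_33.append(re.sub(r'\s+', '', string))
print(list_33)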
Output:

['HeadquartersorRegionalOfficeMainHeadquarters', 'FoundersThomasLonVan', 'FounderDiversityN/A', 'YearFounded2016', '#ofEmployees1-10', 'SeekingFunding?No', 'FundingPhaseN/A']
This is almost what I need, but I would like a space between each word, and a colon after the first block of text in each list_3 string, i.e.:

['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']
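Any ideas on how to combine the two regex substitutions into a single regular expression?

Thanks


P.S. I know I don't need a for loop for a list with only one element, but the list will have more elements in the future, and for now I am trying to generalize the code structure using a single input.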

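You can go through each string in the list and use re.sub to replace every run of two or more whitespace characters with ': ', then strip the leftover ': ' from both ends: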

>>> import re
>>> lst = ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n']
>>> [re.sub(r'\s\s+', ': ', word).strip(': ') for word in lst]
['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']
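
The pattern above leaves single spaces alone because \s\s+ only matches runs of at least two whitespace characters. A possible variant, shown here only as a sketch, trims the string first and then splits on those runs, so it does not rely on .strip(': ') (which would also remove a genuine colon at the start or end of the text). The helper name clean_entry is hypothetical and not part of the original answer.

```python
import re

# Hypothetical helper; the name and structure are illustrative assumptions.
def clean_entry(text):
    # Trim outer whitespace, then split on runs of 2+ whitespace characters.
    # Single spaces inside a phrase are not matched, so words stay separated.
    parts = re.split(r'\s{2,}', text.strip())
    return ': '.join(parts)

sample = '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n'
print(clean_entry(sample))  # Seeking Funding?: No
```

Whether this is preferable depends on the data: if a label and its value were ever separated by only one space, neither this variant nor the original substitution would insert the colon.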

Anad, thanks for the reply. I am hoping to create a for loop that works on a list with multiple entries. I have updated the list in the question: it now has two entries, each of which is itself a list, so the original list-comprehension approach should work, but I can't see how it translates into a for loop. Any ideas?

@AndrewLittle1 for I, lst in enumerate(list_3): list_3[I] = [re.sub(…) for word in lst]

For the regex part of the question I will mark Prem's answer as correct, but many thanks to @Barmar for helping me format the answer.
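
For readers who prefer the loop spelled out, here is a minimal sketch of the nested version. It assumes the elided re.sub call in the comment is the same substitution used in the answer, and it builds a new list (the name cleaned is hypothetical) instead of overwriting list_3 in place. The sample data is a shortened stand-in with the same shape as the list in the question.

```python
import re

# Shortened sample data in the same shape as list_3 from the question.
list_3 = [
    ['\n\n\n Founders\n\n\n\n\n\t\t\tThomas Lon Van\t\t\t\n\n',
     '\n\n\n Year Founded\n\n\n\n\n\t\t\t2016\t\t\t\n\n'],
    ['\n\n\n Founders\n\n\n\n\n\t\t\tMacKenzie T Stout,\t\t\t\n\n',
     '\n\n\n Funding Phase\n\n\n\n\n\t\t\tPre-Seed\t\t\t\n\n'],
]

cleaned = []                      # hypothetical name for the result
for entry in list_3:              # each entry is a list of raw strings
    cleaned_entry = []
    for word in entry:
        # Assumed to match the answer: turn runs of 2+ whitespace
        # characters into ': ', then trim ': ' from both ends.
        cleaned_entry.append(re.sub(r'\s\s+', ': ', word).strip(': '))
    cleaned.append(cleaned_entry)

print(cleaned)
# [['Founders: Thomas Lon Van', 'Year Founded: 2016'],
#  ['Founders: MacKenzie T Stout,', 'Funding Phase: Pre-Seed']]
```

The one-line form in the comment does the same work but replaces each sub-list of list_3 in place through its index.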