Python: how to format a long output string?
I am using the imdbpie module to query IMDb and fetch movie titles.
import re
from imdbpie import Imdb

imdb = Imdb()
film = str(imdb.search_for_title('some_title'))
# Replace non-word characters, the dict key names, and IMDb ids with spaces
tit = re.sub(r'[^\w]|year|title|imdb_id|tt[0-9]{7}', ' ', film)
print(tit)
I strip out the unwanted patterns and get this output:
2015 The Merchant Gaekju 2015 2015 Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century 1993 The Manzai 2015 Pre Masters 2015 The 2015 World Series 2015 2015 Foster Farms Bowl 2015 2015 Nephilim Monsters Giants Conference 2015 2015 The Disaster Diaries 2015 L Agenda Des Cataclysmes 2015 The Lobster 2015 Brooklyn Lobster 2005 The Oscar Nominated Short Films 2015 Animation 2015 The Oscar Nominated Short Films 2015 Live Action 2015 La langosta azul 1954 The Fresh Lobster 1948 The Lobster 2013 The Oscar Nominated Short Films 2015 Documentary 2015 A Visit to a Crab and Lobster Factory 1913 BBC Election Debate 2015 The Reaction 2015 Easter Bowl 2011 Beneath the Surface 2011 The Lonesome Lobster 2010
The string is a single line in which the "year" and "Movie Title" values vary from entry to entry.
I would like to format this output as follows:
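2015 The Merchant Gaekju 2015
2015 Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century
The 2015 World Series
2015 Foster Farms Bowl
2015 Nephilim Monsters Giants Conference
2015 The Disaster Diaries
2015 L Agenda Des Cataclysmes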
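... ...
I made a few changes to the code to add a newline character to the output string, which basically gives me what I need, but perhaps there is a more elegant way to do this: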
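# rlist is not shown in the post; presumably it holds extra characters to keep, such as '}'
tit = re.sub(r'[^\w'+rlist+']|year|title|imdb_id|tt[0-9]{7}', ' ', film)
# Turn each remaining '}' into a line break
ntit = re.sub(r'}', '\n', tit)
with open('titles.txt', 'wt') as f:
    print(ntit, file=f)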
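$ cat titles.txt
2015 National Lottery Stars 2015 The Merchant Gaekju 2015 Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century 1993 The Manzai 2015 Pre Masters 2015 The 2015 World Series 2015 Foster Farms Bowl
This is messy, but there is a pattern: a year, followed by 21 spaces, then a title, followed by 9 spaces, and then the cycle starts again. Proof: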
>>> import re
>>> list(map(len, re.findall(r'\s{4,}', s)))
[21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9, 21, 9]
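However, relying on these exact numbers would be unwise. Instead, assume that large and medium gaps alternate, and capture the fields like this: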
>>> from pprint import pprint
>>> pprint(re.findall(r'(.+?)\s{15,}(.+?)\s{5,}', s))
[('2015', 'The Merchant Gaekju 2015'),
 ('2015',
  'Murderers Mobsters Madmen Vol 2 Assassination in the 20th Century'),
 ('1993', 'The Manzai 2015 Pre Masters'),
 ('2015', 'The 2015 World Series'),
 ('2015', '2015 Foster Farms Bowl'),
 ('2015', '2015 Nephilim Monsters Giants Conference'),
 ('2015', '2015 The Disaster Diaries 2015 L Agenda Des Cataclysmes'),
 ('2015', 'The Lobster'),
 ('2015', 'Brooklyn Lobster'),
 ('2005', 'The Oscar Nominated Short Films 2015 Animation'),
 ('2015', 'The Oscar Nominated Short Films 2015 Live Action'),
 ('2015', 'La langosta azul'),
 ('1954', 'The Fresh Lobster'),
 ('1948', 'The Lobster'),
 ('2013', 'The Oscar Nominated Short Films 2015 Documentary'),
 ('2015', 'A Visit to a Crab and Lobster Factory'),
 ('1913', 'BBC Election Debate 2015 The Reaction'),
 ('2015', 'Easter Bowl 2011 Beneath the Surface'),
 ('2011', 'The Lonesome Lobster')]
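To turn the captured pairs into one line per film, something like the following might work. It is only a sketch: the sample string `s` is made up for illustration (two records using the 21-space and 9-space gaps measured earlier), and the names `pairs` and `lines` are not from the original post.

```python
import re

# Hypothetical sample: two records with the gap widths reported above
# (21 spaces after the year, 9 spaces after the title).
s = ("2015" + " " * 21 + "The Lobster" + " " * 9 +
     "2005" + " " * 21 + "Brooklyn Lobster" + " " * 9)

# A large gap (15+ whitespace characters) separates the two fields;
# a medium gap (5+ whitespace characters) ends the record.
pairs = re.findall(r'(.+?)\s{15,}(.+?)\s{5,}', s)

# Join each captured pair into a single "first second" line.
lines = ["{} {}".format(first, second) for first, second in pairs]
print("\n".join(lines))
```

Note that the pattern requires a gap after the second field, so a final record with no trailing whitespace would not be captured.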
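In the interactive session above, pprint is used only to format the output. In the regular expression, each (.+?) lazily captures one field, \s{15,} matches a large gap of at least 15 whitespace characters, and \s{5,} matches a medium gap of at least 5.
Why does the first line of your desired output also have a year?
Have you tried using a regular expression?
To me, this does not look like it follows a regular pattern that you could build a method or function around. My suggestion is to fix the output at its source, so that the function producing it gives you something better than what you posted.
I have updated my question with more details.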
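The suggestion to fix the output at its source could be sketched as follows. This assumes the search call returns a list of dicts with the keys `year`, `title` and `imdb_id`, which the key names stripped out in the question suggest but the post does not confirm. The `results` list below is made-up sample data and `format_results` is a hypothetical helper, not part of imdbpie.

```python
# Hypothetical sample data: the keys mirror the ones stripped out in the
# question ('year', 'title', 'imdb_id'); the imdb_id values are invented.
results = [
    {"title": "The Lobster", "year": "2015", "imdb_id": "tt0000001"},
    {"title": "Brooklyn Lobster", "year": "2005", "imdb_id": "tt0000002"},
]


def format_results(results):
    """Return one 'year title' line per result dict."""
    return "\n".join("{year} {title}".format(**r) for r in results)


text = format_results(results)
print(text)
```

Working on the structured data directly avoids converting it to a string and then having to recover the fields with a regular expression.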