How to read input in Python until the next occurrence
So the problem is: given the input below, I want to separate each URL (the lines starting with [URL, [LINK or [WEBSITE) from the text. I want to put each URL, in order, into a list, and each text into a list of texts. I also want to merge each text into a single line, so that each link can be matched with its corresponding text. Here is an example:
```
[URL - https://url1.com]
news_line1 word
news_line2 word word
news_line3 word word word
[LINK - https://url2.com]
headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter
[WEBSITE - https://url3.com]
date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence
```
The output would be:
Links:
and
Text:
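A minimal sketch of the split described above, assuming every header sits on its own line in the form `[URL - ...]`, `[LINK - ...]` or `[WEBSITE - ...]`, as in the sample. The function name `split_links_and_texts` and the regex `HEADER` are illustrative and not from the question. It opens a new text block whenever a header is seen, so each link keeps its own (possibly empty) merged text and the two lists stay aligned. It stores the whole header line, as the question's own code does; `match.group(1)` would give just the address.

```python
import re

# Assumed header format: "[URL - ...]", "[LINK - ...]" or "[WEBSITE - ...]"
HEADER = re.compile(r"^\[(?:URL|LINK|WEBSITE) - (.+)\]$")


def split_links_and_texts(lines):
    """Return (links, texts), where texts[i] is the merged text under links[i]."""
    links, blocks = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        match = HEADER.match(line)
        if match:
            links.append(line)  # whole header line; match.group(1) is just the URL
            blocks.append([])   # start a new block for this link
        elif blocks:
            blocks[-1].append(line)  # text belongs to the most recent link
        # text appearing before the first header is silently dropped
    # Merge each block's lines into one line, separated by single spaces
    return links, [" ".join(block) for block in blocks]


sample = """[URL - https://url1.com]
news_line1 word
news_line2 word word
[LINK - https://url2.com]
headline_line1 letter
headline_line2 letter letter
[WEBSITE - https://url3.com]
date_line1 sentence
"""

links, texts = split_links_and_texts(sample.splitlines())
for link, text in zip(links, texts):
    print(link)
    print(text)
```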
My current code is:
```python
import sys

inFile = sys.argv[1]
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]

url_links = []
sentences = []
for entry in content:
    sentence = ""
    if entry.startswith(("[news_text", "[headline", "[date")):
        url_links.append(entry)
    else:
        sentence = sentence + entry
        sentences.append(sentence)
for sentence in sentences:
    print(sentence)
```
The current output I get is:
```
news_line1 word
news_line2 word word
news_line3 word word word
headline_line1 letter
headline_line2 letter letter
headline_line3 letter letter letter
date_line1 sentence
date_line2 sentence sentence
date_line3 sentence sentence sentence
```
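How can I adjust it so that it gives the correct output?

If you don't want unnecessary empty lines in the output, add this check inside the loop:
```python
if not entry:
    continue
```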
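To see why the check helps: after the question's `x.strip()` step, a line containing only whitespace becomes the empty string, which is falsy, so `if not entry: continue` skips it before it can be appended as an empty sentence. A small self-contained demonstration with made-up lines:

```python
raw_lines = ["[URL - https://url1.com]\n", "news_line1 word\n", "   \n", "news_line2 word word\n", "\n"]

# Same normalisation as in the question: strip whitespace and the trailing newline
content = [x.strip() for x in raw_lines]

kept = []
for entry in content:
    if not entry:  # "" is falsy, so blank (or whitespace-only) lines are skipped
        continue
    kept.append(entry)

print(kept)
```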
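To get the desired output, you can build the result up as a string. To split the text into blocks, add a boolean variable that records whether a block has just ended (a block ends when a new url_link starts being processed).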
Use a list to store the pieces of each sentence, and remember that startswith() is case-sensitive. The relevant part of the modified code is:
```python
url_links = []
sentences = []
sentence = []
for entry in s.split('\n'):  # s holds your string
    entry = entry.strip()  # strip() returns a new string, so reassign it
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        url_links.append(entry)
        if sentence:  # add only a non-empty list
            sentences.append(' '.join(sentence))
            sentence = []
    else:
        if entry:
            sentence.append(entry)
else:  # this else belongs to the for loop
    if sentence:
        sentences.append(' '.join(sentence))
for sentence in sentences:
    print(sentence)
```
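One way to use this loop end to end, sketched under the assumption that you want each link printed next to its text: wrap it in a function (the name `group_blocks` is hypothetical) and pair the two lists with `zip`. The `for ... else` is replaced by a plain statement after the loop, which behaves the same here because the loop never `break`s. Because `startswith` is case-sensitive, a header written as `[url - ...]` would be treated as ordinary text.

```python
def group_blocks(s):
    """Return (url_links, sentences) for the text in s (hypothetical helper)."""
    url_links = []
    sentences = []
    sentence = []
    for entry in s.split('\n'):
        entry = entry.strip()
        if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
            url_links.append(entry)
            if sentence:  # close the previous block
                sentences.append(' '.join(sentence))
                sentence = []
        elif entry:
            sentence.append(entry)
    if sentence:  # close the last block (same effect as the for/else above)
        sentences.append(' '.join(sentence))
    return url_links, sentences


s = "[URL - https://url1.com]\nnews_line1 word\nnews_line2 word word\n[LINK - https://url2.com]\nheadline_line1 letter\n"
url_links, sentences = group_blocks(s)
for link, text in zip(url_links, sentences):
    print(link, text, sep='\n')
```

Since an empty block is never appended, a header with no text under it would shift every later pairing by one; if that can happen in your data, start a new (possibly empty) block at each header instead.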
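If I join them, everything ends up on one line. I'm trying to put each article on its own line, so that everything under each [URL etc. header is on one line.
I've edited my answer with code that adds a new line after each block.
This definitely gets the job done! But it doesn't produce quite the right output: when several lines are joined, they are just stuck together without spaces. When I use ' '.join, it adds a space at the beginning of every line after the first. Any suggestions?
Please check the edited answer.
For example, the news lines and the first headline line all come out run together.
With the news lines as input, it still doesn't produce the correct output.
Perhaps your input text has line breaks inside an entry, so s.split('\n') may split one item into several texts. Please share an input example where the code fails; if \n marks the end of each line, it should work.
If you want the output on one line, why not use print(sentence, end='') to drop the newline that every print() call adds by default?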
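The spacing problem discussed in these comments goes away if the two separators are applied at different levels: a space only between lines of the same block, and a newline only between blocks. The sketch below uses made-up blocks and also shows what `print(..., end='')` does; it writes into an `io.StringIO` buffer purely so the result can be inspected.

```python
import io

blocks = [
    ["news_line1 word", "news_line2 word word"],
    ["headline_line1 letter", "headline_line2 letter letter"],
]

# A space only *inside* a block...
merged = [" ".join(lines) for lines in blocks]
# ...and a newline only *between* blocks, so no output line starts with a space.
result = "\n".join(merged)
print(result)

# print(..., end="") drops the newline print() normally adds,
# so consecutive calls are glued together on one line.
buf = io.StringIO()
for text in merged:
    print(text, end="", file=buf)
glued = buf.getvalue()
```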
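Code for the boolean-flag approach:
```python
previous_block_end = False  # True right after a [URL / [LINK / [WEBSITE header
for entry in content:
    if not entry:
        continue  # skip blank lines
    sentence = ""
    if entry.startswith(("[URL", "[LINK", "[WEBSITE")):
        previous_block_end = True
        url_links.append(entry)
    else:
        sentence = sentence + entry
        if previous_block_end and len(url_links) > 1:
            sentence = '\n' + sentence  # first line of a later block: start a new line
        if not previous_block_end:
            sentence = ' ' + sentence  # same block: separate from the previous line with a space
        previous_block_end = False
        sentences.append(sentence)
result = ''.join(sentences)
print(result)
```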
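As an alternative to tracking a flag by hand, `itertools.groupby` can split the stripped lines into consecutive runs of header and non-header lines; each non-header run is one block. This is a swapped-in technique, not what either answer uses, and it assumes every header is followed by at least one text line (two adjacent headers would form a single run and the lists would no longer line up). `HEADER_PREFIXES` and `is_header` are illustrative names.

```python
from itertools import groupby

HEADER_PREFIXES = ("[URL", "[LINK", "[WEBSITE")


def is_header(line):
    return line.startswith(HEADER_PREFIXES)


lines = """[URL - https://url1.com]
news_line1 word
news_line2 word word

[LINK - https://url2.com]
headline_line1 letter
[WEBSITE - https://url3.com]
date_line1 sentence
date_line2 sentence sentence""".split("\n")

# Strip whitespace and drop blank lines, as in the question
content = [line.strip() for line in lines if line.strip()]

url_links = [line for line in content if is_header(line)]
# groupby yields runs of consecutive lines that share the same is_header() value
sentences = [" ".join(group) for header, group in groupby(content, key=is_header) if not header]

print("\n".join(sentences))
```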