Python: merging lines in a txt file


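Question about merging lines in a txt file

The file contents are shown below (movie subtitles). In each block I want to combine the subtitle text (the English words and sentences) into a single line, instead of the 1, 2 or 3 lines it currently spans.

Can you tell me which approach works in Python? Many thanks.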

1
00:00:23,343 --> 00:00:25,678
Been a while since I was up here
in front of you.

2
00:00:25,762 --> 00:00:28,847
Maybe I'll do us all a favour
and just stick to the cards.

3
00:00:31,935 --> 00:00:34,603
There's been speculation that I was
involved in the events that occurred
on the freeway and the rooftop...

4
00:00:36,189 --> 00:00:39,233
Sorry, Mr Stark, do you
honestly expect us to believe that

5
00:00:39,317 --> 00:00:42,903
that was a bodyguard
in a suit that conveniently appeared,

6
00:00:42,987 --> 00:00:45,698
despite the fact
that you sorely despise bodyguards?

7
00:00:45,782 --> 00:00:46,907
Yes.

8
00:00:46,991 --> 00:00:51,662
And this mysterious bodyguard
was somehow equipped

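The pattern seems to be:

1) a line with just a number

2) the next line, containing the timing information, and

3) one or more lines of text, ended by a blank line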

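I would write a loop that reads lines 1) and 2), then a nested loop that reads the 3) lines until it finds a blank line. That nested loop can join those lines into a single line.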
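A minimal sketch of that nested-loop idea, assuming the input is already available as an iterable of lines (a list, or an open file object); the function name `merge_subtitle_lines` is hypothetical. Both loops share one iterator, so when the inner loop stops at a blank line, the outer loop resumes at the next block's number.

```python
def merge_subtitle_lines(lines):
    """Join each subtitle block's text lines into a single line.

    Assumes the layout described above: a number line, a timing line,
    then one or more text lines, with blocks separated by a blank line.
    Returns a list of output lines (a "" marks the blank separator).
    """
    out = []
    it = iter(lines)  # one shared iterator for both loops
    for line in it:
        index = line.strip()
        if not index:
            continue  # tolerate extra blank lines between blocks
        timing = next(it, "").strip()  # 2) the timing line
        text = []
        for t in it:  # 3) nested loop: collect text until a blank line
            t = t.strip()
            if not t:
                break
            text.append(t)
        out.extend([index, timing, " ".join(text), ""])
    return out


if __name__ == "__main__":
    sample = (
        "1\n00:00:23,343 --> 00:00:25,678\n"
        "Been a while since I was up here\nin front of you.\n\n"
        "2\n00:00:25,762 --> 00:00:28,847\n"
        "Maybe I'll do us all a favour\nand just stick to the cards.\n"
    )
    print("\n".join(merge_subtitle_lines(sample.splitlines())))
```

Because only `strip()` is applied to each line, an open file object can be passed directly, e.g. `merge_subtitle_lines(f)` inside a `with open(...) as f:` block.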

The first block needs special handling, because no blank line precedes it; the rest come out as you would expect:

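# Python 3: read in text mode so the lines are str, not bytes
with open('/home/cam/Documents/1.txt', encoding='utf-8') as f, \
        open('mytxt', 'w', encoding='utf-8') as f_out:
    # Drop line endings so that blank lines become ""
    new_lines = [line.rstrip('\r\n') for line in f]

    # Indices of the blank lines that separate subtitle blocks
    space_index = [i for i, x in enumerate(new_lines) if x.strip() == ""]
    # -1 acts as a virtual blank line before the first block,
    # so the first block keeps its number
    new_list = [-1] + space_index

    for i in range(len(new_list)):
        start = new_list[i] + 1
        # Each block ends at the next blank line, or at the end of the file
        end = new_list[i + 1] if i + 1 < len(new_list) else len(new_lines)
        mylist1 = [s.strip() for s in new_lines[start:end]]

        # Skip empty or incomplete blocks (e.g. consecutive or trailing blank lines)
        if len(mylist1) < 3:
            continue

        # Keep the number and timing lines; join all text lines into one
        mylist1[2] = " ".join(mylist1[2:])
        final = mylist1[:3]
        print(final)

        for line in final:
            f_out.write(line + "\n")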
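The same split-at-blank-lines idea can be written more compactly with `re.split`, which also tolerates several blank lines in a row. This is a sketch working on the whole file contents as one string; the function name `merge_blocks` is hypothetical, and unlike the code above it keeps a blank line between the output blocks.

```python
import re


def merge_blocks(text):
    """Split subtitle text on blank lines; keep each block's first two lines
    (number and timing) and join the remaining text lines into one."""
    merged = []
    # \n\s*\n matches one or more blank (or whitespace-only) lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        if len(lines) < 3:
            merged.append("\n".join(lines))  # leave incomplete blocks unchanged
            continue
        merged.append("\n".join(lines[:2] + [" ".join(lines[2:])]))
    return "\n\n".join(merged) + "\n"
```

Usage: `open('mytxt', 'w').write(merge_blocks(open('yourfile.txt').read()))`, with the file names as placeholders.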

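A simple, intuitive solution based on the 4 types of lines you can have:

blank lines
numbers indicating the position (no letters)
subtitle timings (a specific pattern; no letters)
text

You can loop over every line, classify it, and act accordingly.

In fact, the "action" for the non-text, non-blank lines (timings and numbers) is the same. Therefore: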

import re

with open('yourfile.txt') as f:
    exampleText = f.read()

new = ''

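for line in exampleText.split('\n'):
    if line == '':
        # blank line: end the joined text line and keep the empty separator line
        new += '\n\n'
    elif re.search('[a-zA-Z]', line):  # check if there is text
        # text line: append it followed by a space instead of a newline
        new += line + ' '
    else:
        # number or timing line: keep it on its own line
        new += line + '\n'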

Result:

>>> print(new)
1
00:00:23,343 --> 00:00:25,678
Been a while since I was up here in front of you. 

2
00:00:25,762 --> 00:00:28,847
Maybe I'll do us all a favour and just stick to the cards. 
...

Regex explanation: `[...]` matches any one of the characters inside the brackets, `a-z` is the range of characters a to z, and `A-Z` is the range A to Z.
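The letter check treats every line without an ASCII letter as a number or timing line, so a text line such as "..." or a subtitle in a non-Latin script would stay on its own line. A stricter sketch classifies the four line types explicitly. The timing pattern `HH:MM:SS,mmm --> HH:MM:SS,mmm` (as in the sample above) and the names `classify`/`merge_lines` are assumptions for illustration; this version also avoids the trailing space.

```python
import re

# Assumed timing format, e.g. "00:00:23,343 --> 00:00:25,678"
TIMING_RE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def classify(line):
    """Return 'blank', 'index', 'timing' or 'text' for one line."""
    s = line.strip()
    if not s:
        return "blank"
    if s.isdigit():
        return "index"
    if TIMING_RE.match(s):
        return "timing"
    return "text"


def merge_lines(text):
    """Join consecutive text lines; keep every other line as it is."""
    out = []
    prev_kind = None
    for line in text.splitlines():
        kind = classify(line)
        if kind == "text" and prev_kind == "text":
            out[-1] += " " + line.strip()  # continue the current subtitle text
        else:
            out.append(line.strip())
        prev_kind = kind
    return "\n".join(out)
```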

import re

with open('yourfile.txt') as f:
    exampleText = f.read()

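# one-liner: join the text lines, then restore the blank line before each number
new = re.sub(r'\n([0-9]+)\n', r'\n\n\g<1>\n', re.sub(r'([^0-9])\n', r'\g<1> ', exampleText))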

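# the same in two steps:
# 1) a newline after a non-digit ends a text line, so turn it into a space
tmp = re.sub(r'([^0-9])\n', r'\g<1> ', exampleText)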


# 2) step 1 also swallowed the blank line before each number, so add it back
new = re.sub(r'\n([0-9]+)\n', r'\n\n\g<1>\n', tmp)

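Wrapped as a function (the name `merge_with_regex` is hypothetical), the two substitutions can be checked against the first two subtitles from the question. The approach relies on number and timing lines ending in a digit while text lines do not; a text line that ends in a digit (e.g. "back in 1999") is not joined correctly, and the joined text keeps a trailing space.

```python
import re


def merge_with_regex(text):
    # Step 1: a newline preceded by a non-digit ends a text line -> replace it with a space.
    # Number and timing lines end in a digit, so their newlines survive.
    tmp = re.sub(r"([^0-9])\n", r"\g<1> ", text)
    # Step 2: step 1 also consumed the blank line after each block's text,
    # so put a blank line back in front of every number-only line.
    return re.sub(r"\n([0-9]+)\n", r"\n\n\g<1>\n", tmp)
```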