Merging lines in a txt file with Python


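A question about merging lines in a txt file.

The file content looks like this (movie subtitles). I want to combine the subtitle text in each block, the English words and sentences, into a single line, instead of having it spread over 1, 2, or 3 lines as it is now.

Could you tell me which approach would work in Python? Many thanks.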

1
00:00:23,343 --> 00:00:25,678
Been a while since I was up here
in front of you.

2
00:00:25,762 --> 00:00:28,847
Maybe I'll do us all a favour
and just stick to the cards.

3
00:00:31,935 --> 00:00:34,603
There's been speculation that I was
involved in the events that occurred
on the freeway and the rooftop...

4
00:00:36,189 --> 00:00:39,233
Sorry, Mr Stark, do you
honestly expect us to believe that

5
00:00:39,317 --> 00:00:42,903
that was a bodyguard
in a suit that conveniently appeared,

6
00:00:42,987 --> 00:00:45,698
despite the fact
that you sorely despise bodyguards?

7
00:00:45,782 --> 00:00:46,907
Yes.

8
00:00:46,991 --> 00:00:51,662
And this mysterious bodyguard
was somehow equipped

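The pattern seems to be:

  • a line containing only a number
  • a following line with the timing information, and
  • one or more lines of text, terminated by a blank line

  • I would write a loop that reads lines 1) and 2), then a nested loop that reads line 3) onward until it finds a blank line. The nested loop can join those lines into a single line.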
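The loop described above could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the function name `merge_subtitle_lines` is made up for the example, and it assumes every block consists of a number line, a timing line, and then text lines up to a blank line (or the end of the input).

```python
def merge_subtitle_lines(lines):
    """Join the text lines of each subtitle block into a single line.

    `lines` is any iterable of strings, for example an open file.
    The name and signature are illustrative assumptions.
    """
    it = iter(lines)
    out = []
    for number in it:
        number = number.strip()
        if not number:
            continue  # skip stray blank lines between blocks
        # Line 2 of the block: the timing information
        timing = next(it, "").strip()
        # Nested loop: collect text lines until a blank line or end of input
        text = []
        for line in it:
            line = line.strip()
            if not line:
                break
            text.append(line)
        # Number, timing, merged text, then a blank separator line
        out.extend([number, timing, " ".join(text), ""])
    return out
```

Because the outer and the nested loop share one iterator, the nested loop resumes exactly where the outer loop stopped. To write the result to a file, something like `f_out.write("\n".join(merge_subtitle_lines(f)))` should work.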

    This reads the file, splits it into blocks at the blank lines, and joins the text lines of each block. The result is written to mytxt in the current directory.

    with open('/home/cam/Documents/1.txt') as f, open('mytxt', 'w') as f_out:
        # Strip line endings so that blank separator lines become ''
        new_lines = [line.strip() for line in f]
    
        # Block boundaries: every blank line, plus a virtual boundary
        # before the first line and another after the last line
        space_index = [i for i, x in enumerate(new_lines) if x == ""]
        new_list = [-1] + space_index + [len(new_lines)]
    
        for start, end in zip(new_list, new_list[1:]):
            mylist = new_lines[start + 1:end]
            if len(mylist) < 3:
                continue  # empty or incomplete block
    
            # Keep the number and timing lines, merge the text lines
            final = mylist[:2] + [" ".join(mylist[2:])]
    
            # A blank line after each block keeps the blocks separated
            f_out.write("\n".join(final) + "\n\n")
    
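    Intuitive solution

    A simple solution based on the 4 types of lines you can have:

    • an empty line
    • a number indicating the position (no letters)
    • the timing of the subtitle (with a specific pattern; no letters)
    • text

    You can loop over each line, classify it, and act accordingly.

    In fact, the "action" for non-text, non-empty lines (timing lines and numbers) is the same. So: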

    import re
    
    with open('yourfile.txt') as f:
        exampleText = f.read()
    
    new = ''
    
    for line in exampleText.split('\n'):
        if line == '':
            new += '\n\n'
        elif re.search('[a-zA-Z]', line):  # check if there is text
            new += line + ' ' 
        else:
            new += line + '\n' 
    
    Result:

    >>> print(new)
    1
    00:00:23,343 --> 00:00:25,678
    Been a while since I was up here in front of you. 
    
    2
    00:00:25,762 --> 00:00:28,847
    Maybe I'll do us all a favour and just stick to the cards. 
    ...
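One caveat with testing only for letters: a text line that contains no letters (for instance a line reading `1984?`) is treated like a number or timing line and is not merged. A possible variation, sketched here under the assumption that timing lines always look like `HH:MM:SS,mmm --> HH:MM:SS,mmm`, classifies all four line types explicitly. The names `TIMING`, `classify`, and `merge_text_lines` are hypothetical and not part of the answer above.

```python
import re

# Assumed timing format: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$')


def classify(line):
    """Return 'empty', 'number', 'timing', or 'text' for one line."""
    line = line.strip()
    if not line:
        return 'empty'
    if line.isdigit():
        return 'number'
    if TIMING.match(line):
        return 'timing'
    return 'text'


def merge_text_lines(text):
    """Merge consecutive text lines; leave every other line unchanged."""
    out = []
    previous_kind = None
    for line in text.split('\n'):
        kind = classify(line)
        if kind == 'text' and previous_kind == 'text':
            out[-1] += ' ' + line.strip()  # continue the current text line
        else:
            out.append(line.strip())
        previous_kind = kind
    return '\n'.join(out)
```

A text line made up of digits only (such as `1984`) is still classified as a number here, so this sketch narrows the ambiguity rather than removing it.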
    
    Regex explained:

    • []
      means any one of the characters inside it
    • a-z
      means the range of characters a-z
    • A-Z
      means the range of characters A-Z

    Load what is needed:

      import re
      
      with open('yourfile.txt') as f:
          exampleText = f.read()
      
      Concise one-liner

      re.sub(r'\n([0-9]+)\n', r'\n\n\g<1>\n', re.sub(r'([^0-9])\n', r'\g<1> ', exampleText))
      
      The first substitution replaces the newline at the end of every line that ends in a non-digit character (that is, the text lines) with a space:

      tmp = re.sub(r'([^0-9])\n', r'\g<1> ', exampleText)
      
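      That substitution also removes the newline after the last text line of each block, so the blank line between blocks is lost. The second substitution then adds a newline in front of the number-only lines:

      re.sub(r'\n([0-9]+)\n', r'\n\n\g<1>\n', tmp)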
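For reuse, the two substitutions can be wrapped in a small function. This is a sketch only; the function name `merge_with_regex` is an assumption made for the example. Raw strings (`r'...'`) are used so that `\g<1>` reaches the `re` module unchanged.

```python
import re


def merge_with_regex(text):
    """Apply the two substitutions described above, in order."""
    # Step 1: a newline preceded by a non-digit becomes a space
    tmp = re.sub(r'([^0-9])\n', r'\g<1> ', text)
    # Step 2: restore the blank line in front of each number-only line
    return re.sub(r'\n([0-9]+)\n', r'\n\n\g<1>\n', tmp)
```

Note that each merged text line keeps a trailing space, and that a text line ending in a digit is not joined to the line after it, because the first pattern only matches a newline that follows a non-digit.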
      
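      • Nice answer, and it doesn't spoil the aha moment :)
      • @Brent Washburne, thanks. Would you mind attaching the code too, so we can learn from it?
      • Try writing the code yourself, then come back with any questions. As others have said, "this is not a code-writing service".
      • @Brent Washburne, thanks. I tried, but as a beginner I can't even identify each type of content.
      • Yes.. check mytxt in the current directory.
      • But it skipped all the number lines... I need all the lines, with the English lines merged...
      • Please tell me your expected output for the first two lines.
      • Thanks, Ajay. I want the first two lines of each block to stay unchanged.
      • This is great! Thank you very much for the detailed explanation. The step-by-step guidance makes learning easy for everyone. Your spirit of encouragement and teaching leads the world to a better place. I hope that one day I can grow to be like you and help others.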