Minor bug in code written to format a text file (incorrect spacing) (Python 3)

Sorry if this is a stupid question.

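I have some text that I am trying to format to make it easier to read, so I tried writing a short Python program to do it for me. I first removed the extra paragraph breaks in MS Word using the "Find and Replace" option. The input text looks like this: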

This is a sentence. So is this one. And this.
(empty line)
This is the next line
(empty line)
and some lines are like this.
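I want to eliminate all the empty lines so that there are no gaps between lines, and make sure that no sentence is left hanging in the middle as in the example above. Every new line should start with two spaces, shown as $ signs below. So after formatting, it should look like this: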

$$This is a sentence. So is this one. And this.
$$This is the next line and some lines are like this.
I wrote the following script:

import os

directory = "C:/Users/DELL/Desktop/"
filename = "test.txt"
path = os.path.join(directory, filename)
with open(path,"r") as f_in, open(directory+"output.txt","w+") as f_out:
    temp = "  "
    for line in f_in:
        curr_line = line.strip()
        temp += curr_line
        #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
        if curr_line:
            if temp[-1]==".": #check if sentence is complete
                f_out.write(temp)
                temp = "\n  " #two blank spaces here
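It eliminates all the empty lines, indents new lines by two spaces, and joins the hanging sentences, but it does not insert the necessary space, so in the output the space between the words "line" and "and" is currently missing.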

I tried to fix this by changing the relevant lines of code to the following:

temp += " " + curr_line
temp = "\n " #one space instead of two
This does not work, and I don't know why. It may be a problem with the text itself, but I will check.

Any suggestions would be appreciated, and if there is a better way to do what I want than the mess I have written, I would like to know that too.

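EDIT: I seem to have fixed it. My text is very long, so at first I did not notice that two of the lines were separated by two empty lines, which is why my attempted fix did not work. I moved one line a little further down to produce the following loop, and that seems to have fixed it: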

for line in f_in:
    curr_line = line.strip()
    #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
    if curr_line:
        temp += " " + curr_line
        if temp[-1]==".": #check if sentence is complete
            f_out.write(temp)
            temp = "\n "
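I also see that one of the answers below originally included some regular expressions; I suppose I will have to learn about those at some point.
Thanks, everyone, for the help.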
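
For readers wondering why moving that one line matters, here is a minimal sketch that runs both versions of the loop on an in-memory list of lines instead of files. The function names join_before_check and join_after_check, the sample list, and the one-space initial value of temp are assumptions made for illustration; they are not from the original script.

```python
def join_before_check(lines):
    """First attempted fix: the space is appended for every line, blank or not."""
    out = []
    temp = " "  # assumed starting value, chosen so the first line gets two spaces
    for line in lines:
        curr_line = line.strip()
        temp += " " + curr_line  # also runs when curr_line is empty
        if curr_line:
            if temp[-1] == ".":  # sentence is complete
                out.append(temp)
                temp = "\n "
    return "".join(out)


def join_after_check(lines):
    """Edited loop: the space is appended only for non-blank lines."""
    out = []
    temp = " "  # same assumed starting value as above
    for line in lines:
        curr_line = line.strip()
        if curr_line:
            temp += " " + curr_line
            if temp[-1] == ".":  # sentence is complete
                out.append(temp)
                temp = "\n "
    return "".join(out)


# Sample input modelled on the text in the question.
sample = [
    "This is a sentence. So is this one. And this.\n",
    "\n",
    "This is the next line\n",
    "\n",
    "and some lines are like this.\n",
]

print(repr(join_before_check(sample)))
print(repr(join_after_check(sample)))
```

With this sample, the first version leaves two spaces between "line" and "and" and an extra leading space on the second output line, because every blank or whitespace-only line still appends a space. The second version skips those lines entirely.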

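This should work. It is effectively the same as yours, but a bit more efficient. Instead of using string concatenation with + and +=, which can be slow when repeated, it keeps the incomplete lines in a list. It then writes two spaces, followed by the saved fragments joined with single spaces, followed by a newline. This also simplifies things, because output is written only once a line is complete.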

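path_in = "C:/Users/DELL/Desktop/test.txt"     # input path taken from the question
path_out = "C:/Users/DELL/Desktop/output.txt"  # output path taken from the question
temp = []  # fragments of the sentence currently being assembled
with open(path_in, "r") as f_in, open(path_out, "w") as f_out: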
    for line in f_in:
        curr_line = line.strip()
        if curr_line:
            temp.append(curr_line)
            if curr_line.endswith('.'):  # write our line
                f_out.write('  ')
                f_out.write(' '.join(temp))
                f_out.write('\n')
                temp.clear()  # reset temp
Output

  This is a sentence. So is this one. And this.
  This is the next line and some lines are like this.
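
The same list-and-join idea can be tried without touching any files. The sketch below wraps it in a hypothetical helper named reflow and feeds it io.StringIO objects; the helper name, the indent parameter, and the final flush of text that never reaches a period are additions for illustration and are not part of the answer above.

```python
import io


def reflow(f_in, f_out, indent="  "):
    """Join wrapped lines into sentences ending in '.', one per output line."""
    pending = []  # fragments of the sentence currently being assembled
    for line in f_in:
        curr_line = line.strip()
        if not curr_line:
            continue  # skips empty lines and lines holding only whitespace
        pending.append(curr_line)
        if curr_line.endswith("."):
            f_out.write(indent + " ".join(pending) + "\n")
            pending.clear()
    # Assumption: trailing text with no closing period is still written out.
    if pending:
        f_out.write(indent + " ".join(pending) + "\n")


# Sample input with a whitespace-only line and a doubled blank line.
source = io.StringIO(
    "This is a sentence. So is this one. And this.\n"
    "\n"
    "This is the next line\n"
    " \n"
    "\n"
    "and some lines are like this.\n"
)
result = io.StringIO()
reflow(source, result)
print(result.getvalue(), end="")
```

Because the fragments are joined with ' '.join, the number of blank lines between two fragments does not affect the spacing in the output.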

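An explanation would be nice, since the OP is new and is asking how to do something, not just "give me the code".

I managed to solve my problem myself. I was stupid and did not check my input text, in which two lines of text were separated by two empty lines, one of them containing a space, but this is helpful too. I did not know that the startswith and endswith methods existed, or that string concatenation was slow. I will study your solution. Thanks @FHTMitchell.

Does temp += " " curr_line compile? You probably need an operator to combine the space and curr_line, such as +.

A plus operator was missing; it was copied from old code. Thanks @统一过程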