
Python: how to join words that are hyphenated across a line break?


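I have thousands of text files with similar data, in which words are wrapped onto the next line with a hyphen and a line break.

What I want to do is remove the hyphen and move the line break to the end of the rejoined word. If possible, I don't want to touch every hyphenated word, only the ones split at the end of a line. For example: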

I want to say that Napp Granade
serves in the spirit of a town in our dis-
trict of Georgia called Andersonville.
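
I want to go through each entire text file and remove every hyphen that marks a line break. This is what I tried, but it does not work (I have tried several variations):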

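import re

with open(filename, encoding="utf8") as f:
    file_str = f.read()

# Bug: the result of re.sub is discarded (file_str is unchanged),
# and this pattern matches every hyphen, not just line-end ones.
re.sub(r"\s*-\s*", "", file_str)

with open(filename, "w", encoding="utf8") as f:
    f.write(file_str)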

Any help would be appreciated.
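A short sketch of why the attempt above has no effect: `re.sub` returns a new string instead of modifying its argument, so the result must be assigned, and `\s*-\s*` matches every hyphen (plus surrounding whitespace, including the newline), not only line-end ones. The sample string, including the word `well-known`, is made up here for illustration:

```python
import re

sample = "our dis-\ntrict is well-known\n"

# The original pattern removes every hyphen and the whitespace around it,
# so the real compound "well-known" is damaged as well.
print(repr(re.sub(r"\s*-\s*", "", sample)))

# Anchoring the hyphen to the newline only touches line-end hyphens.
# The result must be assigned: re.sub does not modify its argument.
fixed = re.sub(r"-\n", "", sample)
print(repr(fixed))
```

This simplest fix only joins the two lines; the answers below additionally keep a line break after the rejoined word.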

You don't need a regular expression. Applied to the sample text, the loop further down produces:

I want to say that Napp Granade
serves in the spirit of a town in our district
of Georgia called Andersonville.
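You can of course use a regex as well, and it is shorter; that version comes after the loop. The loop-based version: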

filename = 'test.txt'

# I want to say that Napp Granade
# serves in the spirit of a town in our dis-
# trict of Georgia called Anderson-
# ville.

with open(filename, encoding="utf8") as f:
    lines = [line.rstrip('\n') for line in f]

for num, line in enumerate(lines):
    # join only if a non-empty next line exists to take the word from
    if line.endswith('-') and num + 1 < len(lines) and lines[num + 1].strip():
        # the end of the word is at the start of the next line
        end, _, rest = lines[num + 1].lstrip().partition(' ')
        # we remove the - and append the end of the word
        lines[num] = line[:-1] + end
        # and remove the end of the word and the following space
        # from the next line
        lines[num + 1] = rest

text = '\n'.join(lines)

with open(filename, "w", encoding="utf8") as f:
    f.write(text)


# I want to say that Napp Granade
# serves in the spirit of a town in our district
# of Georgia called Andersonville.
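In the regex version, we look for a "-" followed by "\n" and capture the word that follows, i.e. the end of the split word. We replace the whole match with the captured word followed by a newline, so the hyphen and the original line break disappear and the line now breaks after the rejoined word.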


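Don't forget to use a raw string for the replacement, so that \1 is interpreted as a backreference to the captured word; in a normal string literal, "\1" is an octal escape for the control character U+0001.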
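A small sketch of the raw-string pitfall, using a made-up two-line fragment:

```python
import re

# In a normal string literal "\1" is an octal escape for U+0001,
# so re.sub never sees a backslash and inserts that control character.
print(repr("\1"))    # '\x01'
print(repr(r"\1"))   # '\\1' (backslash followed by the digit 1)

text = "dis-\ntrict of"
wrong = re.sub(r"-\n(\S+) *", "\1\n", text)   # word replaced by U+0001
right = re.sub(r"-\n(\S+) *", r"\1\n", text)  # backreference to the word
print(repr(wrong))   # 'dis\x01\nof'
print(repr(right))   # 'district\nof'
```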

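(Comment on the question: Is the hyphen always the last character on the line, or can there also be whitespace characters before the newline?)

The regex version: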
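import re

# filename is the same path as in the loop-based version
with open(filename, encoding="utf8") as f:
    text = f.read()

# "dis-\ntrict of" -> "district\nof"; \S+ keeps punctuation such as "ville."
# attached to the word instead of moving it to the next line
text = re.sub(r'-\n(\S+) *', r'\1\n', text)

with open(filename, "w", encoding="utf8") as f:
    f.write(text)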
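Since the question mentions thousands of files, here is a sketch that applies the same kind of substitution to every matching file under a directory. The helper names `dehyphenate` and `process_directory`, the `*.txt` pattern, and the directory name are assumptions for illustration. The pattern also tolerates spaces or tabs between the hyphen and the newline, in case some lines have trailing whitespace as the comment above asks about:

```python
import re
from pathlib import Path

# Hyphen, optional trailing spaces/tabs, newline, then the rest of the word.
LINE_END_HYPHEN = re.compile(r"-[ \t]*\n(\S+) *")


def dehyphenate(text: str) -> str:
    """Join words split by a line-end hyphen, keeping a line break after the word."""
    return LINE_END_HYPHEN.sub(r"\1\n", text)


def process_directory(directory: str, pattern: str = "*.txt") -> int:
    """Rewrite every matching file (recursively) in place; return how many changed."""
    changed = 0
    for path in Path(directory).rglob(pattern):
        original = path.read_text(encoding="utf8")
        updated = dehyphenate(original)
        # only write files that actually contained a split word
        if updated != original:
            path.write_text(updated, encoding="utf8")
            changed += 1
    return changed
```

Usage would be something like `process_directory("texts")`, where `texts` is a hypothetical folder holding the files. Since the files are overwritten in place, running it on a copy first is a sensible precaution.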