在Python中从.txt文件中删除页码_Python_String_File_File Io_Text Files

在Python中从.txt文件中删除页码

python string file file-io

在Python中从.txt文件中删除页码,python,string,file,file-io,text-files,Python,String,File,File Io,Text Files,我正在尝试加载电子书的.txt文件并删除包含页码的行。这本书看起来像： 2 Words More words. More words. 3 More words. 以下是我到目前为止的情况： x = 1 with open("first.txt","r") as input: with open("last.txt","wb") as output: for line in input: if line != str(x) + "\n":

我正在尝试加载电子书的.txt文件并删除包含页码的行。这本书看起来像：

2
Words
More words.

More words.

3
More words.

以下是我到目前为止的情况：

x = 1

with open("first.txt","r") as input:
    with open("last.txt","wb") as output: 
        for line in input:
            if line != str(x) + "\n":
                output.write(line + "\n")
                x + x + 1

我的输出文件中删除了所有的空白（新行）（我不想要），甚至没有删除数字。有人有什么想法吗？谢谢

1）您不必打开二进制文件

open（“last.txt”、“wb”）

open（“last.txt”、“w”）

2）

x+x+1

x+=1

但是，你可以做得简单得多

with open("first.txt","r") as input:
    with open("last.txt","w") as output: 
        for line in input:
            line = line.strip() # clear white space
            try: 
                int(line) #is this a number ?
            except ValueError:
                output.write(line + "\n")

检查是否可以将该行转换为整数，如果成功，则跳过该行。不是最快的解决方案，但应该有效

try:
   int(line)
   # skip storing that line
   continue
except ValueError:
   # save the line to output

使用正则表达式忽略仅包含数字的行

import sys
import re

pattern = re.compile("""^\d+$""")

for line in sys.stdin:
    if not pattern.match(line):
        sys.stdout.write(line)

改进的解决方案-减少一个缩进级别，避免不必要的

条带

和字符串求和，捕获显式异常

with open("first.txt","r") as input_file, open("last.txt","w") as output_file:
    for line in input_file:
        try: 
            int(line)
        except ValueError:
            output_file.write(line)

你希望x+x+1做什么？哎呀，我的意思是：x=x+1。不过，纠正这一点并不能解决任何一个问题（空白或不删除任何数字）。我这样做是因为一旦找到页码（如第1页），我希望它能找到下一页（如第2页）。如果由于某种原因，这本书有一整行，其中只有一个不是页码但实际上是书的一部分的数字，这也会有所帮助。您也可以使用

x+=1

。但是，根据这个例子，如果它不是从第1页开始呢？完全合理，我只是想我应该手动编辑它。太晚了，塔索斯的答案正是这样。你不必

剥离\n
，int（'2\r\n'）
计算为2
。此外，bare except子句不应出现在代码中。您应该使其显式-int（）
方法将引发一个ValueError
。它可以是\s2\s而不是\r\n。int（line）还能抛出什么异常（我们关心的）？我不确定\s
是什么意思int（）
可以处理string.whitespace
中列出的任意数量的前导字符和尾随字符。关于异常-ìnt（）
也可以抛出TypeError
，但决不能在该上下文中抛出（行始终是字符串）。解释器还可以触发键盘中断
，这将由您执行。您永远不想这样做。\s就像在空白中一样。“我不知道如何处理空白，”他说。关于这个异常，在这个特定的例子中，它是非常无害的（我们正在尝试展示一种方法）。尽管如此，我还是会编辑它。这在我的文件中非常有效。非常感谢大家！您应该将该字符串设置为原始字符串，而不是多行：r“^\d+$”
。