Python: trying to remove null bytes, but all of the text gets deleted


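I'm trying to remove null bytes from some text. I wrote three functions that I believe all do the same thing. Every one of them ends up giving me blank files, with all of the input removed.

Here is sample input containing a null byte:

T:  14/01/2015 22:27:05**\00**||||END_OF_RECORD <- ** added so you can see it (I can see it in my Ubuntu text editor)
T:  14/01/2015 22:27:05 ||||END_OF_RECORD <- what my IDE shows is a box there

Any suggestions would be greatly appreciated. Thanks.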

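Before you try to remove any null characters, you need to decode the file contents to text. For many codecs, null bytes are a normal part of the encoding, so you should not try to strip them from the raw bytes. Even a short string such as

'abc'

contains null bytes once it is encoded with a multi-byte codec such as UTF-16.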
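A minimal sketch of the point about codecs, assuming UTF-16 as the example encoding (the thread does not say which encoding the files actually use). It shows that the null bytes in the encoded form are part of the encoding itself, and that a genuine NUL character is best removed after decoding:

```python
text = "abc"

# In UTF-16 (little-endian) each of these characters takes two bytes,
# so every ASCII character is followed by a null byte.
encoded = text.encode("utf-16-le")
print(encoded)  # b'a\x00b\x00c\x00'

# Decoding with the matching codec gives back text without any NUL characters.
decoded = encoded.decode("utf-16-le")
print("\x00" in decoded)  # False

# A real NUL character in the text survives decoding and can then be
# removed as text, without damaging the other characters.
dirty = "T: 14/01/2015 22:27:05\x00||||END_OF_RECORD".encode("utf-16-le")
clean = dirty.decode("utf-16-le").replace("\x00", "")
print(clean)
```

Stripping `b'\x00'` from `encoded` directly would leave `b'abc'`, which is no longer valid UTF-16 for the original text.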

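I see, thanks. I tried running this on the original text files (I converted them to csv): str.encode(encoding='UTF-8', errors='strict'), but when I open a file with the built-in text editor I still get the error: "The file you opened has some invalid characters. If you continue editing this file you could corrupt this document. You can also choose another character encoding and try again." I'm fairly sure the null bytes should not be there and come from the conversion of the original rich text documents, where the input from the original application produced some garbage formatting (see the input above).

In any case, you have to decode the file using the correct encoding (which is obviously not utf-8). How were the original files created? What did you use to "convert" them to csv?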
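When the encoding is unknown, one way to narrow it down is to look for a byte-order mark and then try a short list of candidates with strict decoding. The sketch below is only an illustration: the helper names `sniff_bom` and `decode_with_candidates` and the default candidate list are assumptions, not something from the thread.

```python
import codecs


def sniff_bom(raw: bytes):
    """Return a codec name if raw starts with a known byte-order mark, else None."""
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return None


def decode_with_candidates(raw: bytes, candidates=("utf-8", "utf-16", "cp1252")):
    """Return (codec_name, text) for the first codec that decodes raw strictly."""
    bom_codec = sniff_bom(raw)
    order = ([bom_codec] if bom_codec else []) + list(candidates)
    for name in order:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate does not fit the data, try the next one
    raise ValueError("none of the candidate encodings decoded the data")
```

A successful decode does not prove the guess is right, since some codecs accept almost any byte sequence. The result should still be checked by eye, and knowing how the files were produced remains the more reliable route.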

The three functions from the question:

from pathlib import Path

# Removes null bytes from the txt files
def removeNULLBytes():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('wb') as outfile:
            fileName = infile.name
            with open(fileName, 'rb') as in_file:
                data = in_file.read()
                # data = str(data, encoding='utf8', errors='ignore')
                data = (data.replace(b'\x00', b''))
                outfile.write(data)


def removeNULLs():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('w') as outfile:
            fileName = infile.name
            with open(fileName, 'r') as in_file:
                data = in_file.read()
                # data = str(data, encoding='utf8', errors='ignore')
                data = (data.replace(u"\u0000", ""))
                outfile.write(data)

def removeNull():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('w') as outfile:
            for line in infile.read():
                newline = ''.join([i if not u"\u0000" else "" for i in line])
                data = (line.replace(line, newline))
                outfile.writelines(data)

if __name__ == '__main__':
    workspace = Path('/home/')
    # removeNULLBytes()
    removeNull()
    # removeNULLs()
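Going by the code as posted, the blank files have a cause that is independent of the encoding. `workspace.glob('*.csv')` yields `.csv` files, and `Path(workspace, p.name).with_suffix('.csv')` builds the very same path, so `new.open('w')` (or `'wb'`) truncates each file before anything has been read from it. In `removeNull`, the test `not u"\u0000"` is also always false, because a non-empty string is truthy, so every character would be replaced with `""` even without the truncation. A minimal sketch that reads first and writes afterwards, assuming the files really are in the encoding passed in (the function name `remove_nulls` and the `utf-8` default are illustrative only):

```python
from pathlib import Path


def remove_nulls(workspace: Path, encoding: str = "utf-8") -> None:
    """Strip NUL characters from every .csv file directly inside workspace."""
    for path in workspace.glob("*.csv"):
        # Read the whole file and close it BEFORE reopening it for writing;
        # opening in 'w' mode truncates the file immediately.
        with path.open("r", encoding=encoding, newline="") as infile:
            text = infile.read()

        cleaned = text.replace("\x00", "")

        # Only rewrite files that actually contained NUL characters.
        if cleaned != text:
            with path.open("w", encoding=encoding, newline="") as outfile:
                outfile.write(cleaned)


# Example call (the directory is a placeholder):
# remove_nulls(Path("/path/to/workspace"), encoding="utf-8")
```

Passing `newline=""` keeps the original line endings unchanged. Writing to a separate output directory instead of overwriting in place would be a safer variation while the correct encoding is still uncertain.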