
Python: trying to remove null bytes, but it removes all the text

Tags: python, python-3.x, null, data-cleaning


So I am trying to remove null bytes from some text. I wrote three functions which I thought all do the same thing. They all end up giving me blank files and removing all of the input.

Here is a sample of the input with the null byte in it:

T:  14/01/2015 22:27:05**\00**||||END_OF_RECORD <- ** so you can see it (I can see it in my ubuntu text editor)
T:  14/01/2015 22:27:05 ||||END_OF_RECORD <- what my IDE shows is a box there

Any advice would be appreciated. Thanks.

Answer: You need to decode the file contents to text before trying to remove any null characters. With many codecs a null byte is a normal part of the encoding, so you should not try to strip it from the raw bytes: the UTF-16 encoding of 'abc', for example, contains a null byte after every character, and decoding it correctly gives you 'abc' back with nothing left to remove.
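To make that concrete, here is a short illustrative snippet (mine, not part of the original answer) showing the point in a Python shell:

>>> 'abc'.encode('utf-16-le')
b'a\x00b\x00c\x00'
>>> b'a\x00b\x00c\x00'.decode('utf-16-le')
'abc'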
Comment from the asker: I see, thank you. I tried running str.encode(encoding='UTF-8', errors='strict') on the raw text files (I convert them to csv), but when I open the result with the built-in text editor I still get the error: "The file you opened contains some invalid characters. If you continue editing this file, you could corrupt this document. You can also choose another character encoding and try again." I'm fairly sure the null bytes shouldn't be there at all; they were introduced when the original rich-text documents were converted, and the original application's input produced some garbage formatting (see the sample input above).

Reply: In any case, you have to decode the file with the correct encoding (which apparently is not utf-8). How was the original file created, and what are you using to "convert" it to csv?
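One way to answer the "which encoding is it?" question is to probe a few candidates against the raw bytes. A minimal sketch, assuming a hypothetical sample path and an arbitrary candidate list:

from pathlib import Path

raw = Path('/home/sample.csv').read_bytes()   # sample path is an assumption
for enc in ('utf-8', 'utf-16', 'latin-1', 'cp1252'):
    try:
        text = raw.decode(enc)
    except UnicodeDecodeError:
        print(enc, '-> decode error')
    else:
        print(enc, '-> ok, NUL characters left:', text.count('\x00'))

For reference, the three functions from the question: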
from pathlib import Path

# Removes null bytes from the txt files
def removeNULLBytes():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('wb') as outfile:
            fileName = infile.name
            with open(fileName, 'rb') as in_file:
                data = in_file.read()
                # data = str(data, encoding='utf8', errors='ignore')
                data = (data.replace(b'\x00', b''))
                outfile.write(data)


def removeNULLs():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('w') as outfile:
            fileName = infile.name
            with open(fileName, 'r') as in_file:
                data = in_file.read()
                # data = str(data, encoding='utf8', errors='ignore')
                data = (data.replace(u"\u0000", ""))
                outfile.write(data)

def removeNull():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('w') as outfile:
            for line in infile.read():
                newline = ''.join([i if not u"\u0000" else "" for i in line])
                data = (line.replace(line, newline))
                outfile.writelines(data)

if __name__ == '__main__':
    workspace = Path('/home/')
    # removeNULLBytes()
    removeNull()
    # removeNULLs()
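
Following the answer's advice, a rewrite could look like the sketch below. The helper name, the 'latin-1' fallback encoding and the '_clean' output name are my assumptions, not part of the original post. Note also that all three functions above build new from p.name with the same '.csv' suffix, so new points at the same file as p, and opening it in 'w'/'wb' mode truncates the input before it is read back, which may be why every output comes out blank.

from pathlib import Path

def remove_null_bytes(workspace, encoding='latin-1'):
    # Decode each file with an explicit encoding, drop NUL characters,
    # and write the result to a *different* file so the input is never
    # opened for writing (and therefore never truncated).
    for p in workspace.glob('*.csv'):
        out = p.with_name(p.stem + '_clean.csv')   # e.g. data.csv -> data_clean.csv
        text = p.read_bytes().decode(encoding, errors='replace')
        out.write_text(text.replace('\x00', ''), encoding='utf-8')

if __name__ == '__main__':
    remove_null_bytes(Path('/home/'))

Writing to a separate _clean file also makes it easy to compare the output with the original before replacing anything.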