Python: trying to remove null bytes, but all of the text gets deleted
Tags: python, python-3.x, null, data-cleaning

I am trying to remove null bytes from some text. I wrote three functions that I believe all do the same thing. Every one of them ends up giving me a blank file, with all of the input deleted.

Here is sample input containing a null byte:
T: 14/01/2015 22:27:05**\00**||||END_OF_RECORD <- ** so you can see it (I can see it in my ubuntu text editor)
T: 14/01/2015 22:27:05 ||||END_OF_RECORD <- what my IDE shows is a box there
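Any suggestions would be greatly appreciated. Thanks.

You need to decode the file contents to text before trying to remove any null characters. For many codecs, null bytes are a normal part of the encoding, so you should not try to strip them out of the raw bytes: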
'abc'
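To illustrate the point about codecs, here is a minimal sketch. UTF-16 is used purely as an assumed example of a codec in which null bytes are normal; the thread does not say which encoding the files actually use, and `strip_nuls_after_decoding` is a hypothetical helper name.

```python
def strip_nuls_after_decoding(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given codec, then drop real NUL characters."""
    text = raw.decode(encoding)
    return text.replace("\x00", "")


# In UTF-16 (assumed here for illustration), plain ASCII text is full of
# null bytes that are part of the encoding, not garbage:
encoded = "abc".encode("utf-16-le")
print(encoded)  # b'a\x00b\x00c\x00'

# A genuine stray NUL character survives decoding and can be removed safely
# once the bytes have been turned into text.
dirty = "abc\x00def".encode("utf-16-le")
print(strip_nuls_after_decoding(dirty, "utf-16-le"))  # abcdef
```

Removing `b'\x00'` from the undecoded bytes would instead delete half of every two-byte code unit, which only looks harmless when the text happens to be pure ASCII.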
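I see. Thanks. I tried running this on the original text files (which I converted to csv): str.encode(encoding='UTF-8',errors='strict'). When I open the file with the built-in text editor, though, I still see the error: "The file you opened has some invalid characters. If you continue editing this file you could corrupt this document. You can also choose another character encoding and try again." I am pretty sure the null bytes are not supposed to be there. They come from the conversion of the original rich text document, where the original application's input created some garbage formatting (see the input above).

In any case, you have to decode the file using the correct encoding (which is apparently not utf-8). How was the original file created? What did you use to "convert" it to csv?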
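One way to approach "use the correct encoding" is to look for a byte order mark before decoding. The following is only a sketch under assumptions: the helper names `guess_encoding_from_bom` and `decode_and_clean` are hypothetical, BOM sniffing is just one heuristic, and a file without a BOM still needs its encoding supplied by whoever knows how it was created.

```python
import codecs


def guess_encoding_from_bom(raw: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, or `default` if none is found."""
    # UTF-32 must be checked first: its little-endian BOM starts with
    # the same two bytes as the UTF-16 little-endian BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


def decode_and_clean(raw: bytes, default: str = "utf-8") -> str:
    """Decode with the guessed codec, then remove NUL characters from the text."""
    text = raw.decode(guess_encoding_from_bom(raw, default))
    return text.replace("\x00", "")
```

If the guess is wrong, `raw.decode(...)` may raise `UnicodeDecodeError` or produce mojibake, so the result is worth checking against a known-good line of the file.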
The three functions from the question:

from pathlib import Path

# Removes null bytes from the txt files
def removeNULLBytes():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('wb') as outfile:
            fileName = infile.name
            with open(fileName, 'rb') as in_file:
                data = in_file.read()
                # data = str(data, encoding='utf8', errors='ignore')
                data = (data.replace(b'\x00', b''))
                outfile.write(data)

def removeNULLs():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('w') as outfile:
            fileName = infile.name
            with open(fileName, 'r') as in_file:
                data = in_file.read()
                # data = str(data, encoding='utf8', errors='ignore')
                data = (data.replace(u"\u0000", ""))
                outfile.write(data)

def removeNull():
    for p in workspace.glob('*.csv'):
        new = Path(workspace, p.name)
        new = new.with_suffix('.csv')
        with p.open() as infile, new.open('w') as outfile:
            for line in infile.read():
                newline = ''.join([i if not u"\u0000" else "" for i in line])
                data = (line.replace(line, newline))
                outfile.writelines(data)

if __name__ == '__main__':
    workspace = Path('/home/')
    # removeNULLBytes()
    removeNull()
    # removeNULLs()
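A likely reason all three functions above produce blank files: `Path(workspace, p.name).with_suffix('.csv')` resolves to the same path as `p`, so opening it in `'w'` or `'wb'` mode truncates the input before anything has been read from it. In `removeNull`, the test `not u"\u0000"` is also always false, so every character would be replaced by an empty string even if the file were read successfully. The sketch below reads each file fully before writing and sends the output to a separate directory. The names `remove_null_bytes` and `out_dir` are hypothetical, and the byte-level replace assumes an encoding in which `0x00` never occurs as part of a normal character (for example UTF-8 or ASCII); for other encodings, decode first as the answer suggests.

```python
from pathlib import Path


def remove_null_bytes(workspace: Path, out_dir: Path) -> None:
    """Copy every *.csv in `workspace` to `out_dir` with 0x00 bytes removed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for p in workspace.glob("*.csv"):
        # Read the whole input first, so the source is never open for
        # writing while it is being read.
        data = p.read_bytes()
        cleaned = data.replace(b"\x00", b"")
        # Write to a different path than the input to avoid truncating it.
        (out_dir / p.name).write_bytes(cleaned)
```

A call might look like `remove_null_bytes(Path('/home/'), Path('/home/cleaned'))`, where the output directory is an assumption for illustration.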