Python: How do I read a file with NUL-delimited lines?


I usually use the following Python code to read lines from a file:

f = open('./my.csv', 'r')
for line in f:
    print(line)

But what if the lines in the file are delimited by "\0" (rather than "\n")? Is there a Python module that can handle this?

Thanks for any suggestions.

If your file is small enough to read entirely into memory, you can use split:

for line in f.read().split('\0'):
    print(line)
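As a minimal Python 3 sketch of the same in-memory approach: the helper name `read_nul_records` and the UTF-8 default are assumptions made for illustration, not part of the original answer. It opens the file in binary mode and drops the empty chunk that `split` produces when the data ends with a NUL.

```python
import os
import tempfile


def read_nul_records(path, encoding="utf-8"):
    """Return the NUL-delimited records of a small file as a list of str.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    with open(path, "rb") as f:  # binary mode, so no newline translation
        data = f.read()
    records = data.split(b"\0")
    # Data that ends with a NUL yields one trailing empty chunk; drop it.
    if records and records[-1] == b"":
        records.pop()
    return [r.decode(encoding) for r in records]


# Demonstration on a temporary file with made-up contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"alpha\0beta\0gamma\0")
demo_records = read_nul_records(tmp.name)
os.remove(tmp.name)
print(demo_records)
```

This only makes sense while the whole file fits comfortably in memory.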

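Otherwise, you might want to try the recipe from this discussion:

I also noticed that your file has a "csv" extension. Python has a built-in CSV module (import csv). There is an attribute called Dialect.lineterminator, but it is currently not implemented in the reader:

Dialect.lineterminator
The string used to terminate lines produced by the writer. It defaults to "\r\n".
Note: The reader is hard-coded to recognise either "\r" or "\n" as end-of-line, and ignores lineterminator. This behavior may change in the future.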
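Since the reader ignores lineterminator, one possible workaround is to split the records yourself and hand them to csv.reader, which accepts any iterable of strings. The sketch below assumes the data fits in memory and that no field contains a NUL; `parse_nul_delimited_csv` is a hypothetical name.

```python
import csv


def parse_nul_delimited_csv(text):
    """Parse CSV rows whose records are separated by NUL instead of a newline.

    Hypothetical helper; assumes no field contains a NUL character.
    """
    # Skip empty records, such as the one after a trailing NUL.
    records = [r for r in text.split("\0") if r]
    # csv.reader treats each string in the iterable as one line of input.
    return list(csv.reader(records))


# Made-up sample data: three records, the last one with a quoted comma.
sample = "\0".join(["a,b,c", "1,2,3", '"x,y",z,w']) + "\0"
rows = parse_nul_delimited_csv(sample)
print(rows)
```

Quoting is still handled by the csv module, so the quoted field "x,y" stays a single value.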

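I have modified Mark Byers's suggestion so that we can read a file with NUL-delimited lines in Python. This approach reads a potentially large file line by line and should use memory more efficiently. Here is the Python code (with comments):

Hope it helps.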
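My file may have anywhere from a few thousand to tens of thousands of lines.

@user1129812: How long is a line? 100 bytes? 100 bytes * 50000 lines = about 5 MB.

Each line is about 100 characters. Assuming Unicode, each line is about 200 bytes, so a 50000-line file is about 200 x 50000 bytes = 9.54 MB. According to msg11453 in the link, preprocessing the file with awk to change the NUL characters to "\n" may be an alternative solution (if the file contents do not contain "\n"). Thanks.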
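The same preprocessing idea can be sketched in Python rather than awk. This is an illustration under the same assumption as the comment above (the records contain no "\n" of their own); `nul_to_newline` and its parameters are hypothetical names.

```python
import io


def nul_to_newline(src, dst, chunk_size=8192):
    """Copy the binary stream src to dst, replacing each NUL with a newline byte.

    Hypothetical helper. The delimiter is a single byte, so replacing it
    chunk by chunk is safe across chunk boundaries.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of input
            break
        dst.write(chunk.replace(b"\0", b"\n"))


# Demonstration with in-memory streams and made-up data.
source = io.BytesIO(b"one\0two\0three\0")
target = io.BytesIO()
nul_to_newline(source, target, chunk_size=4)
converted = target.getvalue()
print(converted.decode("ascii").splitlines())
```

After the conversion, the ordinary `for line in f` loop from the question works on the result.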
fileLineIter is not quite right: if the last character in partialLine is inputNewLine, it will be lost, because 'a | b |'.

def fileLineIter(inputFile, inputNewline="\n", outputNewline=None, readSize=8192):
    """Like the normal file iter but you can set what string indicates newline.

    The newline string can be arbitrarily long; it need not be restricted to a
    single character. You can also set the read size and control whether or not
    the newline string is left on the end of the iterated lines. Setting
    newline to '\0' is particularly good for use with an input file created with
    something like "os.popen('find -print0')".
    """
    if outputNewline is None:
        outputNewline = inputNewline
    partialLine = ''
    while True:
        charsJustRead = inputFile.read(readSize)
        if not charsJustRead:
            break
        partialLine += charsJustRead
        lines = partialLine.split(inputNewline)
        partialLine = lines.pop()
        for line in lines:
            yield line + outputNewline
    if partialLine:
        yield partialLine

import sys

# Variables for "fileReadLine()"
inputFile = sys.stdin   # The input file. Use "stdin" as an example for receiving data from pipe.
lines = []              # Extracted complete lines (delimited with "inputNewline").
partialLine = ''        # Extracted last non-complete partial line.
inputNewline = "\0"     # Newline character(s) in input file.
outputNewline = "\n"    # Newline character(s) in output lines.
readSize = 8192         # Size of read buffer.
# End - Variables for "fileReadLine()"

# This function reads NUL delimited lines sequentially and is memory efficient.
def fileReadLine():
    """Like the normal file readline but you can set what string indicates newline.

    The newline string can be arbitrarily long; it need not be restricted to a
    single character. You can also set the read size and control whether or not
    the newline string is left on the end of the read lines. Setting
    newline to '\0' is particularly good for use with an input file created with
    something like "os.popen('find -print0')".
    """
    # Declare that we want to use these related global variables.
    global inputFile, partialLine, lines, inputNewline, outputNewline, readSize
    if lines:
        # If there are already extracted complete lines, pop the first line
        # from lines and return that line + outputNewline.
        line = lines.pop(0)
        return line + outputNewline
    # If there are no already extracted complete lines, try to read more from the input file.
    while True:
        # Here "lines" must be an empty list.
        # The read buffer size, "readSize", could be changed as you like.
        charsJustRead = inputFile.read(readSize)
        if not charsJustRead:
            # Have reached EOF.
            if partialLine:
                # If partialLine is not empty here, treat it as a complete line and return a copy of it.
                popedPartialLine = partialLine
                # partialLine has been copied for return; reset it to an empty string so that
                # later "fileReadLine" calls know there is no more partialLine to return.
                partialLine = ""
                return popedPartialLine  # This should be the last line of the input file.
            else:
                # If EOF is reached and partialLine is empty, all the lines in the input file
                # must have been read. Return None to indicate this.
                return None
        # If the read buffer is not empty, add it to partialLine.
        partialLine += charsJustRead
        # Split partialLine to get some complete lines.
        lines = partialLine.split(inputNewline)
        # The last item of lines may not be a complete line; move it to partialLine.
        partialLine = lines.pop()
        if not lines:
            # Empty "lines" means that we have not finished reading any complete line yet. So continue.
            continue
        else:
            # We have finished reading at least one complete line. So pop the first line
            # from lines and return that line + outputNewline (exit the while loop).
            line = lines.pop(0)
            return line + outputNewline

# As an example, read NUL delimited lines from "stdin" and print them out
# (using "\n" to delimit output lines).
while True:
    line = fileReadLine()
    if line is None:
        break
    sys.stdout.write(line)  # "write" does not include "\n".
    sys.stdout.flush()
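For comparison, the chunked-read technique used by both functions above can also be written as a self-contained generator with no global state. This is only a sketch for Python 3 text streams, not the recipe itself. The name `iter_records` and its parameters are hypothetical, and it yields records without any terminator so the caller decides what to append.

```python
import io


def iter_records(stream, delimiter="\0", read_size=8192):
    """Yield delimiter-separated records from a text stream, one at a time.

    Hypothetical helper: records are yielded without the delimiter.
    """
    pending = ""  # text read so far that does not yet end in a delimiter
    while True:
        chunk = stream.read(read_size)
        if not chunk:  # empty read means end of input
            break
        pending += chunk
        parts = pending.split(delimiter)
        # The last piece may be incomplete; keep it for the next read.
        pending = parts.pop()
        for part in parts:
            yield part
    if pending:
        # Input that does not end with a delimiter leaves one final record.
        yield pending


# Made-up data with an empty record in the middle; a tiny read size
# forces records to span several reads.
demo_stream = io.StringIO("\0".join(["a", "b", "", "c"]))
demo_result = list(iter_records(demo_stream, read_size=2))
print(demo_result)
```

Because the unfinished tail is carried over between reads, a delimiter longer than one character is still found when it straddles two chunks.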