Writing a large text file with Python 3


Although I have read some material on this topic, I don't really understand how to implement a block of code that can write a large text file without crashing.

As far as I know, this should be done line by line, but the implementations I have seen only work on files that already exist. Instead, I want to create the file and write to it in chunks on each iteration of the loop.

Here is the code block; it is wrapped in a try/except:

fileW = open(str(articleDate.title) + "-WC.txt", 'wb')
fileW.write(getText.encode('utf-8', errors='replace').strip() + str(articleDate.publish_date).encode('utf-8').strip())
fileW.close()
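
For reference, a minimal sketch of the incremental, chunked writing described above, assuming the text arrives as an iterable of strings (the name text_chunks and the output filename are hypothetical); a with-block closes the file even if a write raises:

# Hypothetical: text_chunks is an iterable of str (e.g. lines or fixed-size slices).
with open('article-WC.txt', 'ab') as out:  # append in binary mode
    for chunk in text_chunks:
        out.write(chunk.encode('utf-8', errors='replace'))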
I know I need a different way of writing the file because I keep seeing this exception, and the "chunks" keyword that keeps popping up suggests that the write method cannot handle the amount of text:

    File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 546, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 513, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 563, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 548, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "webcrawl.py", line 102, in <module>
    writeFiles()
  File "webcrawl.py", line 83, in writeFiles
    extractor = Extractor(extractor='ArticleExtractor', url=urls)
  File "/Users/Adrian/anaconda3/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 39, in __init__
    connection  = urllib2.urlopen(request)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 753, in http_error_302
    fp.read()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 456, in read
    return self._readall_chunked()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 570, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(0 bytes read)

Although I know the exception name at the bottom usually shows up because of the library rename from "httplib" in Python 2 to "urllib"/"http.client" in Python 3, the package I am using is Python 3 compatible, so I am fairly sure this is a writing problem. I would appreciate any help.

You can use a context manager to make sure the file is closed at the end of each operation:

import contextlib

@contextlib.contextmanager
def write_to(filename, mode='a'):
    # Open the file and make sure it is closed again,
    # even if the body of the with-block raises an exception.
    f = open(filename, mode)
    try:
        yield f
    finally:
        f.close()

for chunk in data:
    with write_to('filename.txt') as f:
        f.write(chunk)
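
For comparison, the file object returned by open() is itself a context manager, so a plain with-block gives the same close-on-exit guarantee without a custom helper:

for chunk in data:
    with open('filename.txt', 'a') as f:
        f.write(chunk)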

Comments:

Your code and the question title are about writing to a file, but your exception is raised while reading the HTTP response. The two have nothing to do with each other. – phihag

@phihag You are right, but I was referring to _get_chunk_ in the HTTP response handler, which seemed to be related to the incoming site data also being dropped because it is too big. – Adrian Coutsoftides

@phihag Actually, putting a try around the line that fetches the network data also provided a solution; thanks for pointing this out as well. – Adrian Coutsoftides

@AdrianCoutsoftides Glad to help! If this answer helped you, please consider accepting it. Thank you.
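
Based on the comment above about wrapping the line that fetches the network data in a try, the following is a minimal sketch of what that could look like. The Extractor call is copied from the traceback; the loop variable url_list, the decision to skip a URL on http.client.IncompleteRead, and the getText() call are assumptions for illustration, not the poster's actual code:

import http.client
from boilerpipe.extract import Extractor

for urls in url_list:  # url_list: hypothetical iterable of article URLs
    try:
        # This is the call that raised IncompleteRead in the traceback.
        extractor = Extractor(extractor='ArticleExtractor', url=urls)
    except http.client.IncompleteRead:
        # Assumption: skip pages whose chunked HTTP response could not be read fully.
        continue
    getText = extractor.getText()  # boilerpipe's extracted article text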