Python script terminates without any error

python memory-leaks

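I am running a script that downloads xls files containing HTML tags and strips the tags to produce a clean csv file.

Code:

The code above works fine for a 75 KB file, but for a 75 MB file the process is killed without any error message.

I am very new to Beautiful Soup and Python; please help me identify the problem. The script runs on a machine with 3 GB of RAM.

Output for the small file: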

table found
row list created
soup decomposed
file closed
writer started
                                types |   # objects |   total size
===================================== | =========== | ============
                                 dict |        5615 |      4.56 MB
                                  str |        8457 |    713.23 KB
                                 list |        3525 |    375.51 KB
  <class 'bs4.element.NavigableString |        1810 |    335.76 KB
                                 code |        1874 |    234.25 KB
              <class 'bs4.element.Tag |        3097 |    193.56 KB
                              unicode |        3102 |    182.65 KB
                                 type |         137 |    120.95 KB
                   wrapper_descriptor |        1060 |     82.81 KB
           builtin_function_or_method |         718 |     50.48 KB
                    method_descriptor |         580 |     40.78 KB
                              weakref |         416 |     35.75 KB
                                  set |         137 |     35.04 KB
                                tuple |         431 |     31.56 KB
                  <class 'abc.ABCMeta |          20 |     17.66 KB

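It is hard to say without an actual file to work with, but one thing you can do is avoid building the intermediate list of rows and write each row directly to the open csv file.

You can also have BeautifulSoup use the lxml parser under the hood (lxml must be installed).

Improved code: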
#!/usr/bin/env python
# Python 2 (urllib2, byte-oriented csv), matching the question's environment.

from urllib2 import urlopen
import csv

from bs4 import BeautifulSoup

f = urlopen('http://localhost/Classes/sample.xls')
soup = BeautifulSoup(f, 'lxml')

with open('output_file.csv', 'wb') as out:
    writer = csv.writer(out)

    # Write each table row straight to the csv file; no intermediate row list.
    for row in soup.select('table tr'):
        cells = row.find_all(['th', 'td'])
        if cells:
            # writerow() takes one row as a sequence of cell values.
            writer.writerow([cell.text.encode('utf8') for cell in cells])

soup.decompose()
f.close()
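
If the tree that BeautifulSoup builds for the 75 MB file is itself what exhausts memory, one further option, which is not part of the answer above, is to skip the tree entirely and write each row out as soon as it has been parsed. The following is a minimal Python 3 sketch using only the standard library's html.parser. The names TableToCsv, html_table_to_csv and download_to_csv are hypothetical, the input is assumed to be UTF-8 text, and nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableToCsv(HTMLParser):
    """Stream <tr>/<th>/<td> content to a csv writer without building a tree."""

    def __init__(self, writer):
        super().__init__(convert_charrefs=True)
        self.writer = writer
        self.row = None   # cells of the row being read, or None outside a row
        self.cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._flush_row()       # tolerate a missing </tr>
            self.row = []
        elif tag in ('td', 'th') and self.row is not None:
            self._flush_cell()      # tolerate a missing </td> or </th>
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._flush_cell()
        elif tag == 'tr':
            self._flush_row()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def _flush_cell(self):
        if self.cell is not None and self.row is not None:
            self.row.append(''.join(self.cell).strip())
        self.cell = None

    def _flush_row(self):
        self._flush_cell()
        if self.row:
            self.writer.writerow(self.row)
        self.row = None

    def close(self):
        super().close()
        self._flush_row()           # emit a final row left open at end of input


def html_table_to_csv(chunks, out):
    """Feed decoded text chunks to the parser; rows are written as they complete."""
    parser = TableToCsv(csv.writer(out))
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()


def download_to_csv(url, csv_path, encoding='utf-8'):
    """Assumed usage: read the response in 64 KB pieces instead of all at once."""
    with urlopen(url) as resp, \
            open(csv_path, 'w', newline='', encoding='utf-8') as out:
        text = io.TextIOWrapper(resp, encoding=encoding, errors='replace')
        html_table_to_csv(iter(lambda: text.read(64 * 1024), ''), out)
```

With this approach only the current row is held in memory, so peak usage should depend on the widest row rather than on the file size. The trade-off is that html.parser is less forgiving of badly broken markup than BeautifulSoup with lxml, so the output is worth checking against a small file first.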
