
Python script terminates without error


I am running a script that downloads xls files containing HTML markup and strips the tags to produce a clean csv file.

Code:

The code above works fine for a 75 KB file, but for a 75 MB file the process is killed without any error.

I am very new to Beautiful Soup and Python; please help me identify the problem. The script runs on 3 GB of RAM.

The output for the small file is:

table found
row list created
soup decomposed
file closed
writer started
                                types |   # objects |   total size
===================================== | =========== | ============
                                 dict |        5615 |      4.56 MB
                                  str |        8457 |    713.23 KB
                                 list |        3525 |    375.51 KB
  <class 'bs4.element.NavigableString |        1810 |    335.76 KB
                                 code |        1874 |    234.25 KB
              <class 'bs4.element.Tag |        3097 |    193.56 KB
                              unicode |        3102 |    182.65 KB
                                 type |         137 |    120.95 KB
                   wrapper_descriptor |        1060 |     82.81 KB
           builtin_function_or_method |         718 |     50.48 KB
                    method_descriptor |         580 |     40.78 KB
                              weakref |         416 |     35.75 KB
                                  set |         137 |     35.04 KB
                                tuple |         431 |     31.56 KB
                  <class 'abc.ABCMeta |          20 |     17.66 KB
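The per-type summary above (types / # objects / total size) is the kind of report a memory profiler such as pympler prints. A stdlib-only sketch of a similar summary using `gc` and `sys.getsizeof` (shallow sizes only, so the figures will differ from a real profiler's):

```python
import gc
import sys
from collections import Counter

def type_summary(top=10):
    """Tally live objects per type name and their shallow byte sizes,
    similar in shape to the table above."""
    counts, sizes = Counter(), Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] += 1
        sizes[name] += sys.getsizeof(obj)
    # Return the 'top' biggest types as (name, object count, total bytes)
    return [(name, counts[name], size) for name, size in sizes.most_common(top)]

for name, count, size in type_summary():
    print('%35s | %8d | %10d bytes' % (name, count, size))
```

Running this before and after the parse makes it easy to see which types (here, `bs4.element.Tag` and `NavigableString`) dominate memory growth.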

It is hard to say without an actual file to work with, but one thing you can do is avoid building an intermediate list of rows and instead write each row directly to the open csv file.

Also, you can let BeautifulSoup use lxml under the hood (lxml needs to be installed).

Improved code:

#!/usr/bin/env python
# Python 2: urllib2, and the csv file opened in 'wb' mode

from urllib2 import urlopen
import csv

from bs4 import BeautifulSoup

f = urlopen('http://localhost/Classes/sample.xls')
soup = BeautifulSoup(f, 'lxml')

with open('output_file.csv', 'wb') as out:  # avoid shadowing the builtin 'file'
    writer = csv.writer(out)

    for row in soup.select('table tr'):
        # writerow() takes a single list of cell values; collect th and td
        # cells together rather than calling writerows() on a generator of
        # strings, which would split every string into one character per column
        writer.writerow([cell.get_text().encode('utf8')
                         for cell in row.find_all(['th', 'td'])])

soup.decompose()
f.close()
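If the document is still too large to hold as a full tree, BeautifulSoup's `SoupStrainer` can restrict parsing to the table rows so the rest of the markup is never kept in memory. A minimal sketch (the inline `html` string here is just for illustration):

```python
from bs4 import BeautifulSoup, SoupStrainer

# Keep only <tr> elements while parsing; everything outside table rows is
# discarded immediately, which shrinks the in-memory tree considerably.
only_rows = SoupStrainer('tr')
html = '<table><tr><th>a</th><td>b</td></tr></table><p>ignored</p>'
soup = BeautifulSoup(html, 'lxml', parse_only=only_rows)

# 'table tr' no longer matches because the <table> tag itself was strained
# out, so select the rows directly with find_all('tr')
rows = [[cell.get_text() for cell in tr.find_all(['th', 'td'])]
        for tr in soup.find_all('tr')]
```

Note that with a strainer in place, CSS selectors that rely on ancestor tags (like `table tr`) stop matching, since only the strained tags survive in the tree.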