Python: speed up a parser that inserts HTML into a database


Tags: python, html, beautifulsoup

I need to insert all HTML tags and their attributes into a database.

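from bs4 import BeautifulSoup

# Load the page through the driver and parse the rendered source
el.driver.get(url_page)
txthtml = el.driver.page_source
soup = BeautifulSoup(txthtml, "html.parser")
body = soup.find('html')
# Start the recursion at the root <html> tag: level 0, number 0, no parent
html_parse(body, el, url_page_id, 0, 0, 0, url_page)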

def html_parse(html, el, url_page_id, level, i, parent_id, url_page):
    # Text of this tag and all of its descendants, without line breaks or tabs
    txt = html.text.replace("\n", "").replace("\t", "").replace("\r", "")
    # Store the current tag
    ta = tag_list()
    ta.p_id = el.id
    ta.page_id = url_page_id
    ta.level = level
    ta.number = i
    ta.txt = txt
    ta.name = html.name
    ta.parent_id = parent_id
    ta.html = str(html)
    ta.save()
    # Store the attributes of the current tag
    insert_attr(html, el.id, url_page_id, ta.id, url_page)
    # Recurse into child tags; text nodes have name None and are skipped
    j = 0
    for child in html.children:
        if child.name is None:
            continue
        j += 1
        html_parse(child, el, url_page_id, level + 1, j, ta.id, url_page)
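Here html_parse is a recursive function, where:

  • html - the current HTML object
  • el - the driver class
  • url_page_id - the id of the page
  • level - the level in the DOM
  • i - the child number (position among its siblings)
  • parent_id - the id of the parent
  • url_page - the current URL
  • tag_list - inserts the current tag
  • insert_attr - inserts the tag's attributes into the database
Each individual html_parse call runs quickly, but a complete parse takes about 4-5 minutes for every large HTML page.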
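
Since each call looks fast in isolation, it may help to measure where the total time actually goes before changing anything. The following is only a sketch using the standard-library cProfile module. The names profile_call and walk are hypothetical, and walk is a toy stand-in for the recursive html_parse, operating on nested dicts instead of real tags.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so expensive call chains appear first
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def walk(node):
    """Toy stand-in for html_parse: counts the nodes in a nested dict tree."""
    count = 1
    for child in node.get("children", []):
        count += walk(child)
    return count


tree = {"children": [{"children": [{}, {}]}, {}]}
total, report = profile_call(walk, tree)
print(total)
print(report)
```

Wrapping the real top-level html_parse call the same way would show whether the time is dominated by the save() and insert_attr calls or by building txt and str(html) for every tag.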


How can I speed up the code?
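
One direction that might be worth trying, sketched here under several assumptions: collect one row per tag in memory and write them all in a single transaction, instead of calling save() once per tag. The sketch uses the standard-library sqlite3 module in place of whatever storage sits behind tag_list, which the question does not show. The Node class, collect_rows, save_rows, and the table layout are all hypothetical. Ids are assigned locally, which assumes they do not collide with rows already in the table. The txt and html columns are left out, because html.text and str(html) cover the whole subtree of every tag, so the same content is rebuilt at every level of the DOM.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parsed tag: needs only .name and .children."""
    name: str
    children: list = field(default_factory=list)


def collect_rows(root, page_id):
    """Walk the tree without recursion and return one tuple per tag.

    Ids are assigned locally (an assumption) so that children can refer
    to their parent before anything is written to the database.
    """
    rows = []
    next_id = 1
    # Each stack entry: (node, level, number among siblings, parent id)
    stack = [(root, 0, 0, 0)]
    while stack:
        node, level, number, parent_id = stack.pop()
        node_id = next_id
        next_id += 1
        rows.append((node_id, page_id, level, number, node.name, parent_id))
        # Text nodes have name None and are skipped, as in the original code
        tags = [c for c in node.children if c.name is not None]
        # Push in reverse so siblings are popped in document order
        for j, child in reversed(list(enumerate(tags, start=1))):
            stack.append((child, level + 1, j, node_id))
    return rows


def save_rows(conn, rows):
    """Insert all rows in one transaction instead of one write per tag."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO tag_list (id, page_id, level, number, name, parent_id) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag_list (id INTEGER PRIMARY KEY, page_id INTEGER, "
    "level INTEGER, number INTEGER, name TEXT, parent_id INTEGER)"
)
tree = Node("html", [Node("head"), Node("body", [Node("p"), Node("p")])])
rows = collect_rows(tree, page_id=1)
save_rows(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM tag_list").fetchone()[0])
```

Because collect_rows relies only on name and children, the same function should accept the tag returned by soup.find('html') in place of the Node stand-in. Attribute rows could be gathered into a second list in the same loop and written with another executemany call.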

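What is your question? Half of it suggests you are having problems inserting values into the database, although the question shows no such problem, and the other half asks how to speed up the code.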