Python: speeding up a parser that inserts HTML into a database
I need to insert all the HTML tags and their attributes into a database:
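```python
from bs4 import BeautifulSoup


def html_parse(html, el, url_page_id, level, i, parent_id, url_page):
    # .text is the text of this tag *and all of its descendants*;
    # line breaks, tabs and carriage returns are stripped from it.
    txt = ""
    if len(html.text) > 0:
        txt = html.text.replace("\n", "").replace("\t", "").replace("\r", "")

    # One tag_list row per tag, saved (presumably one INSERT) as soon as it is built.
    ta = tag_list()
    ta.p_id = el.id
    ta.page_id = url_page_id
    ta.level = level
    ta.number = i
    ta.txt = txt
    ta.name = html.name
    ta.parent_id = parent_id
    ta.html = str(html)  # serializes the whole subtree of this tag
    ta.save()
    insert_attr(html, el.id, url_page_id, ta.id, url_page)

    # Recurse into element children only; strings and comments have .name == None.
    children = list(html.children)
    j = 0
    for child in children:
        if child.name is None:
            continue
        j = j + 1
        html_parse(child, el, url_page_id, level + 1, j, ta.id, url_page)


# Caller: el, url_page and url_page_id come from the surrounding code.
el.driver.get(url_page)
txthtml = el.driver.page_source
soup = BeautifulSoup(txthtml, "html.parser")
body = soup.find('html')
html_parse(body, el, url_page_id, 0, 0, 0, url_page)
```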
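Here html_parse is a recursive function, and the names mean:
- html - the current HTML element
- el - the driver class
- url_page_id - the ID of the page
- level - the depth in the DOM
- i - the child's number among its siblings
- parent_id - the ID of the parent tag
- url_page - the current URL
- tag_list - inserts the current tag
- insert_attr - inserts the tag's attributes into the database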
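The bookkeeping listed above (level, child number, parent id) can be produced in a single pass that never re-serializes a subtree. The following is a minimal sketch under several assumptions, not the asker's code: it swaps BeautifulSoup for the standard library's html.parser.HTMLParser, assumes well-formed markup (every non-void element is closed in order), assigns ids in Python instead of waiting for ta.save() to return ta.id, and stores only a tag's own text rather than html.text, which repeats all descendant text at every level. TagFlattener, flatten_html and VOID_ELEMENTS are hypothetical names.

```python
from html.parser import HTMLParser

# HTML elements that never get a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagFlattener(HTMLParser):
    """Collect every tag as a flat row in one pass (hypothetical helper).

    Assumes well-formed markup: an end tag only closes the innermost open element.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []           # one dict per tag, in document order
        self.stack = []          # rows of the elements that are currently open
        self.child_counts = [0]  # element-child counter per open level (index 0 = document)
        self.next_id = 1         # ids are assigned here, not by the database

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        self.child_counts[-1] += 1
        row = {
            "id": self.next_id,
            "parent_id": parent["id"] if parent else 0,
            "level": len(self.stack),
            "number": self.child_counts[-1],
            "name": tag,
            "attrs": dict(attrs),
            "text": "",  # only this tag's own text, not its descendants'
        }
        self.next_id += 1
        self.rows.append(row)
        if tag not in VOID_ELEMENTS:
            self.stack.append(row)
            self.child_counts.append(0)

    def handle_endtag(self, tag):
        # Pop only when the end tag matches the innermost open element.
        if self.stack and self.stack[-1]["name"] == tag:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        if self.stack:
            cleaned = data.replace("\n", "").replace("\t", "").replace("\r", "")
            self.stack[-1]["text"] += cleaned


def flatten_html(markup):
    """Return the list of tag rows for a whole document."""
    parser = TagFlattener()
    parser.feed(markup)
    parser.close()
    return parser.rows
```

For example, flatten_html('<html><body><p class="a">Hi<br/>there</p><p>x</p></body></html>') yields rows for html, body, p, br, p with parent ids 0, 1, 2, 3, 2. Numbering starts at 1 at every level, including the root (the original passes 0 for the root). Because every id is known before anything touches the database, all rows can then be written in one batch.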
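How can I speed up this code?

What is your question? Half of it says you have trouble inserting values into the database, which the question does not show, and the other half asks how to speed up the code.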
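The code in the question calls ta.save() and insert_attr once per tag, so a page with thousands of tags costs thousands of database round trips (and, if the connection autocommits, as many commits). A common remedy is to collect the rows first and insert them in bulk inside one transaction. This is a sketch only: it uses the standard library's sqlite3 purely as a stand-in for the real database, and the tables tag_list and tag_attr, their columns, the composite (page_id, id) key (needed because ids assigned per page would repeat across pages) and the row dictionaries are all hypothetical. If tag_list is a Django model (the .save()/.id usage suggests an ORM, but the question does not say which), QuerySet.bulk_create plays the same role.

```python
import sqlite3

# Hypothetical schema: tag ids are unique per page, so the key is (page_id, id).
SCHEMA = """
CREATE TABLE IF NOT EXISTS tag_list (
    page_id   INTEGER NOT NULL,
    id        INTEGER NOT NULL,
    parent_id INTEGER NOT NULL,
    level     INTEGER NOT NULL,
    number    INTEGER NOT NULL,
    name      TEXT    NOT NULL,
    txt       TEXT,
    PRIMARY KEY (page_id, id)
);
CREATE TABLE IF NOT EXISTS tag_attr (
    page_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL,
    name    TEXT    NOT NULL,
    value   TEXT
);
"""


def save_rows(conn, page_id, rows):
    """Write every tag and attribute of one page in a single transaction."""
    tag_params = [
        (page_id, r["id"], r["parent_id"], r["level"], r["number"], r["name"], r["text"])
        for r in rows
    ]
    attr_params = [
        (page_id, r["id"], key, value)
        for r in rows
        for key, value in r["attrs"].items()
    ]
    with conn:  # commits once at the end, or rolls everything back on error
        conn.executemany(
            "INSERT INTO tag_list (page_id, id, parent_id, level, number, name, txt) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            tag_params,
        )
        conn.executemany(
            "INSERT INTO tag_attr (page_id, tag_id, name, value) VALUES (?, ?, ?, ?)",
            attr_params,
        )
```

Usage: open a connection, run conn.executescript(SCHEMA) once, then call save_rows(conn, page_id, rows) once per page. The design choice is to trade a little memory (all rows of one page held in a list) for two executemany calls and a single commit instead of one or more statements per tag.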