Adding ID attributes to HTML tags with Python (BeautifulSoup?)
Tags: python, html, beautifulsoup

I have an HTML file containing specific tags, and I need to add an ID to each tag in the format id="rule_1", id="rule_1.1", id="rule_1.2", id="rule_1.2.1", and so on. For example, the current HTML is:
<div style="styles">
    <p class="classname">TEXT</p>
    <p class="classname">TEXT</p>
    <ul style="styles">
        <li>
            <p class="classname">TEXT</p>
        </li>
        <li>
            <p class="classname">TEXT</p>
        </li>
    </ul>
</div>
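My first attempt (the script that begins with `from bs4 import BeautifulSoup`) only adds id="1", id="2", and so on. How can I make the IDs hierarchical, like 1, 1.1, 1.1.1?

Never mind, I figured it out: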
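# Number each tag as <parent id>.<position among its numbered siblings>.
# find_all() returns tags in document order, so a parent is always
# numbered before its children.
curr_tags = {}  # parent id -> number of children numbered so far
for position, each_tag in enumerate(html_tags):
    if position == 0:
        # The first matched tag is the root of the numbering.
        each_tag.attrs['id'] = 'rule_1'
    else:
        # Assumes the direct parent is one of the tags numbered above;
        # otherwise this lookup raises KeyError.
        parent_id = each_tag.parent.attrs['id']
        curr_tags[parent_id] = curr_tags.get(parent_id, 0) + 1
        each_tag.attrs['id'] = '{0}.{1}'.format(parent_id, curr_tags[parent_id])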
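
For anyone who wants to try the parent-counter idea above on the sample HTML, here is a minimal self-contained sketch. It is a variation, not the exact code from the answer: the function name `add_rule_ids`, the `prefix` parameter, and the `SAMPLE_HTML` string are made up for illustration. It also assumes you want several top-level tags numbered rule_1, rule_2, ..., and that a tag whose direct parent is not in the tag list should hang off its nearest numbered ancestor.

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the HTML in the question (illustrative only).
SAMPLE_HTML = """
<div style="styles">
  <p class="classname">TEXT</p>
  <p class="classname">TEXT</p>
  <ul style="styles">
    <li><p class="classname">TEXT</p></li>
    <li><p class="classname">TEXT</p></li>
  </ul>
</div>
"""

TAG_NAMES = ['div', 'p', 'span', 'ul', 'li']


def add_rule_ids(soup, tag_names=TAG_NAMES, prefix='rule_'):
    """Give every matching tag an id such as rule_1, rule_1.1, rule_1.2.1."""
    counters = {}  # parent id ('' for top level) -> children numbered so far
    # find_all() yields tags in document order, so ancestors come first.
    for tag in soup.find_all(tag_names):
        # Nearest ancestor that was itself numbered, if there is one.
        ancestor = tag.find_parent(tag_names)
        parent_id = ancestor['id'] if ancestor is not None else ''
        counters[parent_id] = counters.get(parent_id, 0) + 1
        if parent_id:
            tag['id'] = '{0}.{1}'.format(parent_id, counters[parent_id])
        else:
            tag['id'] = '{0}{1}'.format(prefix, counters[parent_id])
    return soup


soup = add_rule_ids(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
ids = [tag['id'] for tag in soup.find_all(TAG_NAMES)]
print(ids)
```

Using `find_parent` with the same tag list instead of `.parent` means an unlisted wrapper such as `<a>` between two numbered tags does not cause a `KeyError`. Note that any `id` already present on a matching tag is overwritten.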
from bs4 import BeautifulSoup as html_parser

# `deal` is defined elsewhere in the original script (not shown here).
with open('outputs/HTML/{}.html'.format(deal), 'r') as read_file:
    html_source = read_file.read()

soup = html_parser(html_source, 'html.parser')
html_tags = soup.find_all(['div', 'p', 'span', 'ul', 'li'])

# First attempt: flat numbering in document order.
for position, each_tag in enumerate(html_tags):
    each_tag.attrs['id'] = str(position)

with open('outputs/HTML/{}-id.html'.format(deal), 'w') as save_file:
    save_file.write(str(soup))