BeautifulSoup: parsing an HTML file with Python
I am using BeautifulSoup to replace every comma in an HTML file with &sbquo;. Here is my code:
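import re
import sys
# Assumed import (BeautifulSoup 3); with bs4 use "from bs4 import BeautifulSoup"
from BeautifulSoup import BeautifulSoup

with open(sys.argv[1], "r") as f:
    data = f.read()
soup = BeautifulSoup(data)
comma = re.compile(',')
# Replace the commas in every text node that contains one
for t in soup.findAll(text=comma):
    t.replaceWith(t.replace(',', '‚'))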
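This code works fine, except when the HTML file contains some JavaScript. In that case it also replaces the commas (,) inside the JavaScript code with &sbquo;, which is not wanted. I only want to replace commas in the text content of the HTML file.
soup.findAll can take a callable: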
tags_to_skip = set(["script", "style"])
# Add to this list as needed

def valid_tags(tag):
    """Filter tags on the basis of their tag names

    If the tag name is found in ``tags_to_skip`` then
    the tag is dropped. Otherwise, it is kept.
    """
    return tag.name.lower() not in tags_to_skip

comma = re.compile(',')
for tag in soup.findAll(valid_tags):
    # Only touch text directly inside a kept tag, so the contents of
    # <script> and <style> are left alone
    for t in tag.findAll(text=comma, recursive=False):
        t.replaceWith(t.replace(',', '‚'))
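On Python 3, the same idea can be sketched with the newer bs4 package. This is only a minimal sketch, not the answer's original code: it assumes bs4 is installed and uses the built-in html.parser backend, and the helper name replace_commas and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup

TAGS_TO_SKIP = {"script", "style"}  # add to this set as needed
SBQUO = "\u201a"  # the character that the &sbquo; entity stands for


def replace_commas(html):
    """Replace commas in text nodes, except inside skipped tags.

    Hypothetical helper written for this example.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all(string=True) returns all text nodes as a list, so
    # replacing nodes while looping over it is safe.
    for node in soup.find_all(string=True):
        parent = node.parent
        if parent is not None and parent.name in TAGS_TO_SKIP:
            continue  # leave JavaScript and CSS untouched
        if "," in node:
            node.replace_with(node.replace(",", SBQUO))
    return str(soup)


sample = "<p>a, b</p><script>f(1, 2);</script>"
result = replace_commas(sample)
print(result)
```

Checking the parent of each text node, rather than filtering tags first, also covers text that sits at the top level of the document without an enclosing tag.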
Cool, that's awesome. How do I skip comments? It even replaces the commas in the comment sections of the HTML file, which I don't need.
If you run import BeautifulSoup; print BeautifulSoup.__version__, what version number is returned?
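One possible way to skip comments, sketched here with bs4 rather than the BeautifulSoup 3 module discussed above: HTML comments are parsed into Comment objects, which are a kind of text node, so they can be filtered out with isinstance. The function name and the sample HTML below are hypothetical, and the sketch assumes bs4 with the built-in html.parser backend.

```python
from bs4 import BeautifulSoup, Comment

SBQUO = "\u201a"  # the character that the &sbquo; entity stands for


def replace_commas_outside_comments(html):
    """Replace commas in text nodes but leave HTML comments alone.

    Hypothetical helper written for this example.
    """
    soup = BeautifulSoup(html, "html.parser")
    for node in soup.find_all(string=True):
        if isinstance(node, Comment):
            continue  # comments are text nodes too, so skip them explicitly
        if "," in node:
            node.replace_with(node.replace(",", SBQUO))
    return str(soup)


sample = "<p>x, y</p><!-- keep, this -->"
result = replace_commas_outside_comments(sample)
print(result)
```

The same isinstance check could be combined with a tag-name check on node.parent to skip script and style contents in the same loop.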