Python: how do I get the HTML tags?
Suppose I have a text file like this:
<html><head>Headline<html><head>more words
</script>even more words</script>
<html><head>Headline<html><head>more words
</script>even more words</script>
How can I put the tags into a list like this:
<html>
<head>
<html>
<head>
</script>
</script>
<html>
<head>
<html>
<head>
</script>
</script>
I think this is what you want:
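import re

# input_file is assumed to be an already-open file object
html_string = input_file.read()
# Non-greedy pattern: each <...> run is returned as one match
matches = re.findall(r'<.*?>', html_string)
for m in matches:
    print(m)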
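Since the question asks for a list, it may be worth noting that `re.findall` already returns one, so the loop is only needed for printing. Here is a minimal sketch of that idea, using the sample text from the question inline instead of a file. The helper name `extract_tags` is made up for illustration and is not part of the original answer.

```python
import re

# Sample text copied from the question (inlined so no file is needed)
sample = (
    "<html><head>Headline<html><head>more words\n"
    "</script>even more words</script>\n"
)

def extract_tags(text):
    """Return every <...> run in text, in order of appearance.

    extract_tags is a hypothetical helper name used only for this sketch.
    """
    # Same non-greedy pattern as in the answer above
    return re.findall(r"<.*?>", text)

tags = extract_tags(sample)
print(tags)
# ['<html>', '<head>', '<html>', '<head>', '</script>', '</script>']
```

A regular expression like this treats the input as plain text, so it will also match things such as comments or a stray `<` followed later by `>`. For the simple input shown in the question that is probably acceptable.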
Hope this helps.
Python has a module for this.
Here is some code that does what you need:
# Python 3; in Python 2 the module was named HTMLParser
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    # Called once for every opening tag
    def handle_starttag(self, tag, attrs):
        print("<%s>" % tag)

    # Called once for every closing tag
    def handle_endtag(self, tag):
        print("</%s>" % tag)

parser = MyHTMLParser()
parser.feed("""<html><head>Headline<html><head>more words
</script>even more words</script>
<html><head>Headline<html><head>more words
</script>even more words</script>
""")
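This discussion on SO should help.
Is this a continuation of another question? If so, you really should edit your other question instead of reposting.
I think you mean: re.findall('<.*?>', html_string)
@JackNull: You're absolutely right. The extra double quote was a typo and has been fixed.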
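Following up on the HTMLParser answer: the question asks for the tags in a list rather than printed, so one possible variant collects them instead. This is only a sketch, assuming Python 3 (where the module is `html.parser`). The class name `TagCollector` and its `tags` attribute are made up for illustration.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collects start and end tags, in document order, into self.tags.

    TagCollector is a hypothetical name used only for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is ignored here, so attributes are not reproduced
        self.tags.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.tags.append("</%s>" % tag)

collector = TagCollector()
collector.feed(
    "<html><head>Headline<html><head>more words\n"
    "</script>even more words</script>\n"
)
collector.close()
print(collector.tags)
# ['<html>', '<head>', '<html>', '<head>', '</script>', '</script>']
```

Unlike the regex approach, the strings here are rebuilt from the parsed tag names, so they come out lowercased and without attributes. Which behaviour is preferable depends on what the list is used for.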