Python HTML parsing: getting tag names and their values


python python-2.7


I'm using BeautifulSoup in Python.
Is there a way to get the attribute names and their values, like:

name=title value=This is title

name=link value=.../style.css

soup.html.head=

<meta content="all" name="audience"/>
<meta content="2006-2013 webrazzi.com." name="copyright"/>
<title> This is title</title>
<link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>



Use the .text or .string attribute to get the text content of an element.

Use .get('attrname') or ['attrname'] to get the value of an attribute.
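The two pairs of accessors above are not fully interchangeable. The sketch below, using a small made-up HTML snippet (an assumption for illustration), shows the differences: .text joins all nested strings while .string is None when a tag has more than one child, and .get() returns None (or a supplied default) for a missing attribute while ['attrname'] raises KeyError.

```python
from bs4 import BeautifulSoup

# Made-up snippet, used only to illustrate the accessors.
html = '<p>Hello <b>world</b></p><a href="/home">Home</a>'
soup = BeautifulSoup(html, 'html.parser')

# .text concatenates every nested string under the tag.
print(soup.p.text)      # Hello world
# .string is None here because <p> has two children ("Hello " and <b>).
print(soup.p.string)    # None
# <b> has exactly one child string, so .string returns it.
print(soup.b.string)    # world

# .get() returns None, or the given default, when the attribute is missing.
print(soup.a.get('href'))           # /home
print(soup.a.get('title'))          # None
print(soup.a.get('title', 'n/a'))   # n/a

# Subscripting a missing attribute raises KeyError instead.
try:
    soup.a['title']
except KeyError as exc:
    print('KeyError: {}'.format(exc))
```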

from bs4 import BeautifulSoup

html = '''
<head>
    <meta content="all" name="audience"/>
    <meta content="2006-2013 webrazzi.com." name="copyright"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = BeautifulSoup(html, 'html.parser')
print('name={} value={}'.format('title', soup.title.text))  # <----
print('name={} value={}'.format('link', soup.link['href'])) # <----

Update based on the OP's comment:

def get_text(el): return el.text
def get_href(el): return el['href']

# map tag names to functions (what to retrieve from the tag)
what_todo = {
    'title': get_text,
    'link': get_href,
}
for el in soup.select('head *'): # To retrieve all children inside `head`
    f = what_todo.get(el.name)
    if not f: # skip non-title, non-link tags.
        continue
    print('name={} value={}'.format(el.name, f(el)))

Output: same as above.
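If the goal is to loop over every tag and print all of its attribute names and values without listing tags in advance, each tag's .attrs dictionary can be iterated directly. The following is a minimal sketch of that approach; the helper name iter_tag_values is hypothetical, and falling back to the stripped text for tags that have no attributes (such as <title>) is an assumption about what the OP wants.

```python
from bs4 import BeautifulSoup

html = '''
<head>
    <meta content="all" name="audience"/>
    <meta content="2006-2013 webrazzi.com." name="copyright"/>
    <title> This is title</title>
    <link href=".../style.css" media="screen" rel="stylesheet" type="text/css"/>
</head>
'''
soup = BeautifulSoup(html, 'html.parser')

def iter_tag_values(parent):
    """Hypothetical helper: yield (tag, attribute, value) for every attribute
    of every tag under `parent`; tags without attributes yield their
    stripped text under the pseudo-attribute name 'text'."""
    for el in parent.find_all(True):  # True matches every tag
        if el.attrs:
            for attr, value in el.attrs.items():
                # bs4 returns multi-valued attributes (e.g. rel, class) as lists.
                if isinstance(value, list):
                    value = ' '.join(value)
                yield el.name, attr, value
        else:
            yield el.name, 'text', el.get_text(strip=True)

rows = list(iter_tag_values(soup.head))
for tag, attr, value in rows:
    print('{} {}={}'.format(tag, attr, value))
```

The trade-off compared with the what_todo mapping is that nothing is filtered: every attribute of every tag is printed, which is convenient for exploring a page but noisier if only a few values matter.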

Thank you for your reply. It works. But I'm looking for another way: a loop that gets all the values at once, something like

while(){print property, value}

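@ridvanzoro, then you first need to define which tags you want the text content of, and which attribute you want from each of the other tags.

@ridvanzoro, do you mean the content attribute of the meta tags? Did you define it in the mapping?

@ridvanzoro, it looks like some tags in your real HTML have no content attribute. Replacing el['content'] with el.get('content', 'fallback default value') will give you a default value instead of raising a KeyError. Try it.

@ridvanzoro, if you have further questions, please ask them separately. That way you have a better chance of getting an answer.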