Python: How to extract elements between multiple tags

Tags: python, html, python-3.x, beautifulsoup, extract

Working HTML:

<h2> Heading 1 </h2>
<h3> Subheading 1.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a> | <a href="#">Link 3</a>
<h3> Subheading 1.2 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a> | <a href="#">Link 3</a> | <a href="#">Link 4</a>
<h3> Subheading 1.3 </h3>
<a href="#">Link 1</a>
<h2> Heading 2 </h2>
<h3> Subheading 2.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2</a>
<h3> Subheading 2.2 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a>
<h3> Subheading 2.3 </h3>
<a href="#">Link 1</a>
<h2> Heading 3 </h2>

Code:

from bs4 import BeautifulSoup

soup = BeautifulSoup("""<h2> Heading 1 </h2>
<h3> Subheading 1.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a> | <a href="#">Link 3</a>
<h3> Subheading 1.2 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a> | <a href="#">Link 3</a> | <a href="#">Link 4</a>
<h3> Subheading 1.3 </h3>
<a href="#">Link 1</a>
<h2> Heading 2 </h2>
<h3> Subheading 2.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2</a>
<h3> Subheading 2.2 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a>
<h3> Subheading 2.3 </h3>
<a href="#">Link 1</a>
<h2> Heading 3 </h2>""", 'html5lib')

for row in soup.find_all("h2"):
    print(row.text)
    print(row.find_next('h3'))
    print('################')

Current result:

################
 Heading 1 
<h3> Subheading 1.1 </h3>
################
 Heading 2 
<h3> Subheading 2.1 </h3>
################
 Heading 3 
None
################

Desired result:

################
Heading 1 
Subheading 1.1
Link 1
Link 2
Link 3
--------
Subheading 1.2 
Link 1
Link 2
Link 3
Link 4
--------
Subheading 1.3 
Link 1
################
Heading 2 
Subheading 2.1 
Link 1
Link 2
--------
Subheading 2.2 
Link 1
Link 2
--------
Subheading 2.3 
Link 1
################

Or something similar.

This works:

s = """

<h2> Heading 1 </h2>
<h3> Subheading 1.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a> | <a href="#">Link 3</a>
<h3> Subheading 1.2 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a> | <a href="#">Link 3</a> | <a href="#">Link 4</a>
<h3> Subheading 1.3 </h3>
<a href="#">Link 1</a>
<h2> Heading 2 </h2>
<h3> Subheading 2.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2</a>
<h3> Subheading 2.2 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a>
<h3> Subheading 2.3 </h3>
<a href="#">Link 1</a>
<h2> Heading 3 </h2>

"""

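from bs4 import BeautifulSoup as bs

# Name the parser explicitly; "html.parser" ships with Python.
soup = bs(s, 'html.parser')

for i in soup.find_all('h2'):
    print(i.text.strip())
    # Walk the siblings after this <h2> until the next <h2>.
    for j in i.next_siblings:
        if j.name == 'h2': break
        if j.name == 'h3':
            print('\t' + j.text.strip())
            # Collect only the links that belong to this <h3>.
            for k in j.next_siblings:
                if k.name in ('h2', 'h3'): break
                if k.name == 'a':
                    print('\t\t' + k.text.strip())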
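
If you need the result as data rather than printed text, the same idea can be done in one pass over the document. The following is only a sketch, not part of the answer above: the function name `group_links` and the shortened `HTML` sample are made up for illustration, and it assumes every link belongs to the nearest preceding `<h3>`, which in turn belongs to the nearest preceding `<h2>`.

```python
from bs4 import BeautifulSoup

# Shortened sample in the same shape as the HTML from the question.
HTML = """
<h2> Heading 1 </h2>
<h3> Subheading 1.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2 </a> | <a href="#">Link 3</a>
<h3> Subheading 1.2 </h3>
<a href="#">Link 1</a>
<h2> Heading 2 </h2>
<h3> Subheading 2.1 </h3>
<a href="#">Link 1</a> | <a href="#">Link 2</a>
<h2> Heading 3 </h2>
"""


def group_links(html):
    """Return {h2 text: {h3 text: [link texts]}} in document order."""
    soup = BeautifulSoup(html, "html.parser")
    grouped = {}
    section = None  # dict for the current <h2>; None until one is seen
    links = None    # list for the current <h3>; None until one is seen
    # find_all with a list of names yields matching tags in document order.
    for tag in soup.find_all(["h2", "h3", "a"]):
        text = tag.get_text(strip=True)
        if tag.name == "h2":
            section = grouped.setdefault(text, {})
            links = None  # links before the next <h3> have no owner
        elif tag.name == "h3" and section is not None:
            links = section.setdefault(text, [])
        elif tag.name == "a" and links is not None:
            links.append(text)
    return grouped


# Print the nested structure with the same indentation as the answer above.
for heading, subsections in group_links(HTML).items():
    print(heading)
    for subheading, link_texts in subsections.items():
        print("\t" + subheading)
        for link_text in link_texts:
            print("\t\t" + link_text)
```

Because it walks the tags once instead of rescanning siblings for every heading, this version also works when the headings and links are not direct siblings of each other. Note that headings with identical text would be merged into one dictionary key.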