Python Beautiful Soup: <div> and <p> into a dictionary (python, html, parsing, beautifulsoup) - Fatal编程技术网

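I am working with a messy HTML section that holds store location data, and I am having a hard time parsing it cleanly. I have read several other posts here but have not been able to get any of them to work.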

Below is part of the HTML from the .txt file:

"
                    <div class=""location"">
                        <h2>
                            <a href=""/Locations/AL/5-Points-In-Line"">5 Points In-Line</a>
                        </h2>

                        <p>
                            2000 Highland Ave S
                            <br/>
                            Birmingham, AL 35205
                            <br/>
                            (205) 930-8000                        
                        </p>
                    </div>
                    <div class=""location"">
                        <h2>

                            <a href=""/Locations/AL/Airport-Blvd-AL"">Airport Blvd (AL)</a>
                        </h2>

                        <p>
                            4707 Airport Blvd
                            <br/>Mobile, AL 36608
                                <br/>
(251) 461-9933                        </p>
                    </div>
                    <div class=""location"">
                        <h2>

                            <a href=""/Locations/AL/Alabama-Power"">Alabama Power</a>
                        </h2>

                        <p>
                            600 18th St N
                            <br/>Birmingham, AL 35203
                                <br/>
(205) 257-1688                        </p>
                    </div>
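The leading " and the doubled quotes (""location"") in this excerpt look like CSV-style quoting. That is an assumption about how the .txt file was written, not something the question states. If it holds, Python's standard csv module can undo the quoting before the text is handed to BeautifulSoup. The string below is a shortened, hypothetical stand-in for the file contents.

```python
import csv
import io

# Hypothetical, shortened stand-in for the .txt file (assumption: the HTML is
# stored as one CSV-quoted field, with every literal " doubled to "").
raw = (
    '"<div class=""location"">\n'
    '  <h2><a href=""/Locations/AL/Alabama-Power"">Alabama Power</a></h2>\n'
    '</div>"\n'
)

# csv.reader turns "" inside a quoted field back into a single " and
# accepts newlines inside quoted fields.
rows = list(csv.reader(io.StringIO(raw)))
html = rows[0][0]
print(html)
```

For a real file, open it with newline='' and pass the file object to csv.reader instead of io.StringIO.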
I get a KeyError: '5 Points In-Line'

I have looked at similar posts, but I could not get a working result. I think I have to parse these files differently.
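The question does not include the code that raised this KeyError, so the following only illustrates one common cause, as an assumption: appending to output[name] before that key exists raises KeyError, and the error message is the missing key. The sketch contrasts that pattern with dict.setdefault, which the answer below uses. The store name and address string come from the sample data.

```python
# Assumed cause: indexing a dict with a key that has not been added yet.
output = {}
name = '5 Points In-Line'
address = '2000 Highland Ave S Birmingham, AL 35205 (205) 930-8000'

try:
    output[name].append(address)  # fails: the key does not exist yet
except KeyError as exc:
    error_message = str(exc)
    print('KeyError:', error_message)  # KeyError: '5 Points In-Line'

# setdefault stores an empty list the first time a key is seen and
# returns the list kept under that key, so append always has a target.
output.setdefault(name, []).append(address)
print(output)
```

collections.defaultdict(list) is an equivalent alternative to calling setdefault on a plain dict.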


You can use find_next() and add the values to a dict:

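from bs4 import BeautifulSoup

# html is assumed to hold the HTML text from the question (e.g. read from the .txt file)
soup = BeautifulSoup(html, 'html.parser')

output = {}
for tag in soup.select('h2 a'):
    # The first <p> after each store-name link holds the address and phone number;
    # separator=' ' puts a space between the text pieces split up by <br/>
    details = tag.find_next('p').get_text(strip=True, separator=' ')
    # setdefault creates an empty list the first time a store name is seen
    output.setdefault(tag.get_text(), []).append(details)

print(output)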
Output:

{'5 Points In-Line': ['2000 Highland Ave S Birmingham, AL 35205 (205) 930-8000'], 'Airport Blvd (AL)': ['4707 Airport Blvd Mobile, AL 36608 (251) 461-9933'], 'Alabama Power': ['600 18th St N Birmingham, AL 35203 (205) 257-1688']}
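To check this end to end without the original file, here is a self-contained sketch that runs the same select('h2 a') / find_next('p') approach on a shortened copy of two sample entries, written with ordinary quotes. It also uses stripped_strings, which yields the <br/>-separated lines one by one. Unpacking them into street, city line and phone assumes every <p> has exactly those three lines; the sample data suggests this, but the question does not guarantee it.

```python
from bs4 import BeautifulSoup

# Shortened copy of two entries from the question, with normal HTML quoting.
html = """
<div class="location">
  <h2><a href="/Locations/AL/5-Points-In-Line">5 Points In-Line</a></h2>
  <p>
    2000 Highland Ave S
    <br/>
    Birmingham, AL 35205
    <br/>
    (205) 930-8000
  </p>
</div>
<div class="location">
  <h2><a href="/Locations/AL/Alabama-Power">Alabama Power</a></h2>
  <p>
    600 18th St N
    <br/>Birmingham, AL 35203
    <br/>
    (205) 257-1688
  </p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

joined = {}  # same shape as the answer: name -> [one joined string]
fields = {}  # assumed layout: name -> {street, city, phone}
for link in soup.select('h2 a'):
    p = link.find_next('p')
    name = link.get_text()
    joined.setdefault(name, []).append(p.get_text(strip=True, separator=' '))
    # stripped_strings skips whitespace-only pieces; unpacking raises
    # ValueError if a <p> does not contain exactly three lines.
    street, city_line, phone = p.stripped_strings
    fields[name] = {'street': street, 'city': city_line, 'phone': phone}

print(joined)
print(fields)
```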

Please reformat the HTML properly. That is how my file looks. This saved me! Is it necessary to add a space between "S" and "Birmingham"?
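On the space between "S" and "Birmingham": with strip=True alone, get_text strips each text piece and concatenates the pieces with nothing in between, so the text on either side of a <br/> runs together. The separator argument is what puts the space back. A short sketch on one sample <p>:

```python
from bs4 import BeautifulSoup

# One <p> from the sample data; .p returns the first <p> tag in the soup.
p = BeautifulSoup(
    '<p>\n 2000 Highland Ave S\n <br/>\n Birmingham, AL 35205\n <br/>\n (205) 930-8000\n</p>',
    'html.parser',
).p

no_sep = p.get_text(strip=True)                   # pieces glued together
with_sep = p.get_text(strip=True, separator=' ')  # pieces joined by a space
print(no_sep)    # 2000 Highland Ave SBirmingham, AL 35205(205) 930-8000
print(with_sep)  # 2000 Highland Ave S Birmingham, AL 35205 (205) 930-8000
```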