How do I store elements separated by <br> in separate arrays using Python BeautifulSoup?

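I'm trying to scrape data about climbing gyms, and I'm using BeautifulSoup. I want to store the gym name, location, phone number, link, and description in arrays.

Here is some sample HTML: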

<div class="city">Alberta</div>
<p><b>Camp He Ho Ha Climbing Gym</b><br>
Seba Beach, Alberta, TOE 2BO Canada<br>
(780) 429-3277<br>
<a rel='nofollow' target='_blank' href='http://camphehoha.com/summer-camp/camp-life/'>Camp He Ho Ha Climbing Gym</a><br>
<span class='rt'></span> The Summit is Camp He Ho Ha's 40' climbing gym and ropes course. Facility is available for rent, with safety equipment, orientation to the course and staffing provided.</p>
<div class="city">Calgary</div>
<p><b>Bolder Climbing Community</b><br>
5508 1st Street SE, Calgary, Alberta, Canada<br>
403 988-8140<br>
<a rel='nofollow' target='_blank' href='http://www.bolderclimbing.com/'>Bolder Climbing Community</a><br>
<span class='rt'></span> Calgary's first bouldering specific climbing centre.</p>
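Alberta
Camp He Ho Ha Climbing Gym
Seba Beach, Alberta, TOE 2BO Canada
(780) 429-3277

The Summit is Camp He Ho Ha's 40' climbing gym and ropes course. Facility is available for rent, with safety equipment, orientation to the course and staffing provided.

Calgary
Bolder Climbing Community
5508 1st Street SE, Calgary, Alberta, Canada
403 988-8140

Calgary's first bouldering specific climbing centre.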


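I can easily move from one climbing gym to the next because they are separated by <p> tags, but the individual items I'm interested in are separated by <br> tags. How can I store these items in separate arrays?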

You can do something like this. Basically, find each <br> tag and then take the content that comes immediately before it.

html = '''<div class="city">Alberta</div>
<p><b>Camp He Ho Ha Climbing Gym</b><br>
Seba Beach, Alberta, TOE 2BO Canada<br>
(780) 429-3277<br>
<a rel='nofollow' target='_blank' href='http://camphehoha.com/summer-camp/camp-life/'>Camp He Ho Ha Climbing Gym</a><br>
<span class='rt'></span> The Summit is Camp He Ho Ha's 40' climbing gym and ropes course. Facility is available for rent, with safety equipment, orientation to the course and staffing provided.</p>
<div class="city">Calgary</div>
<p><b>Bolder Climbing Community</b><br>
5508 1st Street SE, Calgary, Alberta, Canada<br>
403 988-8140<br>
<a rel='nofollow' target='_blank' href='http://www.bolderclimbing.com/'>Bolder Climbing Community</a><br>
<span class='rt'></span> Calgary's first bouldering specific climbing centre.</p>'''

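from bs4 import BeautifulSoup, NavigableString


soup = BeautifulSoup(html, 'html.parser')
final_content = []
for p in soup.find_all('p'):
    content = []
    for br in p.find_all('br'):
        # Node immediately before this <br>: a text node, a tag, or None.
        prev = br.previous_sibling
        # Keep only plain text; tags such as <b> and <a> are skipped.
        if isinstance(prev, NavigableString):
            content.append(prev.strip())
    final_content.append(content)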

print(final_content)

Output:

[['Seba Beach, Alberta, TOE 2BO Canada', '(780) 429-3277'], ['5508 1st Street SE, Calgary, Alberta, Canada', '403 988-8140']]
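
The result above holds only the location and phone number, because the gym name and the link sit inside <b> and <a> tags rather than in bare text nodes. As a rough sketch of one way to collect all five fields the question asks for, the snippet below reads every text fragment of each <p> with `stripped_strings` and takes the URL from the `href` attribute of the <a> tag. It assumes each <p> yields exactly five text fragments in the order name, location, phone, link text, description. That holds for the sample, but may not hold for every entry on the real page. The list names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Same sample as in the question.
sample_html = '''<div class="city">Alberta</div>
<p><b>Camp He Ho Ha Climbing Gym</b><br>
Seba Beach, Alberta, TOE 2BO Canada<br>
(780) 429-3277<br>
<a rel='nofollow' target='_blank' href='http://camphehoha.com/summer-camp/camp-life/'>Camp He Ho Ha Climbing Gym</a><br>
<span class='rt'></span> The Summit is Camp He Ho Ha's 40' climbing gym and ropes course. Facility is available for rent, with safety equipment, orientation to the course and staffing provided.</p>
<div class="city">Calgary</div>
<p><b>Bolder Climbing Community</b><br>
5508 1st Street SE, Calgary, Alberta, Canada<br>
403 988-8140<br>
<a rel='nofollow' target='_blank' href='http://www.bolderclimbing.com/'>Bolder Climbing Community</a><br>
<span class='rt'></span> Calgary's first bouldering specific climbing centre.</p>'''

soup = BeautifulSoup(sample_html, 'html.parser')

# One list per field (hypothetical names, not from the original answer).
names, locations, phones, links, descriptions = [], [], [], [], []

for p in soup.find_all('p'):
    # All non-empty text fragments inside the <p>, whitespace-stripped.
    parts = list(p.stripped_strings)
    link = p.find('a')
    # Assumption: five fragments and one <a>; skip entries that differ.
    if len(parts) != 5 or link is None:
        continue
    name, location, phone, _link_text, description = parts
    names.append(name)
    locations.append(location)
    phones.append(phone)
    links.append(link.get('href'))
    descriptions.append(description)

print(names)
print(locations)
print(phones)
print(links)
print(descriptions)
```

Entries that do not match the assumed layout are skipped silently here. On real data it may be safer to log them and inspect them by hand.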