Python BeautifulSoup: outputting elements to a list


python html web-scraping


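I have some output produced with BeautifulSoup.

I need to convert this output, whose elements are of type bs4.element.Tag, into a list, and export that list to a DataFrame column named column_a.

I want my output to stop at the 14th element (the last three h2 headings are not valid).

My code: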

import requests
from bs4 import BeautifulSoup


url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
attraction_place=soup.find_all('h2', class_="sitename")    

for attraction in attraction_place:
    print(attraction.text)
    type(attraction)

Output:

1  Vigeland Sculpture Park
2  Akershus Fortress
3  Viking Ship Museum
4  The National Museum
5  Munch Museum
6  Royal Palace
7  The Museum of Cultural History
8  Fram Museum
9  Holmenkollen Ski Jump and Museum
10  Oslo Cathedral
11  City Hall (Rådhuset)
12  Aker Brygge
13  Natural History Museum & Botanical Gardens
14  Oslo Opera House and Annual Music Festivals
Where to Stay in Oslo for Sightseeing
Tips and Tours: How to Make the Most of Your Visit to Oslo
More Related Articles on PlanetWare.com

I would like to have a list like this:

attraction=[Vigeland Sculpture Park, Akershus Fortress, ......]

Thank you very much in advance.

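new = []
count = 1
for attraction in attraction_place:
    # Keep only the first 14 headings; the trailing ones are skipped
    if count < 15:
        new.append(attraction.text)
        count += 1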
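The same counting idea can also be written with enumerate and an early break, so the loop stops as soon as the limit is reached. This is only a minimal sketch: the headings list and the limit of 2 are hypothetical stand-ins for the text of the scraped tags and the limit of 14 from the question.

```python
# Hypothetical stand-in for the text of the scraped h2 tags.
headings = [
    "Vigeland Sculpture Park",
    "Akershus Fortress",
    "Where to Stay in Oslo for Sightseeing",
]

limit = 2  # assumption for this sample; the question wants 14
new = []
for index, heading in enumerate(headings, start=1):
    # Stop once the wanted number of headings has been collected
    if index > limit:
        break
    new.append(heading)

print(new)
```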

A simple approach is to use the alt attribute of the photos. This gives clean text output, and only 14 items, without any slicing or indexing.

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm')
soup = bs(r.content, 'lxml')
attractions = [item['alt'] for item in soup.select('.photo [alt]')]
print(attractions)
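To see what the '.photo [alt]' selector matches without hitting the network, here is a small offline sketch. The HTML snippet is a hypothetical stand-in and only assumes that each attraction photo sits inside an element with class "photo"; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two photo blocks plus a trailing heading that
# has no photo and is therefore never matched by the selector.
sample_html = """
<div class="photo"><img src="a.jpg" alt="Vigeland Sculpture Park"></div>
<div class="photo"><img src="b.jpg" alt="Akershus Fortress"></div>
<h2 class="sitename">Where to Stay in Oslo for Sightseeing</h2>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# '.photo [alt]' selects any descendant of a .photo element that
# carries an alt attribute; item['alt'] reads that attribute's value.
attractions = [item["alt"] for item in soup.select(".photo [alt]")]
print(attractions)
```

Because the unwanted headings have no photo, they drop out on their own, which is why no slicing is needed with this approach.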

You can use slicing:

for attraction in attraction_place[:14]:
    print(attraction.text)
    type(attraction)
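To go from the sliced tags to the list and the DataFrame column the question asks for, something along these lines should work. It is a sketch under assumptions: pandas is installed, the inline HTML is a hypothetical stand-in for the downloaded page, and keep is 2 here where the real page would need 14.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: two valid headings followed by
# one trailing heading that should be dropped.
sample_html = """
<h2 class="sitename">1 Vigeland Sculpture Park</h2>
<h2 class="sitename">2 Akershus Fortress</h2>
<h2 class="sitename">Where to Stay in Oslo for Sightseeing</h2>
"""

soup = BeautifulSoup(sample_html, "html.parser")
attraction_place = soup.find_all("h2", class_="sitename")

keep = 2  # assumption for this sample; use 14 for the page in the question

# Slice first, then turn each bs4.element.Tag into a plain string
attractions = [tag.get_text(strip=True) for tag in attraction_place[:keep]]

# One column named column_a, one row per attraction
df = pd.DataFrame({"column_a": attractions})
print(df)
```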

It produced only the 14 elements! You made my day.