
Python: output BeautifulSoup elements to a list


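I have output produced with BeautifulSoup.

  • I need to convert the output, whose elements are of type bs4.element.Tag, into a list and export that list to a DataFrame column named column_a.

  • I want my output to stop at the 14th element (the last three h2 elements are not useful).

  • My code: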

    import requests
    from bs4 import BeautifulSoup
    
    
    url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'html.parser')
    attraction_place=soup.find_all('h2', class_="sitename")    
    
    for attraction in attraction_place:
        print(attraction.text)
        type(attraction)
    
    Output:

    1  Vigeland Sculpture Park
    2  Akershus Fortress
    3  Viking Ship Museum
    4  The National Museum
    5  Munch Museum
    6  Royal Palace
    7  The Museum of Cultural History
    8  Fram Museum
    9  Holmenkollen Ski Jump and Museum
    10  Oslo Cathedral
    11  City Hall (Rådhuset)
    12  Aker Brygge
    13  Natural History Museum & Botanical Gardens
    14  Oslo Opera House and Annual Music Festivals
    Where to Stay in Oslo for Sightseeing
    Tips and Tours: How to Make the Most of Your Visit to Oslo
    More Related Articles on PlanetWare.com
    
    I would like to get a list like this:

    attraction=[Vigeland Sculpture Park, Akershus Fortress, ......]
    
    Thanks a lot in advance.

    new = []
    count = 1
    for attraction in attraction_place:
        # Keep only the first 14 headings
        if count < 15:
            text = attraction.text
            new.append(text)
            count += 1
    
    A simple approach is to use the alt attribute of the photos. This gives clean text output and exactly 14 items, with no slicing or indexing needed.

    from bs4 import BeautifulSoup as bs
    import requests
    
    r = requests.get('https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm')
    soup = bs(r.content, 'lxml')
    attractions = [item['alt'] for item in soup.select('.photo [alt]')]
    print(attractions)
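
To see what the `.photo [alt]` selector matches without fetching the page, here is a minimal offline sketch. The `SAMPLE_HTML` markup is an assumption made up for illustration; the real page structure may differ. It uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, not copied from the real page.
SAMPLE_HTML = """
<div class="photo"><img src="a.jpg" alt="Vigeland Sculpture Park"></div>
<div class="photo"><img src="b.jpg" alt="Akershus Fortress"></div>
<div class="ad"><img src="c.jpg" alt="Sponsored banner"></div>
<div class="photo"><img src="d.jpg"></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# ".photo [alt]" matches any descendant of a .photo element that carries
# an alt attribute; images outside .photo or without alt are skipped.
attractions = [item["alt"] for item in soup.select(".photo [alt]")]
print(attractions)
```

Because the selector only picks up elements inside `.photo`, headings such as "Where to Stay in Oslo for Sightseeing" never enter the list in the first place.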
    
    You can use slicing:

    for attraction in attraction_place[:14]:
        print(attraction.text)
        type(attraction)
    

    It produced only the 14 elements! You made my day.
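
For the DataFrame part of the question, a minimal sketch might look like the following. It assumes pandas is installed; `SAMPLE_HTML`, `headings_to_frame`, and the `limit` parameter are hypothetical names introduced for illustration, and the sample markup is a shortened stand-in for the real page. Stripping the leading number with a regular expression is an optional extra step, added because the desired list in the question has no numbers.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in markup: three attraction headings plus one
# trailing heading that should be dropped.
SAMPLE_HTML = """
<h2 class="sitename">1  Vigeland Sculpture Park</h2>
<h2 class="sitename">2  Akershus Fortress</h2>
<h2 class="sitename">3  Viking Ship Museum</h2>
<h2 class="sitename">Where to Stay in Oslo for Sightseeing</h2>
"""


def headings_to_frame(html, limit):
    """Return a DataFrame whose column_a holds the first `limit` headings."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("h2", class_="sitename")
    # Slice the result set first, then take the text of each Tag.
    texts = [tag.get_text(strip=True) for tag in tags[:limit]]
    # Optional: drop a leading counter such as "1  ".
    names = [re.sub(r"^\d+\s+", "", text) for text in texts]
    return pd.DataFrame({"column_a": names})


df = headings_to_frame(SAMPLE_HTML, limit=3)
print(df)
```

With the real page, the same function would be called with `url_get.content` and `limit=14`.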