Scrape Href python


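I want to scrape city names from a website. This is the relevant code I have written so far, with the text stored in a variable. However, I need to put all the city names in a list, and that does not seem to be working for me. Here is the HTML: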

<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a>

Can anyone help?

Suppose your HTML content is stored as:

html_cont = '<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a>'    

You can use a list comprehension:

>>> from bs4 import BeautifulSoup
>>> html = '<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a>'
>>> soup = BeautifulSoup(html, 'html.parser')
>>> citynames = [link.text for link in soup.find_all('a')]
>>> citynames
['Amsterdam']
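
On a real page, soup.find_all('a') also returns navigation and footer links. A minimal sketch of one way to narrow the match, assuming the city links all carry an id ending in "lnkBunker" as in the sample above (the second and third links below are made-up rows for illustration, not taken from the site):

```python
import re
from bs4 import BeautifulSoup

# Sample markup: the first link is from the question, the other two are hypothetical.
html_cont = (
    '<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" '
    'href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a>'
    '<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl01_lnkBunker" '
    'href="PortDetails.aspx?ElementID=00000000-0000-0000-0000-000000000000"> Rotterdam </a>'
    '<a href="About.aspx">About us</a>'
)

soup = BeautifulSoup(html_cont, "html.parser")

# Keep only anchors that have an href and whose id ends in "lnkBunker"
# (assumption: that suffix marks the city links on this page).
city_links = soup.find_all("a", id=re.compile(r"lnkBunker$"), href=True)

# get_text(strip=True) trims surrounding whitespace from each name.
city_names = [link.get_text(strip=True) for link in city_links]
city_hrefs = [link["href"] for link in city_links]

print(city_names)
```

The "About us" link is skipped because it has no matching id, so only the two city names end up in the list.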

Where is the full content? And what is the x column used for?
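from bs4 import BeautifulSoup

# The "lxml" parser requires the lxml package; "html.parser" is the built-in alternative.
soup = BeautifulSoup(html_cont, "lxml")

# Collect the visible text of every <a> tag that has an href attribute.
city_names = []
for link in soup.find_all('a', href=True):
    city_names.append(link.text)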
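
The href values in this markup are relative ("PortDetails.aspx?..."), so following a city's link needs the page's own address. A rough sketch that pairs each city name with an absolute URL, assuming a placeholder base address (BASE_URL is hypothetical; substitute the URL of the page that was actually fetched):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical base address, used only for illustration.
BASE_URL = "https://example.com/ports/"

html_cont = (
    '<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" '
    'href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a>'
)

soup = BeautifulSoup(html_cont, "html.parser")

# Map each city name to its detail-page URL, resolved against BASE_URL.
city_urls = {
    link.get_text(strip=True): urljoin(BASE_URL, link["href"])
    for link in soup.find_all("a", href=True)
}

print(city_urls["Amsterdam"])
```

A dict keeps the name and link together; if the same city name can appear more than once, a list of (name, url) tuples would avoid overwriting entries.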