
Python: scraping all hrefs into a list with BeautifulSoup


I want to grab the links from the page (http://www.gcoins.net/en/catalog/236) and put them into a list.

I have the following code:

import bs4 as bs
import urllib.request

# Fetch the catalogue page and parse it with lxml
source = urllib.request.urlopen('http://www.gcoins.net/en/catalog/236').read()
soup = bs.BeautifulSoup(source, 'lxml')

# All <a class="view"> tags on the page
links = soup.find_all('a', attrs={'class': 'view'})
print(links)
It produces the following output:

[<a class="view" href="/en/catalog/view/514">
<img alt="View details" height="32" src="/img/actions/file.png" title="View details" width="32"/>
</a>, 

     """There are 28 lines more"""

      <a class="view" href="/en/catalog/view/565">
<img alt="View details" height="32" src="/img/actions/file.png" title="View details" width="32"/>
</a>]
I need to get the following instead:
['/en/catalog/view/514', ..., '/en/catalog/view/565']

But when I then add the following:

href_value = links.get('href')

I get an error.

Try:

soup = bs.BeautifulSoup(source, 'lxml')

# Call .get('href') on each tag, not on the list returned by find_all()
links = [i.get("href") for i in soup.find_all('a', attrs={'class': 'view'})]
print(links)
Output:

['/en/catalog/view/514', '/en/catalog/view/515', '/en/catalog/view/179080', '/en/catalog/view/45518', '/en/catalog/view/521', '/en/catalog/view/111429', '/en/catalog/view/522', '/en/catalog/view/182223', '/en/catalog/view/168153', '/en/catalog/view/523', '/en/catalog/view/524', '/en/catalog/view/60228', '/en/catalog/view/525', '/en/catalog/view/539', '/en/catalog/view/540', '/en/catalog/view/31642', '/en/catalog/view/553', '/en/catalog/view/558', '/en/catalog/view/559', '/en/catalog/view/77672', '/en/catalog/view/560', '/en/catalog/view/55377', '/en/catalog/view/55379', '/en/catalog/view/32001', '/en/catalog/view/561', '/en/catalog/view/562', '/en/catalog/view/72185', '/en/catalog/view/563', '/en/catalog/view/564', '/en/catalog/view/565']
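
For context on the error in the question: find_all() returns a ResultSet (a list of Tag objects), so calling .get('href') on the list itself fails (presumably with an AttributeError); .get() has to be called on each individual tag, which is exactly what the comprehension above does. A minimal illustration, reusing soup from the question:

links = soup.find_all('a', attrs={'class': 'view'})
# links.get('href')                # fails: the list/ResultSet has no .get()
first_href = links[0].get('href')  # works: .get() on a single <a> tag
print(first_href)                  # e.g. '/en/catalog/view/514'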

Your links is currently a Python list of tags. What you need to do is loop over that list and pull the href out of each tag, like this:

final_hrefs = []
for each_link in links:
    final_hrefs.append(each_link['href'])  # each_link is already the <a> tag
Or, as a one-liner:

final_hrefs = [each_link['href'] for each_link in links]

print(final_hrefs)
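
If some of the matched anchors could be missing an href attribute (just an assumption; every view link in the output above appears to have one), a slightly more defensive sketch would use .get() and skip the None values:

# Keep only anchors that actually carry an href
final_hrefs = [each_link.get('href') for each_link in links
               if each_link.get('href') is not None]
print(final_hrefs)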

Try the code below. You can get the list of links in a single step:

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('http://www.gcoins.net/en/catalog/236').read()
soup = bs.BeautifulSoup(source, 'lxml')

# Collect the href of every <a class="view"> and print it as an absolute URL
links = [i.get("href") for i in soup.find_all('a', attrs={'class': 'view'})]
for link in links:
    print('http://www.gcoins.net' + link)
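
As a side note, instead of concatenating strings you could build the absolute URLs with urllib.parse.urljoin from the standard library; a small sketch under the same assumptions as above:

from urllib.parse import urljoin

# Resolve the root-relative hrefs against the page they came from
base = 'http://www.gcoins.net/en/catalog/236'
absolute_links = [urljoin(base, link) for link in links]
print(absolute_links)  # e.g. ['http://www.gcoins.net/en/catalog/view/514', ...]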