Python: getting all links from a page

python web-scraping beautifulsoup html-parsing

I am using BeautifulSoup to get all the links from a page. My code is:

import requests
from bs4 import BeautifulSoup


url = 'http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo'
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'lxml')

soup.find_all('href')

All I get is:

[]

How can I get a list of all the href links on that page?

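You are telling the find_all method to look for href tags, not attributes. No element named href exists in the page, which is why the result is an empty list.

You need to find the <a> tags, which are used to represent link elements: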

links = soup.find_all('a')
Later, you can access their href attribute like this:

link = links[0]          # get the first link in the entire page
url = link['href']       # get the value of the href attribute
url = link.get('href')   # or like this
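The difference between searching for an href tag and searching for a tags can be checked offline. The following is a minimal sketch, not the asker's exact setup: SAMPLE_HTML is made-up markup standing in for the downloaded page, and the built-in "html.parser" is assumed in place of "lxml" so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="/home">Home</a>
  <a href="https://example.com/events">Events</a>
  <a name="top">Anchor without href</a>
</body></html>
"""

# "html.parser" ships with Python; the question uses "lxml" instead.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching by the name "href" looks for <href> elements, which do not exist.
no_matches = soup.find_all("href")

# Searching by the name "a" returns every anchor element.
anchors = soup.find_all("a")

# Tag.get returns None when the attribute is missing,
# whereas anchors[2]["href"] would raise KeyError.
hrefs = [a.get("href") for a in anchors]

print(no_matches)  # []
print(hrefs)       # ['/home', 'https://example.com/events', None]
```

The None in the second list is the anchor that has no href attribute, which is the case where link.get('href') is safer than link['href'].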

Replace the last line:

links = soup.find_all('a')
with this line:

links = [a.get('href') for a in soup.find_all('a', href=True)]
It collects every a tag that has an href attribute and, for each of those tags, adds the value of its href attribute to the links list.

If you want to know more about the for loop between the [], read about list comprehensions.
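To make the bracketed for loop concrete, here is a small sketch that writes the same extraction both ways. SAMPLE_HTML and the name links_loop are invented for illustration, and "html.parser" is assumed instead of "lxml".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the third anchor has no href attribute.
SAMPLE_HTML = """
<a href="/a">A</a>
<a href="/b">B</a>
<a id="no-link">C</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# List comprehension form: href=True keeps only anchors that have an href.
links = [a.get("href") for a in soup.find_all("a", href=True)]

# Equivalent explicit loop that builds the same list step by step.
links_loop = []
for a in soup.find_all("a", href=True):
    links_loop.append(a.get("href"))

print(links)                # ['/a', '/b']
print(links == links_loop)  # True
```

Both forms produce the same list; the comprehension is simply the more compact spelling.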

To get a list of every href, regardless of which tag it appears on:

href_tags = soup.find_all(href=True)
hrefs = [tag.get('href') for tag in href_tags]
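Dropping the tag name from the search changes which elements match. The sketch below illustrates this on made-up markup (SAMPLE_HTML and pairs are hypothetical names, and "html.parser" is assumed instead of "lxml"): a stylesheet link element is returned alongside the anchor, while the img element, which has src but no href, is not.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with href on two different kinds of tag.
SAMPLE_HTML = """
<html>
  <head><link rel="stylesheet" href="/static/site.css"></head>
  <body>
    <a href="/contact">Contact</a>
    <img src="/logo.png" alt="logo">
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No tag name given: any element carrying an href attribute matches.
href_tags = soup.find_all(href=True)

# Pair each tag name with its href to show where the values came from.
pairs = [(tag.name, tag.get("href")) for tag in href_tags]

print(pairs)  # [('link', '/static/site.css'), ('a', '/contact')]
```

If only navigational links are wanted, restricting the search to a tags avoids picking up stylesheets and similar resources.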

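But when I do that, I only get the first link. Should I write a for loop to get all the links?

links = soup.find_all('a') gives you a list of all the links. I only used the first link as an example in the code at the bottom of the answer. And yes, loop over the links list to access all the links that were found.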
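Putting the pieces together, one possible shape for the whole script is sketched below. The function names links_from_html and fetch_links, the timeout value, the choice of "html.parser" over "lxml", and the sample markup are all assumptions made for illustration, not part of the original answers.

```python
import requests
from bs4 import BeautifulSoup


def links_from_html(html):
    """Return the href value of every <a> tag in html that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_links(url, timeout=10):
    """Download url and return its links; the timeout is an arbitrary choice."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return links_from_html(response.text)


# Offline demonstration on hypothetical markup, so no network access is needed.
SAMPLE_HTML = '<p><a href="/one">One</a> and <a href="/two">Two</a></p>'

# Loop over the list to visit every link, not just the first one.
for href in links_from_html(SAMPLE_HTML):
    print(href)
```

Calling fetch_links with the URL from the question would run the same extraction against the live page, assuming the site is reachable.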
