Python: Getting the full URL from source code with BeautifulSoup - Fatal编程技术网


So I was looking at some source code and came across this bit of code:

<img src="/gallery/2012-winners-finalists/HM_Watching%20birds2_Shane%20Conklin_MA_2012.jpg"


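Now, in the source view the link is blue, and when you click it, it takes you to the full URL where the picture is located. I know how to get what is shown in the source code in Python using Beautiful Soup. What I want to know is how to get the full URL that you reach after clicking the link in the source code.
Edit:
Given a base URL and a relative link, the urljoin function does this for you: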
>>> from urllib.parse import urljoin
>>> base = 'http://example.com/foo/bar.html'
>>> href = '/folder/big/a.jpg'
>>> urljoin(base, href)
'http://example.com/folder/big/a.jpg'

For Python 2, the function is in the urlparse module.
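To tie this back to the img tag in the question, here is a minimal sketch that parses a snippet with Beautiful Soup and resolves each src against the page address. The page URL and the helper name absolute_image_urls are made up for illustration, and the built-in "html.parser" backend is an assumption chosen so that no extra parser package is needed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page address; in practice this is the URL the HTML was fetched from.
PAGE_URL = "http://example.com/gallery/index.html"
# The img tag from the question, closed so that it is valid markup.
HTML = '<img src="/gallery/2012-winners-finalists/HM_Watching%20birds2_Shane%20Conklin_MA_2012.jpg">'


def absolute_image_urls(html, page_url):
    """Return the absolute URL of every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(page_url, img["src"]) for img in soup.find_all("img", src=True)]


full_urls = absolute_image_urls(HTML, PAGE_URL)
print(full_urls)
```

The same pattern should work for a tags by reading href instead of src.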
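Can you post the HTML? (To join the host with a relative or absolute URL, see:)
@user2476540 Then the URL specified in the a tag is wrong. What I explained above is how a browser behaves when it sees a relative URL with a leading slash.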
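As a rough illustration of the leading-slash behavior mentioned in the comment, the sketch below resolves a few links against the same base URL with urljoin. The base URL is the one from the answer; the other links are invented for the example.

```python
from urllib.parse import urljoin

base = "http://example.com/foo/bar.html"

# Illustrative links only; each one is resolved against `base`.
links = [
    "/folder/big/a.jpg",            # leading slash: relative to the host root
    "a.jpg",                        # no slash: relative to the directory of bar.html
    "../a.jpg",                     # parent directory of /foo/
    "https://other.example/a.jpg",  # already absolute: returned unchanged
]

resolved = {link: urljoin(base, link) for link in links}

for link, full_url in resolved.items():
    print(f"{link!r:32} -> {full_url}")
```

A link with a leading slash replaces the whole path of the base, while a link without one only replaces the last path segment.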
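from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

r = requests.get("http://example.com")

base_url = r.url  # URL of the fetched page (after redirects); used as the base
data = r.content  # raw content of the page
soup = BeautifulSoup(data, 'lxml')  # the 'lxml' parser requires the lxml package
temp_url = soup.find('a')['href']  # you need to modify this selector

# urljoin returns temp_url unchanged if it is already absolute (http:// or
# https://); otherwise it resolves it against base_url. Plain string
# concatenation breaks when the base URL contains a path.
url = urljoin(base_url, temp_url)

print(url)  # this is your full url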



