Extract URLs from page source with Python 3_Python_Html_Python 3.x_Html Parsing - Fatal编程技术网

Extract URLs from page source with Python 3

python html python-3.x

My question is a follow-up to the following question:

What if I don't know the exact URL and only have a keyword that should appear in it? How do I extract the URL from the page source then?

Try a regular expression:

import re

# content is the page source as a string; this returns every quoted href value
re.findall(r'(?i)href=["\']([^\s"\'<>]+)', content)
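The pattern above returns every href value; it does not filter by keyword on its own. A minimal sketch, assuming the keyword should match literally anywhere in the URL, folds the keyword into the same pattern with re.escape(). The helper name find_urls_with_keyword is hypothetical, and because of the (?i) flag the keyword comparison is case-insensitive as well.

```python
import re


def find_urls_with_keyword(content: str, word: str) -> list[str]:
    """Return quoted href values in `content` that contain `word` (hypothetical helper)."""
    # Same character class as the original pattern: no whitespace, quotes or angle brackets
    url_char = r'[^\s"\'<>]'
    # re.escape() treats the keyword literally, so "." or "?" in it are not regex operators;
    # the leading (?i) makes both "href" and the keyword match case-insensitively
    pattern = r'(?i)href=["\'](' + url_char + r'*' + re.escape(word) + url_char + r'*)'
    return re.findall(pattern, content)


html = '<a href="/test/page">a</a> <A HREF=\'http://example.com/other\'>b</a>'
print(find_urls_with_keyword(html, "test"))  # ['/test/page']
```

Embedding the keyword keeps it to one pass over the source; the trade-off is that the regex still only sees quoted href attributes, just like the original pattern.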

Use an HTML parser.
In the case of BeautifulSoup, you can pass a function as the value of the href keyword argument:

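from bs4 import BeautifulSoup

word = "test"
data = "your HTML here"

soup = BeautifulSoup(data, "html.parser")
# The function receives None for <a> tags without an href, so check x before the substring test
for a in soup.find_all('a', href=lambda x: x and word in x):
    print(a['href'])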
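If installing BeautifulSoup is not an option, a rough equivalent of the filter-function approach can be sketched with the standard library's html.parser.HTMLParser. This is an alternative, not what the answer uses; the class name KeywordLinkParser is hypothetical. Like the lambda, it skips <a> tags that have no href.

```python
from html.parser import HTMLParser


class KeywordLinkParser(HTMLParser):
    """Collect href values of <a> tags whose href contains a keyword (hypothetical helper)."""

    def __init__(self, word: str) -> None:
        super().__init__()
        self.word = word
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value) pairs
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # href may be missing or None (bare attribute), mirroring `x and word in x`
        if href and self.word in href:
            self.links.append(href)


parser = KeywordLinkParser("test")
parser.feed('<p><a href="/test/1">one</a><a>no href</a><a href="/other">two</a></p>')
print(parser.links)  # ['/test/1']
```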
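Or, with a compiled regular expression:

import re

# re.escape() makes the keyword match literally; BeautifulSoup calls .search() on the pattern,
# so it matches anywhere in the href
for a in soup.find_all('a', href=re.compile(re.escape(word))):
    print(a['href'])

Or, with a CSS selector:

# [href*="..."] matches when the keyword appears anywhere in the href ([href^="..."] only matches a prefix)
for a in soup.select('a[href*="{word}"]'.format(word=word)):
    print(a['href'])

Well... just extract them all, then check each one in turn.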
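The "extract them all, then check each one" suggestion can be sketched with the regex from the first answer: collect every href, then test each against the keyword. Dropping duplicates is an extra choice made here, not part of the suggestion; the helper name extract_then_check is hypothetical.

```python
import re

# The pattern from the regex answer: capture every quoted href value
HREF_RE = re.compile(r'(?i)href=["\']([^\s"\'<>]+)')


def extract_then_check(content: str, word: str) -> list[str]:
    """Extract all hrefs first, then keep those containing `word` (hypothetical helper)."""
    matches = []
    seen = set()
    for url in HREF_RE.findall(content):
        # Check each URL in turn; also skip repeats while preserving page order
        if word in url and url not in seen:
            seen.add(url)
            matches.append(url)
    return matches


page = '<a href="/a/test">1</a><a href="/b">2</a><a href="/a/test">3</a>'
print(extract_then_check(page, "test"))  # ['/a/test']
```

Separating extraction from checking makes it easy to swap in a different test later (for example, matching only the path part of the URL) without touching the regex.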

Copyright © 2024. All Rights Reserved by - Fatal编程技术网