Python: re.findall() returns the entire string as a single match

python html regex web-crawler

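I am writing a crawler to extract certain parts of an HTML file, but I can't figure out how to use re.findall().

Here is an example. When I want to find every <div>...</div> section in the file, I write something like this:

re.findall("<div>.*\</div>", result_page)

If result_page is the string

"<div> </div> <div> </div>"

then the result is

['<div> </div> <div> </div>']

That is just the whole string. This is not what I want; I expected the two divs separately. What should I do?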

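Quoting the documentation:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.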

Just add a question mark:

In [6]: re.findall("<div>.*?</div>", result_page)
Out[6]: ['<div> </div>', '<div> </div>']
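
The difference can be seen side by side in a small script. This is only a minimal sketch: it assumes the same result_page string as in the question, and multiline_page is a made-up input added to illustrate one further caveat, namely that "." does not match a newline unless re.DOTALL is passed.

```python
import re

result_page = "<div> </div> <div> </div>"

# Greedy: .* runs on to the last </div>, so there is a single match.
greedy = re.findall(r"<div>.*</div>", result_page)

# Non-greedy: .*? stops at the first </div> it can reach.
lazy = re.findall(r"<div>.*?</div>", result_page)

# Hypothetical multi-line input: by default "." does not match "\n",
# so the pattern finds nothing unless re.DOTALL is passed.
multiline_page = "<div>\nfirst\n</div>\n<div>\nsecond\n</div>"
without_dotall = re.findall(r"<div>.*?</div>", multiline_page)
with_dotall = re.findall(r"<div>.*?</div>", multiline_page, flags=re.DOTALL)

print(greedy)
print(lazy)
print(without_dotall)
print(with_dotall)
```

Real pages usually spread a div over several lines, so the flag is likely to matter once the pattern is run against an actual download.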

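Also, you shouldn't use regular expressions to parse HTML, because there are HTML parsers designed specifically for that. An example using Beautiful Soup (bs4):

In [7]: import bs4
In [8]: [str(tag) for tag in bs4.BeautifulSoup(result_page, "html.parser")('div')]
Out[8]: ['<div> </div>', '<div> </div>']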
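
One concrete reason to prefer a parser is nesting. The sketch below is an illustration, not part of the original answer: nested_page is a made-up input, DivDepthParser is a hypothetical helper, and it uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra packages. It shows the non-greedy pattern cutting an outer div short, while a parser that tracks tag depth recovers the full text.

```python
import re
from html.parser import HTMLParser

# Hypothetical input with one div nested inside another.
nested_page = "<div>outer <div>inner</div> tail</div>"

# The non-greedy pattern stops at the first </div>, so the outer div is truncated.
regex_result = re.findall(r"<div>.*?</div>", nested_page)


class DivDepthParser(HTMLParser):
    """Hypothetical helper: collects the text inside each top-level <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # how many <div> tags are currently open
        self.current = []         # text pieces seen inside the open top-level div
        self.top_level_text = []  # finished text of each top-level div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.top_level_text.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


parser = DivDepthParser()
parser.feed(nested_page)
parser.close()

print(regex_result)
print(parser.top_level_text)
```

Beautiful Soup does this bookkeeping for you, which is why the answer recommends it over hand-written patterns.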

* is a greedy operator; you want to use *? for a non-greedy match:

re.findall("<div>.*?</div>", result_page)

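Why shouldn't I use regular expressions to parse HTML? What is the right way to do it?

@alvinzoo There are always HTML parsers for that, for example Python's Beautiful Soup. You may want to read up on it. If one of the answers below solved your problem, you should accept it.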
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the page source as a string
soup.find_all('div')
