Python//Regex//Tags (python, html, regex, bs4) - Fatal编程技术网


python html regex


I am trying to extract some text from in between the tags below:

</br></td>, <td class="first">TEXT_1a<br>TEXT_1b
                                </br></td>, <td class="first">TEXT_2a<br>TEXT_2b
                                </br></td>, <td class="first">TEXT_3a<br>TEXT_3b
                                </br></td>, <td class="first">TEXT_4a<br>TEXT_4b
                                </br></td>, <td class="first">TEXT_5a<br>TEXT_5b
                                </br></td>, <td class="first">TEXT_6a<br>TEXT_6b



I used BeautifulSoup (BS4)

text = first_td.renderContents()
trimmed_text = text.strip()
print(trimmed_text)

to extract the text. However, I only get the first text after the tag.

To extract all the text in the td tags:
>>> s = '''<td class="first">TEXT_1a<br>TEXT_1b
                                </br></td>, <td class="first">TEXT_2a<br>TEXT_2b
                                </br></td>, <td class="first">TEXT_3a<br>TEXT_3b
                                </br></td>, <td class="first">TEXT_4a<br>TEXT_4b
                                </br></td>, <td class="first">TEXT_5a<br>TEXT_5b
                                </br></td>, <td class="first">TEXT_6a<br>TEXT_6b'''
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s, "html.parser")
>>> [i.text.strip() for i in soup.select('td.first')]
['TEXT_1aTEXT_1b', 'TEXT_2aTEXT_2b', 'TEXT_3aTEXT_3b', 'TEXT_4aTEXT_4b', 'TEXT_5aTEXT_5b', 'TEXT_6aTEXT_6b']
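Note that `.text` joins a cell's text nodes with nothing between them, which is why the two halves run together in the output above. As a minimal sketch (assuming the built-in "html.parser" backend and a shortened copy of the sample; the names `html` and `cells` are illustrative only), passing a separator to `get_text()` keeps the `<br>` boundary visible:

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup from the question.
html = '''<td class="first">TEXT_1a<br>TEXT_1b
                                </br></td>, <td class="first">TEXT_2a<br>TEXT_2b
                                </br></td>'''

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# get_text(" ", strip=True) strips each text node, then joins them with a space.
cells = [td.get_text(" ", strip=True) for td in soup.select("td.first")]
print(cells)
```

Any separator string could be used in place of the space, for example a newline if the line break should be preserved.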

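Hey Avinash. This works, thank you. Is it possible to put TEXT_1b etc. in another array?
Try [re.findall(r'(?)[^>
OK. This seems to leave a big gap between the texts. Is it possible to remove only the trailing texts (TEXT_2b, TEXT_3b, etc.)? Thanks!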
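One possible way to handle the follow-up questions without a regex (this is not the truncated `re.findall` suggestion above, just a sketch under the assumption that every cell holds at most two text nodes split by a `<br>`): iterate over `stripped_strings`, which yields each text node with its surrounding whitespace removed, and collect the two halves into separate lists. The names `first_parts` and `second_parts` are hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup from the question.
html = '''<td class="first">TEXT_1a<br>TEXT_1b
                                </br></td>, <td class="first">TEXT_2a<br>TEXT_2b
                                </br></td>, <td class="first">TEXT_3a<br>TEXT_3b'''

soup = BeautifulSoup(html, "html.parser")

first_parts = []   # text before the <br> (TEXT_1a, TEXT_2a, ...)
second_parts = []  # text after the <br>  (TEXT_1b, TEXT_2b, ...)

for td in soup.select("td.first"):
    # stripped_strings skips whitespace-only nodes and strips the rest,
    # so the long run of spaces before </br> does not leave a gap.
    parts = list(td.stripped_strings)
    first_parts.append(parts[0] if parts else "")
    second_parts.append(parts[1] if len(parts) > 1 else "")

print(first_parts)
print(second_parts)
```

To drop the trailing texts altogether, using `first_parts` on its own should be enough.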




Copyright © 2024. All Rights Reserved by Fatal编程技术网