How to exclude elements using BeautifulSoup (Python) - Fatal编程技术网

How to exclude elements using BeautifulSoup (Python)



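I'm trying to extract the article text from this article (the URL is base_url in the code below) and exclude the legal container at the bottom. The text part seems simple enough, but I can't get rid of the container. For convenience, I've stored the container separately in the legal variable.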

Here is my code so far:

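import requests
from bs4 import BeautifulSoup

base_url = 'https://www.vanityfair.com/style/society/2014/06/monica-lewinsky-humiliation-culture'
r = requests.get(base_url)
r_html = r.text
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(r_html, 'html.parser')

# The legal boilerplate at the bottom of the page, kept in its own variable
legal = soup.find('div', {'class': 'legal-container'})

paragraphs = soup.find_all('p')

for text in paragraphs:
    print(text.get_text())  # Python 3 print() call (was a Python 2 print statement)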

How should I go about this?

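Always locate the part you want and work out how to extract just that part, rather than grabbing all the text and then removing the parts you don't need.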

In your case, the text you probably want is grouped in section tags inside a div whose class attribute is "content drop-cap". You can get that div like this:
content_div = soup.find('div', {'class': 'content drop-cap'})

This also gives you the flexibility to group the text by section:
sections = content_div.find_all('section')
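Putting those two lines together, a minimal sketch of the "select only what you want" approach might look like this. It runs against a small hypothetical SAMPLE_HTML string rather than the live page (the real Vanity Fair markup may differ), and article_sections is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page;
# the actual Vanity Fair HTML may be structured differently.
SAMPLE_HTML = """
<div class="article-main">
  <div class="content drop-cap">
    <section><p>First paragraph.</p><p>Second paragraph.</p></section>
    <section><p>Third paragraph.</p></section>
  </div>
  <div class="legal-container"><p>Legal boilerplate.</p></div>
</div>
"""


def article_sections(html):
    """Return one list of paragraph texts per section of the article body."""
    soup = BeautifulSoup(html, 'html.parser')
    # Matching the exact class attribute string, as in the answer above
    content_div = soup.find('div', {'class': 'content drop-cap'})
    if content_div is None:
        return []  # layout not found; nothing to extract
    sections = content_div.find_all('section')
    return [[p.get_text(strip=True) for p in section.find_all('p')]
            for section in sections]


for number, paragraphs in enumerate(article_sections(SAMPLE_HTML), start=1):
    print(f'Section {number}:')
    for text in paragraphs:
        print(text)
```

Because the search starts from content_div, the legal container is never visited, so there is nothing to remove afterwards.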

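However, if you still want to grab all the paragraphs and explicitly exclude the legal container, you can remove the legal container from the soup object.
From the BeautifulSoup documentation on decompose():
"decompose() removes a tag from the tree, then completely destroys it and its contents."
If you go this route, remove the unwanted tag before you extract the text: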
soup.find('div', {'class': 'legal-container'}).decompose()
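A slightly more defensive sketch of the decompose() route is below. It again uses a hypothetical SAMPLE_HTML string in place of the real page, and paragraphs_without_legal is an illustrative name. The None check is there because find() returns None when nothing matches, and calling .decompose() on None would raise AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="article-main">
  <p>Article paragraph one.</p>
  <p>Article paragraph two.</p>
  <div class="legal-container"><p>Legal boilerplate.</p></div>
</div>
"""


def paragraphs_without_legal(html):
    soup = BeautifulSoup(html, 'html.parser')
    legal = soup.find('div', {'class': 'legal-container'})
    # Only decompose if the container exists on this page
    if legal is not None:
        legal.decompose()  # removes the tag and everything inside it from the tree
    # Extract text only after the unwanted tag is gone
    return [p.get_text(strip=True) for p in soup.find_all('p')]


print(paragraphs_without_legal(SAMPLE_HTML))
```

Note that this mutates the parsed tree, so anything else you later read from the same soup object will no longer see the legal container either.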

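Instead of excluding things, can't you define a better selector than "all p tags"? – I tried, but couldn't find a good way: all of the text, including the text in the legal container, matches the selector div.article-main p, i.e. every paragraph in the article. – Thanks! That was really helpful! I'm still a newbie :)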
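If the only available selector (div.article-main p) also catches the legal paragraphs, one option is to keep that selector and skip any paragraph that has the legal container as an ancestor. The sketch below assumes a hypothetical SAMPLE_HTML string and an illustrative article_paragraphs helper; soup.select() relies on the soupsieve package, which recent BeautifulSoup 4 releases install as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="article-main">
  <p>Article paragraph one.</p>
  <p>Article paragraph two.</p>
  <div class="legal-container"><p>Legal boilerplate.</p></div>
</div>
"""


def article_paragraphs(html):
    soup = BeautifulSoup(html, 'html.parser')
    result = []
    # The selector from the comments matches every paragraph, legal ones included
    for p in soup.select('div.article-main p'):
        # Keep the paragraph only if no ancestor div is the legal container
        if p.find_parent('div', class_='legal-container') is None:
            result.append(p.get_text(strip=True))
    return result


print(article_paragraphs(SAMPLE_HTML))
```

Unlike decompose(), this filters while reading and leaves the parsed tree untouched.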

                                                                Copyright © 2024. All Rights Reserved by  - Fatal编程技术网