Web scraping: avoid downloading images with BeautifulSoup and urllib.request_Web Scraping_Beautifulsoup_Urllib_Urlopen - Fatal编程技术网


Web scraping: avoid downloading images with BeautifulSoup and urllib.request

web-scraping


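I am using BeautifulSoup() (with the 'lxml' parser) and urllib.request.urlopen() to get text information from websites. However, when I look at the Network section of Activity Monitor, I see that Python downloads a large amount of data. This suggests that it is downloading not only the text but also the images.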

Is it possible to avoid downloading images when browsing web pages with BeautifulSoup?

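This is unlikely, because images are not embedded in the pages they appear on. To get them, a browser or urllib would have to make separate requests to wherever static files such as JS, images and CSS are hosted. One possible way to reduce the download size is to request compressed content: add an 'Accept-Encoding': 'gzip' header to the Request object. If the server supports it, the size reduction can be substantial. Then pass the response body to gzip.decompress() and decode the result to get the string data.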
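A minimal sketch of the compressed-request approach from the answer above, using only the standard library. The helper names `build_request`, `decode_body` and `fetch_html` are hypothetical. Because a server is free to ignore `Accept-Encoding`, the sketch only calls `gzip.decompress()` when the response actually carries `Content-Encoding: gzip`. Note that `urlopen()` fetches just the one document at the given URL; it does not follow image, CSS or JS links.

```python
import gzip
import urllib.request


def build_request(url):
    """Create a Request that asks the server for a gzip-compressed body."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw, content_encoding, charset="utf-8"):
    """Undo gzip compression (only if the server applied it) and decode to str."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)  # returns bytes, not str
    return raw.decode(charset, errors="replace")


def fetch_html(url):
    """Download only the HTML document at url (no images, CSS or JS)."""
    with urllib.request.urlopen(build_request(url)) as resp:
        raw = resp.read()
        encoding = resp.headers.get("Content-Encoding")  # None if uncompressed
        charset = resp.headers.get_content_charset() or "utf-8"
    return decode_body(raw, encoding, charset)
```

The returned string can then be parsed as before, e.g. `BeautifulSoup(fetch_html(url), "lxml")`.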
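Are you actually seeing raw image bytes written into your response? Otherwise I don't see why you would be downloading images. Images are normally stored separately and referenced through an attribute: an HTML scraper will contain a link to the image as text, but will not actually download the image, because you never told it to follow that link. I suspect the page simply has more data than you think. Inline JS can pack a punch.

I checked the "soup" result and saved it to a text file. You're right: it is 256KB and has attributes linking to the actual images. Thanks for your help, Akshat!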
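The check described in the comments can be sketched as follows: collect the image URLs from an already-downloaded HTML string without requesting any of them, and report the size of the HTML itself. This sketch uses the standard-library `html.parser.HTMLParser` as a dependency-free stand-in for BeautifulSoup and assumes images are referenced through the usual `src` attribute of `<img>` tags; `ImageLinkCollector` and `summarize_page` are hypothetical names.

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Record the src of every <img> tag; nothing is ever fetched."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <img ... /> via the default
        # handle_startendtag implementation.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.image_urls.append(value)


def summarize_page(html):
    """Return (size of the HTML in KB, list of image URLs it references)."""
    collector = ImageLinkCollector()
    collector.feed(html)
    collector.close()
    size_kb = len(html.encode("utf-8")) / 1024
    return size_kb, collector.image_urls
```

With BeautifulSoup the equivalent lookup is `[img.get("src") for img in soup.find_all("img")]`. In both cases the result is just a list of strings; no image is downloaded unless those URLs are passed to `urlopen()` explicitly.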




                                                                Copyright © 2024. All Rights Reserved by  - Fatal编程技术网