Python: Web scraping with urllib2 (Python, Python 2.7, RSS, urllib2, urllib) - Fatal编程技术网


Python: Web scraping with urllib2

python python-2.7 rss


I'm trying to scrape all the titles from this RSS feed:

Here is my code for it:

import urllib2
import re
content = urllib2.urlopen('http://www.quora.com/Python-programming-language-1/rss').read()
allTitles =  re.compile('<title>(.*)</title>')
list = re.findall(allTitles,content)
for e in range(0, 2):
    print list[e]


However, instead of getting a list of titles as output, I get a bunch of code from the RSS feed's source. What am I doing wrong?

You should use the non-greedy qualifier (?) in the expression:

#allTitles =  re.compile('<title>(.*)</title>')
allTitles =  re.compile('<title>(.*?)</title>')


Without it, the (.*) group captures all the text up to the last </title>...
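To see the difference the ? makes, here is a small sketch run against a made-up RSS fragment (the sample string is hypothetical test data, not the Quora feed). The fragment is kept on a single line because . does not match a newline unless re.DOTALL is given, so in a multi-line document the greedy pattern can only run to the last </title> on the same line.

```python
import re

# Hypothetical one-line RSS fragment used only for illustration
sample = ('<rss><channel><title>Feed title</title>'
          '<item><title>First question</title></item>'
          '<item><title>Second question</title></item>'
          '</channel></rss>')

greedy = re.compile('<title>(.*)</title>')   # .*  grabs as much as possible
lazy = re.compile('<title>(.*?)</title>')    # .*? stops as early as possible

# Greedy: one match that runs from the first <title> to the LAST </title>
greedy_matches = greedy.findall(sample)
# Non-greedy: each match ends at the first </title> after its <title>
lazy_matches = lazy.findall(sample)

print(len(greedy_matches))  # prints 1
print(greedy_matches[0])    # the whole middle of the document, tags included
print(lazy_matches)         # prints ['Feed title', 'First question', 'Second question']
```

The greedy result is exactly the "bunch of code" described in the question: a single match that swallows every tag between the first and last title.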
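As mentioned, your code is missing the non-greedy specifier for the regexp, and adding it fixes the problem. But I strongly recommend switching from regular expressions to a tool better suited to XML parsing, such as lxml, or to a dedicated RSS parsing module.
For example, here is how the task can be done with lxml: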
>>> import lxml.etree
>>> rss = lxml.etree.fromstring(content)
>>> titles = rss.findall('.//title')
>>> print '\n'.join(title.text for title in titles[:2])
Questions About Python (programming language) on Quora
Could someone explain for me the following Python function that uses @wraps from functools?
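The lxml output above appears to start with the feed's own channel title rather than a question title, because './/title' matches every title element in the document. As a sketch of an alternative that needs no third-party package, Python's standard xml.etree.ElementTree module supports the same kind of findall paths. The content string here is a hypothetical stand-in for the downloaded feed, and the path channel/item/title assumes the usual RSS 2.0 layout (rss, then channel, then item, then title).

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for urlopen(...).read()
content = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Feed title</title>
    <item><title>First question</title></item>
    <item><title>Second question</title></item>
  </channel>
</rss>"""

root = ET.fromstring(content)  # root is the <rss> element

# './/title' matches every <title>, including the channel's own title
all_titles = [t.text for t in root.findall('.//title')]

# 'channel/item/title' matches only the per-item (question) titles
item_titles = [t.text for t in root.findall('channel/item/title')]

for title in item_titles:
    print(title)
```

Unlike the regex, this approach does not care whether the feed is spread over one line or many, and it decodes XML entities in the titles for you.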

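If I add the non-greedy qualifier to my code, I only extract the first two titles from that link. How do I extract all the text enclosed in the <title> tags?
Oh, right! My mistake. Thanks for such an accurate and quick answer :)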
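The two-title limit comes from the loop in the question: range(0, 2) only visits indexes 0 and 1. A minimal sketch that prints every match instead, using a hypothetical content string in place of the network download and renaming list to titles so the built-in list is not shadowed:

```python
import re

# Hypothetical stand-in for the feed text returned by urlopen(...).read()
content = ('<rss><channel><title>Feed title</title>'
           '<item><title>First question</title></item>'
           '<item><title>Second question</title></item>'
           '<item><title>Third question</title></item>'
           '</channel></rss>')

all_titles = re.compile('<title>(.*?)</title>')

# Loop over every match rather than a fixed range(0, 2)
titles = all_titles.findall(content)
for title in titles:
    print(title)
```

Calling findall on the compiled pattern is equivalent to re.findall(all_titles, content) and reads a little more directly.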




Copyright © 2024. All Rights Reserved by - Fatal编程技术网