Parsing HTML containing &amp; with Python - Python, Html Parsing - Fatal编程技术网


Parsing HTML containing &amp; with Python


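I am using the Python library SGMLParser to parse some HTML. I ran into HTML tags of the form

<td class="school">Texas A&amp;M</td>

Is there a way to replace the &amp; with & before calling the parser, without replacing every special symbol in the whole string (some of which I may need)?

Thanks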

If you switch from SGMLParser to a modern alternative (one that also handles HTML), this becomes trivial:

>>> etree.fromstring('''<td class="school">Texas A&amp;M</td>''').text
'Texas A&M'
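The snippet above does not show where `etree` comes from. As a minimal sketch, assuming an ElementTree-compatible module is meant, the standard library's `xml.etree.ElementTree` decodes the entity the same way for this well-formed fragment (the variable name `cell` is only illustrative):

```python
# Assumption: `etree` is an ElementTree-compatible module; the standard
# library implementation is used here so the example has no dependencies.
import xml.etree.ElementTree as etree

# Parse the table cell from the question.
cell = etree.fromstring('<td class="school">Texas A&amp;M</td>')

# The parser has already decoded &amp; into a plain ampersand.
print(cell.text)          # Texas A&M
print(cell.get("class"))  # school
```

Note that this only works on well-formed markup; real-world HTML often needs a more forgiving parser.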


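Entity references such as &amp; are handled by a dedicated handler method. Check that this method knows how to translate &amp;. The default implementation should call handle_data('&'), but you may have accidentally overridden it.

Also, if possible, consider using a higher-level approach.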
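SGMLParser has a method for this, but I don't recommend SGMLParser; I'd suggest using a library with a better parser API.

SGMLParser is deprecated because nobody cares about SGML (most people use it to parse HTML, as in this case). XMLParser has the same interface and is not deprecated.

Yes, I don't care about SGML either; it just looked like an "easy" way to read data out of HTML. I'll look into lxml, thanks.

I don't think I've overridden it... but handle_data gets called three times, with 'Texas A', '&' and 'M', right? Is there a way to join the data (if you know what I mean)? It looks like everyone is suggesting lxml, so I'll take a closer look at it.

@mdeland: You have to join the data yourself; SGMLParser is a very low-level interface.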
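Joining the pieces yourself could look roughly like the sketch below. It is an illustration under assumptions, not the code from this thread: `sgmllib` no longer exists in Python 3, so the standard library's `html.parser.HTMLParser` stands in for SGMLParser, and the class name `CellTextParser` is made up for the example.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Hypothetical helper: collects the text of each <td>, joining the chunks."""

    def __init__(self):
        # convert_charrefs=False makes the parser report entity references
        # separately, mimicking the three-callback behaviour discussed above.
        super().__init__(convert_charrefs=False)
        self.cells = []      # finished cell texts
        self._chunks = None  # chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._chunks = []

    def handle_data(self, data):
        # Only collect text while inside a <td>.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_entityref(self, name):
        # Translate a named entity such as &amp; and treat it as ordinary data;
        # unknown entities are passed through unchanged.
        if name in name2codepoint:
            self.handle_data(chr(name2codepoint[name]))
        else:
            self.handle_data("&" + name + ";")

    def handle_charref(self, name):
        # Numeric references such as &#38; or &#x26;.
        code = int(name[1:], 16) if name[0] in "xX" else int(name)
        self.handle_data(chr(code))

    def handle_endtag(self, tag):
        if tag == "td" and self._chunks is not None:
            # Join the chunks into one string once the cell is complete.
            self.cells.append("".join(self._chunks))
            self._chunks = None


parser = CellTextParser()
parser.feed('<td class="school">Texas A&amp;M</td>')
parser.close()
print(parser.cells)  # ['Texas A&M']
```

The design choice is to buffer chunks per cell and join them at the closing tag, so it does not matter how many callbacks the parser splits the text into.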



