Python: How to extract XML embedded in an HTML file? [python, xml, screen-scraping]


I have an HTML file with XML embedded in it. The source is pasted on pastebin:

<html>
  <head>
    <title> test֤</title>
  </head>
  <body>
    <form name="acsForm" action="" method="post" >
      <textarea rows=10 cols=80 name="xmlText"><?xml version="1.0" encoding="UTF-8"?>
        <samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
        </samlp:Response> 
      </textarea>
      <textarea name="2nd"> text2....</textarea>             
    </form>
  </body>
</html>



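My task is to extract the text contained in the first textarea of the HTML, which is an XML fragment, without making any changes to the original snippet. I can get it with BeautifulSoup, but BeautifulSoup changes all tag names to lowercase.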

Try BeautifulStoneSoup, the part of the BeautifulSoup library that is designed for XML.

Perhaps it can, though I have never used it myself, so I don't know how easy or complicated it would be to do what you want.

(Argh! Why do so many authors seem to think that textarea content doesn't need HTML-escaping? Fools!)
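To illustrate the escaping point, here is a minimal sketch (assuming Python 3 and its standard `html` module, neither of which appears in the original thread) of how a page author could escape an XML fragment before placing it inside a textarea, and how a consumer could recover it unchanged. The variable names are illustrative only.

```python
from html import escape, unescape

# Illustrative XML fragment, modelled on the one in the question.
xml_fragment = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">'
    '</samlp:Response>'
)

# Escape &, < and > so the fragment is plain text inside the textarea.
escaped = escape(xml_fragment, quote=False)
textarea_html = '<textarea name="xmlText">' + escaped + '</textarea>'

# Unescaping the textarea text gives back the original fragment,
# including the mixed-case tag name.
restored = unescape(escaped)
```

With the content escaped this way, any HTML parser sees the fragment as character data and has no tag names to lowercase.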

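Unfortunately, BeautifulSoup 3.1 does not apply the (incorrect but common) browser fix-up for handling textarea content.

I just tried BeautifulSoup 3.0, but it does not work for me: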
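import BeautifulSoup  # BeautifulSoup 3.x (Python 2)
xml = '<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"></samlp:Response>'
print BeautifulSoup.BeautifulStoneSoup(xml)
# Output (tag name is lowercased; the line is truncated in the original post):
# <samlp:response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"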
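The lowercasing is typical of HTML-oriented parsers, since HTML tag names are case-insensitive. As a rough illustration (assuming Python 3 and using the standard library instead of BeautifulSoup 3, so it is an analogy and not a reproduction of the run above), `html.parser.HTMLParser` reports the tag in lowercase, while an XML parser such as `xml.etree.ElementTree` keeps the original case. The class name `TagCollector` is hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

xml_text = ('<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">'
            '</samlp:Response>')


class TagCollector(HTMLParser):
    """Records the start-tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


collector = TagCollector()
collector.feed(xml_text)
collector.close()

html_view = collector.start_tags      # ['samlp:response']
xml_view = ET.fromstring(xml_text).tag  # '{urn:oasis:names:tc:SAML:2.0:protocol}Response'
```

This is why pulling the textarea body out as raw text, without letting an HTML parser interpret it, keeps the fragment intact.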

In the end I found that pyparsing was the best weapon for the job:
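from pyparsing import makeHTMLTags, SkipTo
# Match <textarea ...>, then capture everything up to </textarea> as "body"
aStart, aEnd = makeHTMLTags("textarea")
search = aStart + SkipTo(aEnd)("body") + aEnd
matches = search.searchString(doc)  # doc holds the HTML source as a string
saml_resp_str = matches[0].body
relay_state_str = matches[1].body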
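For readers without pyparsing installed, the same "skip to the closing tag" idea can be sketched with the standard `re` module. This is a swapped-in technique, not the method used above, and it assumes Python 3, a well-behaved page with no `>` inside attribute values, and no nested textareas. The sample `doc` string and the helper name `textarea_bodies` are hypothetical.

```python
import re

# Hypothetical sample document, modelled on the HTML in the question.
doc = '''<html>
  <body>
    <form name="acsForm" action="" method="post" >
      <textarea rows=10 cols=80 name="xmlText"><?xml version="1.0" encoding="UTF-8"?>
        <samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
        </samlp:Response>
      </textarea>
      <textarea name="2nd"> text2....</textarea>
    </form>
  </body>
</html>'''

# Opening tag with any attributes, then the shortest possible body,
# then the closing tag. The body is captured as raw text, unparsed.
TEXTAREA_RE = re.compile(
    r'<textarea\b[^>]*>(.*?)</textarea\s*>',
    re.DOTALL | re.IGNORECASE,
)


def textarea_bodies(html_text):
    """Return the raw contents of every textarea, in document order."""
    return TEXTAREA_RE.findall(html_text)


bodies = textarea_bodies(doc)
saml_resp_str = bodies[0]    # XML fragment, case preserved
relay_state_str = bodies[1]  # contents of the second textarea
```

Because the body is never handed to an HTML parser, `saml_resp_str` still contains `samlp:Response` with its original capitalization.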



