Python: retrieving footnotes from an HTML document converted from docx (docx2python not working)


python html regex xml


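I need some help retrieving footnotes from a docx document in Python, because the docx file contains a large number of footnotes.

Below is the code I am currently having trouble with, because docx2python cannot read Word documents that exceed a certain number of pages.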

    from docx2python import docx2python

    docx_temp = docx2python(filepath)
    footnotes = docx_temp.footnotes
    footnotes = footnotes[0][0][0]
    footnotes = [i.replace("\t","") for i in footnotes]

So I tried the other approach below, but since I am not familiar with XML I cannot tell whether the code is working, and I am stuck:

    import re
    import mammoth

    with open(filepath, 'rb') as file:
        html = mammoth.convert_to_html(file).value

    #html = re.sub('\"(.+?)\"', '"<em>\1</em>"', html)
    fnotes = re.findall('id="footnote-<number>" (.*?) ', html)

Could you tell me how to write the code correctly to extract the footnotes from the docx/HTML file? Thanks for your help.
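One thing that stands out in the mammoth attempt is that the pattern searches for the literal text `<number>`, which never occurs in the generated HTML, so `re.findall` returns an empty list. Below is a minimal sketch of the HTML route, assuming mammoth renders each footnote as a list item of the form `<li id="footnote-1"><p>text <a href="#footnote-ref-1">↑</a></p></li>`. That shape is an assumption and should be checked against the actual `html` string. The helper name `footnotes_from_html` is hypothetical.

```python
import re
from html import unescape

# Assumed markup: <li id="footnote-N"> ... </li>, optionally with an id prefix.
FOOTNOTE_RE = re.compile(r'<li id="[^"]*footnote-(\d+)">(.*?)</li>', re.DOTALL)
# Back-reference link that points from the footnote to its place in the body.
BACKLINK_RE = re.compile(r'<a href="#[^"]*footnote-ref-\d+">.*?</a>', re.DOTALL)
TAG_RE = re.compile(r'<[^>]+>')


def footnotes_from_html(html):
    """Return {footnote number: plain text} for the assumed <li> markup."""
    notes = {}
    for number, body in FOOTNOTE_RE.findall(html):
        body = BACKLINK_RE.sub('', body)           # drop the back-link
        text = unescape(TAG_RE.sub('', body))      # strip tags, decode entities
        notes[int(number)] = text.strip()
    return notes
```

With the question's code this would be called as `footnotes_from_html(html)` right after `mammoth.convert_to_html(file).value`. A regex is kept here only because the question uses one; a footnote containing a nested list would need a real HTML parser.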

    import re
    import zipfile
    import xml.etree.ElementTree
    from docx2python import docx2python

    docxfile = zipfile.ZipFile(open(filepath,'rb'))
    xmlString = docxfile.read('word/footnotes.xml').decode('utf-8')
    fn = docxfile.read('word/footnotes.xml')
    xml.etree.ElementTree.parse(fn)
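In the zipfile snippet, `ElementTree.parse` expects a file name or file object, whereas `docxfile.read(...)` returns bytes, so `ElementTree.fromstring` is the matching call. The following is a sketch of reading `word/footnotes.xml` directly, which avoids both docx2python and the HTML conversion. It assumes the standard WordprocessingML namespace and that separator entries carry a `w:type` attribute. The function name `read_footnotes` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
# Pseudo-footnotes Word uses for the separator lines, not real notes.
SKIP_TYPES = {"separator", "continuationSeparator", "continuationNotice"}


def read_footnotes(docx):
    """Return {footnote id: text} from a .docx path or binary file object."""
    with zipfile.ZipFile(docx) as archive:
        # Raises KeyError if the document has no footnotes part.
        xml_bytes = archive.read("word/footnotes.xml")
    root = ET.fromstring(xml_bytes)

    notes = {}
    for note in root.findall("w:footnote", NS):
        if note.get(f"{{{W_NS}}}type") in SKIP_TYPES:
            continue
        note_id = int(note.get(f"{{{W_NS}}}id"))
        paragraphs = []
        for para in note.findall(".//w:p", NS):
            # A paragraph's text is split across several w:t runs.
            runs = (t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
            paragraphs.append("".join(runs))
        notes[note_id] = "\n".join(paragraphs).strip()
    return notes
```

Calling `read_footnotes(filepath)` returns a dictionary keyed by the footnote id that the main document body refers to. Only the footnotes part of the archive is read, so the rest of a large document is never parsed.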
