How to extract images from a Word document using JavaScript?


I am trying to extract images from a Word document using ActiveXObject in JavaScript (IE only).

I can't find any API reference for the Word object, only a few hints from the Internet:

var filename = 'path/to/word/doc.docx'
var word = new ActiveXObject('Word.Application')
var doc = word.Documents.Open(filename)
// Displays the text
var docText = doc.Content

How can I access the images in the Word document using something like doc.Content?

Also, if anyone has a definitive source for the API (preferably from Microsoft), that would be very helpful.

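After a few weeks of research, I found that the easiest way to extract the images is the SaveAs function on the document opened through ActiveXObject. If the file is saved as an HTML document, Word creates a folder containing the images.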

From there, you can fetch the HTML file with XMLHttp and create new IMG tags that the browser can display (I used IE 9, since ActiveXObject only works in Internet Explorer).

Let's start with the SaveAs part:

// Define the path to the file
var filepath = 'path/to/the/word/doc.docx'
// Make a new ActiveXWord application
var word = new ActiveXObject('Word.Application')
// Open the document
var doc = word.Documents.Open(filepath)
// Save the DOCX as an HTML file (the 8 specifies you want to save it as an HTML document)
doc.SaveAs(filepath + '.htm', 8)
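
When converting many documents, it may help to close each document once its HTML copy has been written. The following is only a sketch: `saveAsHtml` is a hypothetical helper name, the Word application object is passed in by the caller, and the value 0 passed to `Close` is assumed to correspond to Word's `wdDoNotSaveChanges` constant.

```javascript
// Hypothetical helper: saves one document as HTML and closes it again.
// `word` is expected to be the object returned by new ActiveXObject('Word.Application').
function saveAsHtml(word, filepath) {
  var htmlPath = filepath + '.htm';
  var doc = word.Documents.Open(filepath);
  try {
    // 8 selects the HTML format, as in the snippet above
    doc.SaveAs(htmlPath, 8);
  } finally {
    // 0 is assumed to mean "do not save changes"
    doc.Close(0);
  }
  return htmlPath;
}
```

In IE this could be called as `saveAsHtml(word, filepath)` for each file, followed by `word.Quit()` after the last document so that no Word process is left running.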

We should now have a folder in the same directory that contains the image files.

Note: in the Word HTML, images use <v:imagedata> tags, which are stored inside <v:shape> tags; for example:

<v:shape style="width: 241.5pt; height: 71.25pt;">
     <v:imagedata src="path/to/the/word/doc.docx_files/image001.png">
         ...
     </v:imagedata>
</v:shape>
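
To list the image paths without touching the DOM, the `src` values can also be pulled out of the HTML text directly. This is a rough sketch rather than a full HTML parser: `extractImageSources` is a hypothetical name, and it assumes the `src` attribute is always quoted, as in the example above.

```javascript
// Hypothetical helper: collects the src attribute of every <v:imagedata> tag.
// Assumes quoted attribute values; unquoted values are not matched.
function extractImageSources(html) {
  var pattern = /<v:imagedata\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  var sources = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Group 1 holds double-quoted values, group 2 single-quoted ones
    sources.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return sources;
}
```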

Because I was processing hundreds of Word documents, I found it best to define the XMLHttp onreadystatechange callback before sending the request:

// Define the onreadystatechange callback function
xmlhttp.onreadystatechange = function() {
    // Check to make sure the response has fully loaded
    if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
        // Grab the response text
        var html_text = xmlhttp.responseText
        // Load the HTML into the innerHTML of a DIV to add the HTML to the DOM
        document.getElementById('doc_html').innerHTML = html_text.replace("<html>", "").replace("</html>", "")
        // Define a new array of all HTML elements with the "v:imagedata" tag
        var images = document.getElementById('doc_html').getElementsByTagName("v:imagedata")
        // Loop through each image
        for (j = 0; j < images.length; j++) {
            // Grab the source attribute to get the image name
            var src = images[j].getAttribute('src')
            // Check to make sure the image has a 'src' attribute
            if (src != undefined) {
                ...

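Here we add a new img tag to the HTML div through the parent (the v:shape object) of the image's parent, which happens to be a p object. We append the new img tag to that element's innerHTML, taking the src attribute from the image and the style information from the v:shape element: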

                ...
                images[j].parentElement.parentElement.innerHTML += "<img src='" + images[j].getAttribute('src') + "' style='" + images[j].parentElement.getAttribute('style') + "'>"
            }
        }
    }
}
// Read the HTML Document using XMLHttpRequest
xmlhttp.open("POST", filepath + '.htm', false)
xmlhttp.send()
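
The string concatenation above could be pulled into a small helper so that quotes or angle brackets in a path do not break the generated markup. This is a sketch under assumptions, not part of the original answer: `buildImgTag` is a hypothetical name, and the escaping shown covers only the characters that matter inside a single-quoted attribute.

```javascript
// Hypothetical helper: builds the <img> markup from an image's src and its parent's style.
function buildImgTag(src, style) {
  // Escape characters that would break a single-quoted HTML attribute
  function escapeAttr(value) {
    return String(value == null ? '' : value)
      .replace(/&/g, '&amp;')
      .replace(/'/g, '&#39;')
      .replace(/</g, '&lt;');
  }
  return "<img src='" + escapeAttr(src) + "' style='" + escapeAttr(style) + "'>";
}
```

Inside the loop it would be used as `buildImgTag(images[j].getAttribute('src'), images[j].parentElement.getAttribute('style'))`.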


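Although it is a bit of a hack, the approach above successfully adds the img tags to the HTML generated from the original document.
Ken, thanks for the link! I knew it was out there somewhere, but for the life of me I couldn't find it. I'll see if I can find the answer to this question in there.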
... images[j].setAttribute('src', '/path/to/the/folder/containing/the/images/'+src.split('/')[1]) ...
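
Note that `src.split('/')[1]` picks the second path segment, which is the file name only when `src` has the form `folder/image.png`. A slightly more tolerant variant, sketched here with the hypothetical name `rewriteImageSrc` and a caller-supplied image folder, takes the last segment instead:

```javascript
// Hypothetical helper: points an image src at a different folder, keeping only the file name.
function rewriteImageSrc(src, imageFolder) {
  var parts = src.split('/');
  // The last segment is the file name regardless of how deep the original path is
  var fileName = parts[parts.length - 1];
  // Drop trailing slashes so the result has exactly one separator
  var base = imageFolder.replace(/\/+$/, '');
  return base + '/' + fileName;
}
```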
