How do I convert a docx document to HTML using PHP?


I'd like to be able to upload an MS Word document and export it to a page on my site.

Is there a way to do this?

//FUNCTION :: read a docx file and return the string
function readDocx($filePath) {
    // Create new ZIP archive
    $zip = new ZipArchive;
    $dataFile = 'word/document.xml';
    // Open received archive file
    if (true === $zip->open($filePath)) {
        // If done, search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false) {
            // If found, read it to the string
            $data = $zip->getFromIndex($index);
            // Close archive file
            $zip->close();
            // Load the XML into a DOMDocument instance, suppressing parser
            // errors and warnings. loadXML() must be called on an instance
            // (a static call fails on PHP 8), and LIBXML_NOENT is omitted so
            // that entities in an uploaded file are not expanded.
            $xml = new DOMDocument();
            $xml->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING);
            // Return the text content with the XML tags stripped
            return strip_tags($xml->saveXML());
        }
        $zip->close();
    }
    // In case of failure return empty string
    return "";
}

ZipArchive and DOMDocument both ship with PHP, so you don't need to install, include, or require any other library.
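
Since strip_tags() runs all paragraphs together, a paragraph-aware variant may be more useful. Below is a minimal sketch that relies only on ZipArchive, DOMDocument, and DOMXPath. The function names paragraphsFromDocumentXml and readDocxParagraphs are made up for this illustration, and the sketch assumes a standard WordprocessingML document where visible text lives in w:t elements inside w:p paragraphs.

```php
<?php
// Sketch only: the function names are hypothetical, not part of any library.

/**
 * Return one string per <w:p> paragraph found in a document.xml string.
 */
function paragraphsFromDocumentXml(string $documentXml): array
{
    if ($documentXml === '') {
        return []; // loadXML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Suppress parser noise; entity substitution is deliberately left off.
    if (!$dom->loadXML($documentXml, LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return [];
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $paragraphs = [];
    foreach ($xpath->query('//w:p') as $p) {
        $text = '';
        // <w:t> elements hold the visible text of each run.
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return $paragraphs;
}

/**
 * Open a .docx file and return its paragraphs, or an empty array on failure.
 */
function readDocxParagraphs(string $filePath): array
{
    $zip = new ZipArchive();
    if ($zip->open($filePath) !== true) {
        return [];
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? [] : paragraphsFromDocumentXml($xml);
}
```

Something like implode("\n", readDocxParagraphs($path)) would then give plain text with line breaks between paragraphs. Nested paragraphs (for example inside text boxes) are not handled specially here.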

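You can use PHPDocX.

It supports almost all HTML CSS styles. In addition, you can use templates and add extra formatting to your HTML via the replaceTemplateVariableByHTML method.

PHPDocX's HTML methods also let you use Word styles directly. You can write something like:

$docx->embedHTML($myHTML, array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));
if you want all tables to use the MediumGrid3-accent5 Word style. The embedHTML method and its template version (replaceTemplateVariableByHTML) preserve inheritance, which means you can use a predefined Word style and override any of its properties with CSS.
You can also use "jQuery-style" selectors to extract selected parts of the HTML.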

You can use the Print2Flash library to convert Word docx documents to HTML. Here is a PHP excerpt from my client's site that converts a document to HTML:
include("const.php");
$p2fServ = new COM("Print2Flash4.Server2");
$p2fServ->DefaultProfile->DocumentType=HTML5;
$p2fServ->ConvertFile($wordfile,$htmlFile);

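It converts the document at the path given in $wordfile into the HTML page file given in $htmlFile, preserving all formatting, hyperlinks, and charts. The required const.php file and a more complete example are available from Print2Flash.

If you are not opposed to a REST API, you can use one: the recommended tool is a proven open-source leader in text extraction.

If you don't want to bother with configuration and prefer a ready-made solution, you can use RawText, but it is not free.

Sample code for RawText:
$result = $rawText->parse($your_file);

This is a solution based on David Lin's answer above.
Removing the "w:" prefix from the docx XML tags leaves HTML-like tags.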
function readDocx($filePath) {
    // Create new ZIP archive
    $zip = new ZipArchive;
    $dataFile = 'word/document.xml';
    // Open received archive file
    if (true === $zip->open($filePath)) {
        // If done, search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false) {
            // If found, read it to the string
            $data = $zip->getFromIndex($index);
            // Close archive file
            $zip->close();
            // Load XML from a string, skipping errors and warnings.
            // LIBXML_NOENT is omitted so that entities in an uploaded file
            // are not expanded.
            $xml = new DOMDocument("1.0", "utf-8");
            $xml->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING | LIBXML_PARSEHUGE);
            $xml->encoding = "utf-8";
            // Strip the "w:" namespace prefix so the tag names look HTML-like.
            // Note: this also removes any literal "w:" in the document text.
            $output = $xml->saveXML();
            $output = str_replace("w:", "", $output);

            return $output;
        }
        $zip->close();
    }
    // In case of failure return empty string
    return "";
}
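
Stripping the prefix still leaves Word's own element names rather than real HTML. As a rough sketch of going one step further, the following maps paragraphs to p tags and bold or italic runs to strong and em. The function names are invented for this example, and it assumes formatting is set directly on each run through w:b and w:i in w:rPr. Formatting inherited from named styles, tables, lists, and images are not handled.

```php
<?php
// Sketch only: function names are hypothetical, not part of any library.

const DOCX_W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * True if the run has <w:rPr><w:NAME/></w:rPr> and it is not switched off.
 */
function isRunPropertyOn(DOMXPath $xpath, DOMNode $run, string $name): bool
{
    $node = $xpath->query("./w:rPr/w:$name", $run)->item(0);
    if (!$node instanceof DOMElement) {
        return false;
    }
    // w:val="0", "false" or "off" explicitly disables the property.
    $val = $node->getAttributeNS(DOCX_W_NS, 'val');
    return !in_array($val, ['0', 'false', 'off'], true);
}

/**
 * Convert a document.xml string to minimal HTML (paragraphs, bold, italic).
 */
function documentXmlToHtml(string $documentXml): string
{
    if ($documentXml === '') {
        return ''; // loadXML() rejects an empty string
    }
    $dom = new DOMDocument();
    if (!$dom->loadXML($documentXml, LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return '';
    }
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', DOCX_W_NS);

    $html = '';
    foreach ($xpath->query('//w:body/w:p') as $p) {
        $html .= '<p>';
        foreach ($xpath->query('.//w:r', $p) as $run) {
            $text = '';
            foreach ($xpath->query('./w:t', $run) as $t) {
                $text .= $t->textContent;
            }
            if ($text === '') {
                continue; // run without text, e.g. a drawing
            }
            // Escape the text before wrapping it in tags.
            $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
            if (isRunPropertyOn($xpath, $run, 'i')) {
                $text = "<em>$text</em>";
            }
            if (isRunPropertyOn($xpath, $run, 'b')) {
                $text = "<strong>$text</strong>";
            }
            $html .= $text;
        }
        $html .= "</p>\n";
    }
    return $html;
}

/**
 * Read word/document.xml from a .docx file and convert it to HTML.
 */
function docxToSimpleHtml(string $filePath): string
{
    $zip = new ZipArchive();
    if ($zip->open($filePath) !== true) {
        return '';
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? '' : documentXmlToHtml($xml);
}
```

Calling echo docxToSimpleHtml($uploadedPath); would print the converted markup. Only top-level body paragraphs are visited, so text inside table cells is skipped in this sketch.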

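Thanks, this works well, but is there a way to keep the bold and italic formatting?

It returns the whole document. Is there a way to get the text of a single page?

This answer does not provide a solution for converting .docx to HTML, since the code calls strip_tags(). The OP specifically asked how to convert to HTML.

I'm not very familiar with PHP, but maybe this can help you?

You can use it. For your approach, you need one. Then follow this, or learn how to use it yourself.

phpLiveDocx seems like overkill... and it seems very limited as a service (no dynamic tables or charts). It should also be said that it is not free! At least not anymore.

Suggestion: let's introduce a "commercial" badge/tag on StackOverflow to make things like this visible.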