Unable to extract mail merge fields from a Word DOCX using PHP and DOM
I'm trying to come up with a solution that lets users upload a mail-merge-enabled Word DOCX template file. Ideally, the system would read the DOCX file, extract the XML, find the mail merge fields, and save them to a database for later mapping. I may end up using a SOAP service such as Zend LiveDocX or PHPDOCX, or something else entirely, but for now I need to figure out how to identify the fields in the DOCX file. To that end, I started with the code from this article: I have tweaked it a bit to suit my needs, which may be part of the problem, although I got the same error with the original code as well. Specifically, I'm not using it to perform a mail merge right now; I only want to identify the fields. Here is what I have:
$newFile = '/var/www/mysite.com/public_html/template.docx';
$zip = new ZipArchive();
if( $zip->open( $newFile, ZIPARCHIVE::CHECKCONS ) !== TRUE ) { echo 'failed to open template'; exit; }
$file = 'word/document.xml';
$data = $zip->getFromName( $file );
$zip->close();
$doc = new DOMDocument();
$doc->loadXML( $data );
$wts = $doc->getElementsByTagNameNS('http://schemas.openxmlformats.org/wordprocessingml/2006/main', 'fldChar');
$mergefields = array();
function getMailMerge(&$wts, $index) {
    $loop = true;
    $counter = $index;
    $startfield = false;
    while ($loop) {
        if ($wts->item($counter)->attributes->item(0)->nodeName == 'w:fldCharType') {
            $nodeName = '';
            $nodeValue = '';
            switch ($wts->item($counter)->attributes->item(0)->nodeValue) {
                case 'begin':
                    if ($startfield) {
                        $counter = getMailMerge($wts, $counter);
                    }
                    $startfield = true;
                    if ($wts->item($counter)->parentNode->nextSibling) {
                        $nodeName = $wts->item($counter)->parentNode->nextSibling->childNodes->item(1)->nodeName;
                        $nodeValue = $wts->item($counter)->parentNode->nextSibling->childNodes->item(1)->nodeValue;
                    }
                    else {
                        // No sibling
                        // check next node
                        $nodeName = $wts->item($counter + 1)->parentNode->previousSibling->childNodes->item(1)->nodeName;
                        $nodeValue = $wts->item($counter + 1)->parentNode->previousSibling->childNodes->item(1)->nodeValue;
                    }
                    if (substr($nodeValue, 0, 11) == ' MERGEFIELD') {
                        $mergefields[] = strtolower(str_replace('"', '', trim(substr($nodeValue, 12))));
                    }
                    $counter++;
                    break;
                case 'separate':
                    $counter++;
                    break;
                case 'end':
                    if ($startfield) {
                        $startfield = false;
                    }
                    $loop = false;
            }
        }
    }
    return $counter;
}
for ($x = 0; $x < $wts->length; $x++) {
    if ($wts->item($x)->attributes->item(0)->nodeName == 'w:fldCharType' && $wts->item($x)->attributes->item(0)->nodeValue == 'begin') {
        $newcount = getMailMerge($wts, $x);
        $x = $newcount;
    }
}
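Google has failed me in trying to track down this error. Can anyone point me in the right direction? Thanks in advance.

Found a solution. It is not as elegant as I had hoped, but here goes. Using xml_parser_create_ns, I can search the DOCX file for the key I need, specifically HTTP://SCHEMAS.OPENXMLFORMATS.ORG/WORDPROCESSINGML/2006/MAIN:INSTRTEXT, which holds the field instructions, including the ones marked MERGEFIELD. I can then dump the results into an array and use them to update the database. That is: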
// Word file to be opened
$newFile = '/var/www/mysite.com/public_html/template.docx';

// Extract the document.xml file from the DOCX archive
$zip = new ZipArchive();
if ($zip->open($newFile, ZipArchive::CHECKCONS) !== true) {
    echo 'failed to open template';
    exit;
}
$file = 'word/document.xml';
$data = $zip->getFromName($file);
$zip->close();

// Create the XML parser and create an array of the results
$parser = xml_parser_create_ns();
xml_parse_into_struct($parser, $data, $vals, $index);
xml_parser_free($parser);

// Look up the important key in the index array
// (the parser upper-cases tag names by default, so the key is upper case too)
$key = 'HTTP://SCHEMAS.OPENXMLFORMATS.ORG/WORDPROCESSINGML/2006/MAIN:INSTRTEXT';
$found = isset($index[$key]) ? $index[$key] : array();

// Cycle *that* array looking for "MERGEFIELD" and grab the field name to yet another array
// Make sure to check for duplicates since fields may be re-used
$mergefields = array();
foreach ($found as $field) {
    $value = isset($vals[$field]['value']) ? ltrim($vals[$field]['value']) : '';
    if (strncmp($value, 'MERGEFIELD', 10) !== 0) {
        continue; // some other field type, not a merge field
    }
    $name = strtolower(trim(substr($value, 10)));
    if (!in_array($name, $mergefields)) {
        $mergefields[] = $name;
    }
}

// View the fruits of your labor
print_r($mergefields);
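
For comparison, the same lookup can be sketched with DOMDocument and DOMXPath, which is closer to what the question set out to do. This is only a minimal sketch under stated assumptions: the function name extractMergeFieldNames and the sample XML are made up for illustration, and the extra check of the w:instr attribute on w:fldSimple is an addition that the code above does not make. It also assumes each instruction sits in a single w:instrText element, which Word does not guarantee.

```php
<?php
// Hypothetical helper (not from the original post): returns the unique,
// lower-cased MERGEFIELD names found in the XML of word/document.xml.
function extractMergeFieldNames(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $names = [];
    // Complex fields keep the instruction in w:instrText; simple fields keep
    // it in the w:instr attribute of w:fldSimple.
    foreach ($xpath->query('//w:instrText | //w:fldSimple/@w:instr') as $node) {
        // Accept either a quoted name ("Last Name") or a bare one (FirstName).
        if (preg_match('/^\s*MERGEFIELD\s+(?:"([^"]+)"|([^\s\\\\]+))/i', $node->nodeValue, $m)) {
            $names[] = strtolower($m[1] !== '' ? $m[1] : $m[2]);
        }
    }
    return array_values(array_unique($names));
}

// Made-up sample standing in for the contents of word/document.xml.
$sampleXml = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body><w:p>'
    . '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    . '<w:r><w:instrText xml:space="preserve"> MERGEFIELD FirstName \* MERGEFORMAT </w:instrText></w:r>'
    . '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    . '<w:fldSimple w:instr=" MERGEFIELD &quot;Last Name&quot; "><w:r><w:t>x</w:t></w:r></w:fldSimple>'
    . '<w:r><w:instrText> PAGE </w:instrText></w:r>'
    . '</w:p></w:body></w:document>';

print_r(extractMergeFieldNames($sampleXml));
```

Querying by namespace and local name avoids depending on attribute order or on a fixed child position, which is what the original DOM code relied on.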
Using the same script, I found that it reaches through too many child nodes. I changed the lookup to:
$nodeName = $wts->item($counter)->parentNode->nextSibling->nodeName;
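
A fixed child index is fragile because Word may place the instruction in any of the runs that follow the "begin" marker, and may split it across several of them. The sketch below is one way to walk those sibling runs, reading the marker type by namespace rather than by attribute position. The helper name fieldInstruction, the W_NS constant, and the sample XML are assumptions made for illustration, and nested fields are not handled.

```php
<?php
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical helper: collects the instruction text that follows a
// w:fldChar "begin" marker, stopping at the next "separate" or "end" marker.
function fieldInstruction(DOMElement $begin): string
{
    $text = '';
    // $begin sits inside a w:r run; the instruction is in later sibling runs.
    for ($run = $begin->parentNode->nextSibling; $run !== null; $run = $run->nextSibling) {
        if (!$run instanceof DOMElement) {
            continue; // skip text nodes between runs
        }
        foreach ($run->getElementsByTagNameNS(W_NS, 'fldChar') as $marker) {
            $type = $marker->getAttributeNS(W_NS, 'fldCharType');
            if ($type === 'separate' || $type === 'end') {
                return trim($text);
            }
        }
        foreach ($run->getElementsByTagNameNS(W_NS, 'instrText') as $part) {
            $text .= $part->nodeValue;
        }
    }
    return trim($text);
}

// Made-up sample: one field whose instruction is split across two runs.
$splitSample = '<w:p xmlns:w="' . W_NS . '">'
    . '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    . '<w:r><w:instrText xml:space="preserve"> MERGEFIELD </w:instrText></w:r>'
    . '<w:r><w:instrText xml:space="preserve">FirstName </w:instrText></w:r>'
    . '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
    . '<w:r><w:t>x</w:t></w:r>'
    . '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    . '</w:p>';

$doc = new DOMDocument();
$doc->loadXML($splitSample);
foreach ($doc->getElementsByTagNameNS(W_NS, 'fldChar') as $fldChar) {
    if ($fldChar->getAttributeNS(W_NS, 'fldCharType') === 'begin') {
        echo fieldInstruction($fldChar), "\n";
    }
}
```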