PHP: How to read the text of a .docx file, the way antiword does?
Does anyone know how to read file.docx in PHP the same way antiword reads file.doc? I use antiword for file.doc and store the text in the DB:
$em = $this->getDoctrine()->getManager();
$request = $this->get('request');
$developer = $em->getRepository('ProfileBundle:Developer')->findOneById($id);
if (!$developer) {
    throw $this->createNotFoundException('Unable to find a profile.');
}
$cv = $developer->getCvDirUri();
if ($cv && file_exists($cv)) {
    unlink($cv);
}
$form = $this->createForm(new DeveloperDirCvType(), array());
if ($request->isMethod('POST')) {
    $form->bind($request);
    if ($form->isValid()) {
        $data = $form->getData();
        $uploader = $this->get('artel.profile.file_uploader');
        $path = $uploader->uploadFile($data['photo']);
        $developer->setCvDirUri($path['url']);
        // Absolute path of the uploaded file on disk
        $file = '/var/www/aog-profile/web/' . $path['url'];
        $content = '';
        $mimeType = $data['photo']->getClientMimeType();
        if ($mimeType == 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') {
            // .docx: abiword writes an HTML copy (file.html) next to the upload;
            // the text still has to be read back from that file
            exec('/usr/bin/abiword --to=html ' . escapeshellarg($file));
        } elseif ($mimeType == 'application/pdf') {
            $parser = new \Smalot\PdfParser\Parser();
            $pdf = $parser->parseFile($file);
            $content = $pdf->getText();
        } else {
            // .doc: antiword prints the text to stdout using the UTF-8 mapping file.
            // Only options and the quoted path are passed (no "chmod o+r" words).
            $content = shell_exec('/usr/bin/antiword -m UTF-8.txt ' . escapeshellarg($file));
        }
        $url = sprintf(
            '%s%s',
            $this->container->getParameter('acme_storage.amazon_s3.base_url'),
            $this->getPhotoUploader()->uploadFromUrl($path['url'])
        );
        $developer->setTextCv($content);
        $developer->setCvUri($url);
        $em->flush();
    }
}
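For the .doc branch, antiword should receive only its options and the file path, and the path should be shell-quoted because it comes from an upload. A minimal sketch of that call, assuming antiword is installed at /usr/bin/antiword and the `UTF-8.txt` mapping file is available (both taken from the question); the function names `buildAntiwordCommand` and `readDocText` are hypothetical:

```php
<?php
// Hypothetical helper: build the antiword command line for a .doc file.
// escapeshellarg() quotes the path so spaces or quotes in an uploaded
// file name cannot break the command.
function buildAntiwordCommand(string $docPath): string
{
    return '/usr/bin/antiword -m UTF-8.txt ' . escapeshellarg($docPath);
}

// Hypothetical helper: run antiword and return the extracted text,
// or null if the file is unreadable or the command produced no output
// (for example, when antiword is not installed).
function readDocText(string $docPath): ?string
{
    if (!is_readable($docPath)) {
        return null;
    }
    $output = shell_exec(buildAntiwordCommand($docPath));
    if ($output === null || $output === false || $output === '') {
        return null;
    }
    return $output;
}
```

Returning null instead of an empty string makes it easy to tell "extraction failed" apart from "document is empty" before calling setTextCv().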
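With file.doc, I use antiword and setTextCv($content), so the text ends up in the DB and the file is uploaded to Amazon. But if the file is a .docx, I upload it to /upload/Cv/file.docx and create file.html from it. Then I need setTextCv('text in file html'). Or do you know another way? I don't know how to do it. Any ideas?

You could try unzipping it and then digging through the file structure.

How do I unzip it? I don't know.

First, you need to learn the structure of a .docx file. It is nothing more than a .zip file; see the Zip (ZipArchive) class in the PHP documentation. If you only want the text, you can parse the word/document.xml file inside the .docx.

I edited the post and added a description.

Thanks, man. I haven't tried that yet, but I wrote some code, and this code works for docx files.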
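Following the suggestion to treat the .docx as a ZIP archive and read word/document.xml, here is a minimal sketch using PHP's ZipArchive class. It assumes the zip extension is enabled; the function name `readDocxText` is hypothetical, and it only reads the main document body (headers, footers, and footnotes are stored in other XML parts of the archive), so treat it as a starting point rather than a complete converter.

```php
<?php
// Hypothetical helper: extract plain text from a .docx file.
// A .docx is a ZIP archive; the main body text lives in word/document.xml.
function readDocxText(string $docxPath): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        return null; // missing file or not a valid ZIP archive
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        return null; // archive has no main document part
    }
    // Turn paragraph ends (</w:p>) into newlines and tab runs (<w:tab/>)
    // into tabs before stripping the remaining WordprocessingML tags.
    $xml = str_replace(['</w:p>', '<w:tab/>'], ["\n", "\t"], $xml);
    $text = strip_tags($xml);
    // Decode XML entities such as &amp; and &lt;.
    return html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}
```

With this, the .docx branch can set `$content = readDocxText($file);` directly instead of going through abiword and an intermediate HTML file.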