How do I convert DOC and DOCX to TXT with PHP?

I have a system where all the files the client sends me are in DOC or DOCX format. However, the client also wants to be able to download these documents in TXT format.

Is there a simple way to convert DOC or DOCX to TXT with PHP?

  • Have you tried PHPWord?

  • Exactly, @rray, I forgot to mention that in the question. Do you know how to do this in PHPWord?

  • I believe these two links can help you: http://stackoverflow.com/questions/19503653/how-to-extract-text-from-word-file-doc-docx-xlsx-pptx-php http://stackoverflow.com/questions/5540886/extract-text-from-doc-and-docx

  • Here is another example: http://stackoverflow.com/questions/188452/reading-writing-a-ms-word-file-in-php

  • Thank you, guys. I'm starting to get it. A DOCX file is a ZIP archive in disguise. If you change its extension to .zip, you will see that it contains several XML files used for the content and formatting.
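Since the last comment points out that a DOCX is really a ZIP archive, here is a rough sketch of reading the text straight out of it without any library. It assumes the ZipArchive and DOM extensions are enabled and that the body text lives in word/document.xml, which is the standard location. The function name docxToPlainText is made up for illustration. It only joins paragraph text and ignores tabs, manual line breaks, headers, and footnotes, and it does not work for the old binary DOC format.

```php
<?php
// Hypothetical helper (not from the thread): extracts paragraph text from a DOCX.
function docxToPlainText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    // The main document body is stored in this entry of the archive.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive');
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    // Each w:p is a paragraph; its visible text sits in w:t elements.
    foreach ($xpath->query('//w:p') as $paragraph) {
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        $lines[] = $text;
    }

    // CRLF so the result opens cleanly in Windows Notepad.
    return implode("\r\n", $lines);
}
```

Something like file_put_contents('arquivo.txt', docxToPlainText('arquivo.docx')); would then write the result.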

1 answer


I managed to solve the problem. I did it this way:

I open the Word document using the IOFactory class from the PHPWord library.

 $reader = PhpOffice\PhpWord\IOFactory::createReader('Word2007');

 $phpword = $reader->load('arquivo.docx');
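Note that the 'Word2007' reader only handles DOCX. For the older binary DOC files from the question, PHPWord ships a separate reader named 'MsDoc'. A minimal sketch of picking the reader from the file extension could look like this. The helper name readerNameFor is an assumption for illustration, and how well 'MsDoc' copes with a given file is something you would need to test yourself.

```php
<?php
// Hypothetical helper (not from the answer): maps a file extension
// to the name of the PHPWord reader that handles it.
function readerNameFor(string $path): string
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    switch ($extension) {
        case 'docx':
            return 'Word2007';
        case 'doc':
            return 'MsDoc';
        default:
            throw new InvalidArgumentException("Unsupported extension: .$extension");
    }
}

// Usage, assuming PHPWord is installed and autoloaded through Composer:
// $reader  = \PhpOffice\PhpWord\IOFactory::createReader(readerNameFor('arquivo.doc'));
// $phpword = $reader->load('arquivo.doc');
```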

Then I save the document as HTML in a temporary file:

$tempfile = tempnam(sys_get_temp_dir(), 'phpword_'); // tempnam() requires a directory and a prefix

$phpword->save($tempfile, 'HTML');

I use the DOMDocument class to extract only the contents of the body tag:

$dom = new DOMDocument('1.0', 'UTF-8');

@$dom->load($tempfile); // The @ here is intentional ;) it silences parser warnings

$body = $dom->getElementsByTagName('body')->item(0)->nodeValue;
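If you would rather not silence errors with @, one alternative is to let libxml collect the warnings internally and to parse the file as HTML instead of XML. The sketch below is a swapped-in variant, not what the answer does, and htmlBodyToText is a made-up name. It assumes the DOM extension is available and that the HTML is UTF-8 encoded.

```php
<?php
// Hypothetical helper (not from the answer): returns the text inside <body>.
function htmlBodyToText(string $html): string
{
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Prefixing an XML declaration is a common trick to make loadHTML()
    // treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);

    // textContent already drops the tags and keeps only the text.
    return $body === null ? '' : trim($body->textContent);
}
```

With that in place, $body = htmlBodyToText(file_get_contents($tempfile)); would stand in for the three lines above.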

Then I strip any remaining HTML formatting. I also make the text display correctly in Windows Notepad by replacing "\n" with "\r\n".

 $txt = str_replace("\n", "\r\n", strip_tags($body));

 file_put_contents('arquivo.txt', $txt);
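One caveat with the plain str_replace: if the text already contains "\r\n" somewhere, it becomes "\r\r\n". A slightly more defensive sketch normalizes every kind of line ending in one pass. The function name toWindowsLineEndings is just an illustration, not something from PHPWord.

```php
<?php
// Hypothetical helper (not from the answer): converts every line ending to CRLF.
function toWindowsLineEndings(string $text): string
{
    // CRLF is listed first so an existing "\r\n" is matched as one unit
    // and is not turned into "\r\r\n".
    $result = preg_replace('/\r\n|\r|\n/', "\r\n", $text);

    // preg_replace() returns null on failure; fall back to the input then.
    return $result ?? $text;
}
```

It can be dropped in as file_put_contents('arquivo.txt', toWindowsLineEndings(strip_tags($body)));.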
