Office Tip: How to Extract Embedded Images from a Word Document

In today’s Ask the Admin, I have a tip for quickly extracting all the images from a Microsoft Word document.

Images embedded in Word documents sometimes need to be removed but saved for later. You might prefer to work with a text-only document that uses placeholders for the images. Or you might be publishing the document, to WordPress for example, where manually pasting the document into the WordPress editor requires that images be removed and uploaded separately. Word does allow posts to be published directly to WordPress, but it doesn’t offer all the features of the WordPress web-based GUI.

Note that the instructions in this article refer to Office 2016 running on Windows 10.

Zip Extraction Method

If you are using Word 2007 or later, the default file format is Office Open XML (OOXML), which is essentially a zip archive that contains the document’s XML along with any embedded images. If you are working with a different file format in Word, you’ll need to select Save As from the File menu and save the document as a Word Document (*.docx) before you can extract images.
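Because a *.docx file is a zip archive, its contents can be inspected with any zip library. The following is a minimal Python sketch, not part of the original tip, that lists the entries stored under word/media without changing the file. The function name list_embedded_media is a hypothetical name chosen for illustration.

```python
import zipfile
from pathlib import Path


def list_embedded_media(docx_path):
    """Return the archive entries stored under word/media/ in a .docx file.

    list_embedded_media is a hypothetical helper name used for illustration.
    """
    docx_path = Path(docx_path)
    # A .docx saved in the OOXML format should be recognized as a zip archive
    if not zipfile.is_zipfile(docx_path):
        raise ValueError(f"{docx_path} is not a zip-based (OOXML) file")
    with zipfile.ZipFile(docx_path) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        ]
```

Calling list_embedded_media("report.docx"), where report.docx is a placeholder file name, would return a list of entry names. An empty list suggests the document has no embedded images in that folder.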

Once the file is saved in *.docx format, all you need to do is change the file extension from *.docx to *.zip.

  • Make sure the file is not open in Word, locate the file in File Explorer (WIN+E) and make sure that you can see the file’s extension.
  • If file extensions are not visible, click View on the ribbon and check File name extensions.
  • Select the Word document in File Explorer and press F2 to rename it.
  • Select ‘.docx’ and replace it with ‘.zip’.
  • Now double-click the zip file to open the archive, open the word folder and then the media folder. This is where the embedded images are located. You can copy them to a different folder.
  • Set the file extension back to .docx once you’ve extracted the required images.
Extracting embedded images from a Word document (Image Credit: Russell Smith)
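The manual steps above can also be scripted. The sketch below is one possible approach in Python, assuming the document is already saved in *.docx format. It reads the archive directly, so no rename to .zip is needed, and it copies everything under word/media to a folder of your choice. The function name extract_images and its parameters are hypothetical and chosen for illustration.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_images(docx_path, output_dir):
    """Copy every file under word/media/ in a .docx file into output_dir.

    extract_images is a hypothetical helper name used for illustration.
    Returns the list of paths that were written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(docx_path) as package:
        for member in package.infolist():
            if member.is_dir() or not member.filename.startswith("word/media/"):
                continue
            # Keep only the base name so files land directly in output_dir
            target = output_dir / PurePosixPath(member.filename).name
            with package.open(member) as source, open(target, "wb") as dest:
                shutil.copyfileobj(source, dest)
            extracted.append(target)
    return extracted
```

The archive is opened read-only, so the original document is left untouched and can stay named *.docx.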

Save as HTML Method

The second method I’m going to show you works in any version of Word that supports saving files as a web page.

  • Open the file in Word.
  • Click File and select Save As from the menu.
  • In the Save As dialog, change Save as type to Web Page (*.htm; *.html).
  • Save the file to the desired location by clicking Save.
Extracting embedded images from a Word document (Image Credit: Russell Smith)

In the location where you saved the file, you’ll see the HTML document and a folder that has the same name as the HTML document with _images or _files appended. This is where you’ll find all the embedded images. The images are numbered in order, but in some cases Word exports two versions of the same image with different dimensions (for example, image_003.png at 710×222 and image_004.png at 1065×333) or even in two file formats (for example, image_001.png and image_002.jpg).
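If you export documents this way regularly, a short script can collect the image files from the companion folder. The following Python sketch assumes the folder sits next to the saved HTML file and ends in _files or _images, as described above. The function name find_exported_images and the list of image extensions are assumptions made for illustration, and the script makes no attempt to detect which files are duplicate versions of the same image.

```python
from pathlib import Path

# Assumed set of image extensions to look for; adjust to suit your documents
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


def find_exported_images(html_path):
    """Return image files from the folder saved alongside a Word web page.

    find_exported_images is a hypothetical helper name used for illustration.
    """
    html_path = Path(html_path)
    images = []
    for suffix in ("_files", "_images"):
        # For example, report.htm -> report_files or report_images
        folder = html_path.with_name(html_path.stem + suffix)
        if folder.is_dir():
            images.extend(
                sorted(
                    p for p in folder.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
                )
            )
    return images
```

Other files that Word places in the same folder are skipped because their extensions are not in the assumed list.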

IT consultant, Contributing Editor @PetriFeed, and trainer @Pluralsight. All about Microsoft, Office 365, Azure, and Windows Server.