How to remove the extra HTML code from your Microsoft Word documents that are saved as a webpage

Microsoft Word is notorious for inserting a massive amount of non-standard, useless HTML markup into pages when you choose "Save as a Webpage…". Not only does this extra markup fail validation and slow page loads, it also makes editing pages by hand nearly impossible.

Thankfully, Microsoft added a little-known feature in Word XP and later that lets you omit much of the bloated code when you save as a webpage. Simply click File | Save As…, then click the Save As Type drop-down box and choose Web Page, Filtered.

[Screenshot: Microsoft Word XP Save As… Web Page, Filtered]

While this doesn’t create perfect code, it does cut down on file size considerably, according to my simple test. I opened a two-page Word document with text, font formatting, tabs, bulleted lists, and numbered lists. The filtered document was only 5KB, while the normal Save as a Web Page document was 12KB. Looking at the generated HTML in Notepad, it’s easy to see why. The header of the non-filtered page had 40 lines of useless XML content, and the bulleted lists were extremely bloated as well.

That size ratio probably wouldn’t hold for longer documents, since much of the bloat in the test document came from the HEAD content. Still, the general markup was much more terse in the filtered version.
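If you want to repeat this kind of comparison on your own documents, here is a minimal Python sketch. It is not part of Word. The function names and the list of "Office markers" are my own assumptions for illustration, and the marker pattern is only a rough heuristic rather than a complete catalog of what Word emits.

```python
import re

# Markers that commonly show up in Word's exported HTML. This list is an
# assumption for illustration, not an exhaustive or official catalog.
OFFICE_MARKERS = re.compile(r"mso-|</?o:|</?w:|<!--\[if", re.IGNORECASE)


def bloat_report(html: str) -> dict:
    """Return the UTF-8 size in bytes and a count of Office-specific markers."""
    return {
        "bytes": len(html.encode("utf-8")),
        "office_markers": len(OFFICE_MARKERS.findall(html)),
    }


def percent_saved(normal_bytes: int, filtered_bytes: int) -> float:
    """Percentage by which the filtered file is smaller than the normal one."""
    return 100.0 * (normal_bytes - filtered_bytes) / normal_bytes


if __name__ == "__main__":
    # The sizes from the test above: 12KB normal, 5KB filtered.
    print(round(percent_saved(12 * 1024, 5 * 1024), 1))  # 58.3
```

To check real files, read each saved page into a string and pass it to `bloat_report`, then feed the two byte counts to `percent_saved`. For the 12KB and 5KB files above, the filtered version comes out roughly 58% smaller.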

From now on, when I want to quickly publish a Word document, I will definitely choose Save As…Web Page, Filtered.
