Microsoft Word is notorious for inserting a massive amount of non-standard, useless HTML markup into pages when you "Save as a Webpage…". Not only does this extra markup fail validation and slow page load times, but it also makes editing pages by hand nearly impossible.
Thankfully, Microsoft added a little-known feature to Word XP and later that allows you to omit much of the bloated code when you save as a webpage. Simply click File | Save As…, then click the Save As Type drop-down box and choose Web Page, Filtered.
![Microsoft Word XP Save as Webpage, Filtered screenshot](/UserFiles/Image/save_as_webpage_filtered_sc.png)
Microsoft Office XP Save As…Web Page, Filtered screenshot
While this doesn’t create perfect code, it does cut down on file size considerably, according to my simple test. I opened a two-page Word document with text, font formatting, tabs, bulleted lists, and numbered lists. The filtered document was only 5KB, while the normal Save as a Web Page document was 12KB. Looking at the generated HTML in Notepad, it’s easy to see why. The header of the non-filtered page had 40 lines of useless XML content. Also, the bulleted lists were extremely bloated in the non-filtered page. It’s probably safe to say that size ratio wouldn’t hold true for longer documents, since much of the bloat in the test documents came from the HEAD content, which stays roughly the same size no matter how long the document is. However, the general markup was much more terse in the filtered version.
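If you want to repeat this comparison on your own documents, a few lines of Python will do the arithmetic. This is only a rough sketch: the function names `size_reduction` and `compare_files` are my own, and the paths you pass in would be wherever you saved the two versions of the page.

```python
import os


def size_reduction(normal_bytes, filtered_bytes):
    """Return how much smaller the filtered file is, as a percentage."""
    if normal_bytes <= 0:
        raise ValueError("normal size must be positive")
    return (normal_bytes - filtered_bytes) / normal_bytes * 100


def compare_files(normal_path, filtered_path):
    """Read both file sizes from disk and return (normal, filtered, percent)."""
    normal = os.path.getsize(normal_path)
    filtered = os.path.getsize(filtered_path)
    return normal, filtered, size_reduction(normal, filtered)


# Using the figures from the test above: 12KB normal, 5KB filtered.
print(f"{size_reduction(12 * 1024, 5 * 1024):.1f}% smaller")  # 58.3% smaller
```

For the two-page test document, that works out to a filtered page a little under 60% smaller than the normal one.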
From now on, when I want to quickly publish a Word document, I will definitely choose Save As…Web Page, Filtered.