Converting .doc to .html

This topic is 4338 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Hey, I'm wondering if any of you know how .doc (Word) files are formatted? More precisely, how could I take the styles from a .doc and convert them into CSS? I googled the thread's title and found a lot of pages, but they all seem to talk about using MS Word to do the conversion. Basically I'd like to load myDoc.doc in my program and have it return myDoc.html with valid HTML. This is for a newsletter written at work. I'd like to make my job easier by having the conversion done automatically instead of copy-pasting the text and then formatting it with HTML tags. (If there's a "proper" term for this, let me know and I could probably do the research on my own... I don't really know what to call this since I've never done something like this before.) Thanks, Seb

You could try opening it in Microsoft Word itself and saving it as .html.
Although this probably doesn't use CSS (I haven't tried recently), it will usually give a decent translation.
As for reading .doc files, try googling for the format; it is well documented.

Quote:
Original post by swiftcoder
You could try opening it in Microsoft Word itself and saving it as .html.
Although this probably doesn't use CSS (I haven't tried recently), it will usually give a decent translation.[...]
Whatever you do, don't do this. It will generate HTML that is around 100 times larger than the text itself, and everything is specified in absolute units, so anybody who needs the text larger or smaller is screwed. Changing the font size by editing the HTML isn't an option because it's VERY messy HTML, filled with non-standard tags. Even 'tidy' (a program designed to clean up HTML files, with an option to deal with Word's junk) chokes on HTML generated by anything newer than Word 2000.
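
To make the absolute-units complaint concrete, here is a minimal Python sketch that rewrites point-based font sizes in an inline style string as relative `em` values. The function name `pt_to_em` and the 12pt base size are assumptions made for illustration; nothing here comes from Word or from tidy, and it only touches `font-size`, not margins or other lengths.

```python
import re

# Assumed body text size that 1em should correspond to (hypothetical default).
BASE_PT = 12.0

# Matches declarations such as "font-size:18pt" or "font-size: 10.5pt".
_PT_SIZE = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)pt", re.IGNORECASE)


def pt_to_em(css, base_pt=BASE_PT):
    """Rewrite absolute font sizes (pt) in a CSS string as relative em values."""
    def repl(match):
        em = float(match.group(2)) / base_pt
        # Up to three significant digits, so 18pt on a 12pt base becomes 1.5em.
        return f"{match.group(1)}{em:.3g}em"
    return _PT_SIZE.sub(repl, css)


print(pt_to_em("font-size:18pt; margin-left:36pt"))
```

With relative sizes, a reader who changes the browser's base font size gets proportionally scaled text instead of fixed point sizes.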

Copy->Paste, reproduce format.

But if you can't do it manually, OO.o does the best job of anything I've found (and yes, I've had to look; would you BELIEVE a company would force me to produce webpages from .docs automatically when I'm a fully qualified web developer?), but I still recommend delving into the generated code and removing some of the fat.
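
As a rough idea of what "removing some of the fat" could look like when automated, here is a minimal Python sketch using only the standard library's `html.parser`. It is not what OO.o or tidy does internally. The class name `FatStripper`, the helper `strip_fat`, and the choice of which attributes and elements to drop are all assumptions for illustration; adjust them to whatever your converter actually emits.

```python
from html import escape
from html.parser import HTMLParser

# Assumptions: presentational attributes and elements we choose to discard.
DROP_ATTRS = {"style", "class", "lang"}
DROP_ELEMENTS = {"style", "script"}
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class FatStripper(HTMLParser):
    """Re-emit HTML without inline styling or namespaced (e.g. <o:p>) tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped <style>/<script>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_ELEMENTS:
            self.skip_depth += 1
            return
        if ":" in tag or self.skip_depth:
            return  # namespaced tags such as o:p are non-standard; drop them
        kept = []
        for name, value in attrs:
            if name not in DROP_ATTRS:
                # Valueless attributes are written with an empty value.
                kept.append(' {}="{}"'.format(name, escape(value or "", quote=True)))
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_ELEMENTS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if ":" in tag or tag in VOID_TAGS or self.skip_depth:
            return
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments and declarations are not overridden, so they are dropped.


def strip_fat(html_text):
    parser = FatStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = '<p class="MsoNormal" style="margin:0pt"><span lang="EN-US">Hi<o:p></o:p></span></p>'
print(strip_fat(sample))  # <p><span>Hi</span></p>
```

This leaves empty wrappers such as bare `<span>` elements behind and throws away all styling, so it is a starting point for hand-written CSS rather than a finished converter.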
