furiousp

Save web pages as text file


Hi, when I'm at a website I can save the current web page as a text file with my browser. I want to write a program that can do this automatically. Does anyone have any ideas on how I could go about doing this? (I haven't written any programs that interact with web pages before.) I generally use Java, but I have compilers for C++, C#, J#, and VB. Would any of those be better for this? Thanks

Use any language you like.
As for how to do it: open the .html file, read it tag by tag, and work out an appropriate algorithm. Look up the HTML standard to see how the tags work.

You could also just strip the tags, but you have to watch out for special ones such as scripts, HTML comments, and image tags. These should not be shown.
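A rough Java sketch of that tag-stripping idea follows. The class name `TagStripper` and the method `stripTags` are made up for illustration, and a regular expression is only an approximation of real HTML parsing, so treat this as a starting point rather than a robust solution. It drops comments and whole `script`/`style` elements first, then removes every remaining tag (including `img`).

```java
import java.util.regex.Pattern;

// Hypothetical helper: approximates "strip the tags" with regular expressions.
final class TagStripper {
    // One left-to-right scan over three kinds of markup:
    //   1. HTML comments
    //   2. whole <script>/<style> elements, contents included
    //   3. any other single tag (so <img ...> disappears as well)
    private static final Pattern MARKUP = Pattern.compile(
            "(?is)<!--.*?-->|<(script|style)\\b[^>]*>.*?</\\1\\s*>|<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripTags(String html) {
        // Replace markup with a space so adjacent words do not run together.
        String text = MARKUP.matcher(html).replaceAll(" ");
        // Collapse the runs of whitespace left behind.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

Note that this does not decode character entities such as `&amp;`, and malformed markup can confuse it.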

FYI, you have quite a long way ahead of you with this task.

Guest Anonymous Poster
What's the purpose? Is it for fun or learning? Otherwise I'd guess you can find tons of applications that do this already.


I assume you can use the automation APIs for Internet Explorer, see IDM_SAVEAS.

http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/constants/saveas.asp

That way I assume you could save in .mht format directly, meaning frames, images, sound, etc. get embedded in the same file.

I would personally do this from C#.


If you do everything from scratch you could use HttpWebRequest in .NET, but you need to do lots of work yourself, such as parsing HTML and loading frames, images, etc. You'd also need to decide how to save all this (a single file? several files?).
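Since the original poster mostly uses Java, here is a minimal sketch of the "from scratch" route in Java rather than C#. It uses the standard `java.net.http.HttpClient` (Java 11 or later) in place of HttpWebRequest. The class name `PageSaver` and its methods are invented for this example, and it only downloads the raw HTML of one URL; frames, images, and other linked resources are not fetched.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper: downloads one page and writes its HTML to a file.
final class PageSaver {
    // Build a plain GET request with a timeout so a dead server cannot hang us.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Fetch the response body as a string; anything other than 200 is an error.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    // Save the downloaded HTML as a single file.
    static void save(String url, Path target) throws IOException, InterruptedException {
        Files.writeString(target, fetch(url));
    }
}
```

A call such as `PageSaver.save("http://example.com/", Path.of("page.html"))` would write the page source to `page.html`. The URL and file name here are placeholders.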

Thanks for the replies. I should have mentioned that all I want to do is take information from a table on a web page. If I save the page as a text file in a browser, it is stored as a tab-delimited text file, which is easy to process.
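For a simple table, the same tab-delimited result can be approximated in Java without a browser. The sketch below is an assumption-laden illustration: `TableToTsv` and `toTsv` are hypothetical names, and it expects every row and cell to have explicit closing tags and no nested tables. Real pages may need a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns simple HTML table rows into tab-delimited lines.
final class TableToTsv {
    private static final Pattern ROW =
            Pattern.compile("(?is)<tr\\b[^>]*>(.*?)</tr\\s*>");
    private static final Pattern CELL =
            Pattern.compile("(?is)<t[dh]\\b[^>]*>(.*?)</t[dh]\\s*>");
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String toTsv(String html) {
        StringBuilder out = new StringBuilder();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop inline markup such as <b> or <a>, then tidy the whitespace.
                String text = TAG.matcher(cell.group(1)).replaceAll("");
                cells.add(text.replaceAll("\\s+", " ").trim());
            }
            // One output line per table row, cells separated by tabs.
            out.append(String.join("\t", cells)).append('\n');
        }
        return out.toString();
    }
}
```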

To be more exact, I'm taking information from the Google public service search for each day in a particular month to get statistics for that month (which words or terms were searched for the most).
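Once each day is available as tab-delimited text, the monthly totals could be computed along these lines. The thread does not show the actual file layout, so this sketch assumes each line is `term<TAB>count`; the class `MonthlyTally` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sums per-term counts across the days of a month.
// Assumes each input line looks like "term<TAB>count" (not confirmed by the thread).
final class MonthlyTally {
    // Add one day's lines to the running totals.
    static void addDay(Map<String, Long> totals, List<String> lines) {
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue; // blank or incomplete line
            }
            try {
                long count = Long.parseLong(fields[1].trim());
                totals.merge(fields[0].trim().toLowerCase(), count, Long::sum);
            } catch (NumberFormatException e) {
                // Header row or non-numeric count: skip it.
            }
        }
    }

    // Return the entries sorted by count, highest first; ties are sorted by term.
    static List<Map.Entry<String, Long>> ranked(Map<String, Long> totals) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }
}
```

Each day's file could be read with `Files.readAllLines` and passed to `addDay` with the same map; `ranked` then gives the most-searched terms for the month.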

Guest Anonymous Poster
Quote:
Original post by furiousp
Thanks for the replies. I should have mentioned that all I want to do is take information from a table on a web page. If I save the page as a text file in a browser, it is stored as a tab-delimited text file, which is easy to process.

To be more exact, I'm taking information from the Google public service search for each day in a particular month to get statistics for that month (which words or terms were searched for the most).


Have you considered using the Google API? It gives you direct access to the query data so you don't have to parse out the information yourself.
