Huangdi

Winsock help

This topic is 5401 days old.


OK, I'm trying to design a program in C++ that does this:

1. The user inputs a search phrase and the number of results wanted.
2. The program (Harvester) searches for the top results and saves all the URLs to a file (temp.txt). The number of URLs is <= the number of results specified.
3. Harvester then visits each listed website, downloads its HTML, and stores it in a master HTML file (html.txt).

I've discovered that I need to use Winsock, but I'm not sure how to go about it. I think I know how to connect to a server, but how do I get to a website? This is what I have so far:

1. Input the user data. Let the search terms be the string "terms".
2. Go to "http://www.google.com/search?hl=en&ie=UTF-8&q=" + terms;
3. Parse the HTML for the URLs. If the number of URLs is less than the number needed, go to the next page: "http://www.google.com/search?q=" + terms + "&hl=en&lr=&ie=UTF-8&safe=off&start=" + counter; where counter is 1, 11, 21, 31, etc. for each page.
4. Write the URLs to temp.txt with an ofstream.
5. Read the URLs back, go to each URL, and download it.
6. Repeat steps 4-5.

The key thing I need to know is how to get the HTML! Can someone help by writing the basic framework? Any help is greatly appreciated. Thanks!
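A minimal sketch of steps 2 and 3 in C++, assuming the URL pattern from the post and assuming result links show up as plain href="http..." attributes. Real results pages change often and may wrap or redirect their links, so treat the extractor as illustrative only. The function names encode_query, results_page_url, and extract_links are hypothetical.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

// Percent-encode a search phrase for use in a query string.
// Spaces become '+', unreserved characters pass through, everything else is %XX.
std::string encode_query(const std::string& terms) {
    std::string out;
    for (unsigned char c : terms) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            char buf[4];  // "%XX" plus the terminating null
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Build a results-page URL following the pattern described in the post;
// the start offset is appended as-is.
std::string results_page_url(const std::string& terms, int start) {
    return "http://www.google.com/search?q=" + encode_query(terms) +
           "&hl=en&lr=&ie=UTF-8&safe=off&start=" + std::to_string(start);
}

// Naive link extractor: collects every href="..." value that starts with "http".
// A real page needs a proper HTML parser; this only shows the idea.
std::vector<std::string> extract_links(const std::string& html) {
    std::vector<std::string> links;
    const std::string marker = "href=\"";
    std::string::size_type pos = 0;
    while ((pos = html.find(marker, pos)) != std::string::npos) {
        pos += marker.size();
        const std::string::size_type end = html.find('"', pos);
        if (end == std::string::npos) break;  // unterminated attribute
        const std::string url = html.substr(pos, end - pos);
        if (url.compare(0, 4, "http") == 0) links.push_back(url);
        pos = end + 1;
    }
    return links;
}
```

Keeping URL building and link parsing as pure string functions means they can be tested offline, separately from the Winsock code that actually fetches each page.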

Look up the HTTP protocol on Google. Browsers don't just send an address and receive HTML back; quite a bit of extra information (request and response headers) is passed back and forth.
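To make that concrete, here is a sketch of what the request and response look like on the wire. build_get_request and parse_response are hypothetical helper names, and the User-Agent value is made up. The sketch uses HTTP/1.0 with "Connection: close", so the body simply ends when the server closes the socket and there is no HTTP/1.1 chunked transfer encoding to decode.

```cpp
#include <cstdlib>
#include <string>

// Build a minimal HTTP/1.0 GET request. Each line ends in CRLF and an
// empty line terminates the header block.
std::string build_get_request(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "User-Agent: Harvester/0.1\r\n"  // hypothetical client name
           "Connection: close\r\n"
           "\r\n";
}

struct HttpResponse {
    int status = 0;       // e.g. 200, 301, 404
    std::string headers;  // status line + header lines, without the blank line
    std::string body;     // the HTML itself
};

// Split a raw response at the first blank line (CRLF CRLF) that ends the headers.
bool parse_response(const std::string& raw, HttpResponse& out) {
    const std::string::size_type sep = raw.find("\r\n\r\n");
    if (sep == std::string::npos) return false;
    out.headers = raw.substr(0, sep);
    out.body = raw.substr(sep + 4);
    // The status line looks like "HTTP/1.0 200 OK": the code follows the first space.
    const std::string::size_type sp = out.headers.find(' ');
    if (sp == std::string::npos) return false;
    out.status = std::atoi(out.headers.c_str() + sp + 1);
    return true;
}
```

Checking the status before saving the body matters: a 301 or 302 means the page moved, and the new address is in the Location header rather than in the body.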

Also, you connect to websites on port 80. One approach: write a small HTTP server on your local machine and connect to it with a regular web browser, and also use your Harvester program to connect to a real website. Have both print out the header information being transmitted. Then copy and paste between the two: tweak your mock server until it looks like a real server to your regular browser (to make sure it fully tests Harvester), and tweak your Harvester client until it looks like a real browser to the server. This is essentially how I wrote an IRC client in one day: I read the IRC protocol documentation and then reverse engineered everything.
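A rough sketch of the connect/send/receive part, assuming plain http:// URLs with no explicit port. It uses the BSD-style socket calls that Winsock also provides (getaddrinfo, socket, connect, send, recv), with a small #ifdef so the same code builds on Windows and on POSIX systems. split_url and http_get are hypothetical names, and there is no https, redirect, or timeout handling.

```cpp
#include <string>

#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
typedef SOCKET socket_t;
#define CLOSE_SOCKET closesocket
#else
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
typedef int socket_t;
#define INVALID_SOCKET (-1)
#define CLOSE_SOCKET close
#endif

// Split "http://host/path" into host and path (path defaults to "/").
// Assumes no explicit ":port" and rejects anything that is not plain http.
bool split_url(const std::string& url, std::string& host, std::string& path) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) return false;
    const std::string::size_type slash = url.find('/', scheme.size());
    host = url.substr(scheme.size(),
                      slash == std::string::npos ? std::string::npos : slash - scheme.size());
    path = slash == std::string::npos ? "/" : url.substr(slash);
    return !host.empty();
}

// Connect to host:80, send a GET, and read until the server closes the connection.
// Returns the raw response (headers + HTML), or "" on failure.
// On Windows, WSAStartup must have been called before this is used.
std::string http_get(const std::string& url) {
    std::string host, path;
    if (!split_url(url, host, path)) return "";

    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), "80", &hints, &res) != 0) return "";

    // Try each resolved address until one connects.
    socket_t s = INVALID_SOCKET;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        s = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (s == INVALID_SOCKET) continue;
        if (connect(s, p->ai_addr, static_cast<int>(p->ai_addrlen)) == 0) break;
        CLOSE_SOCKET(s);
        s = INVALID_SOCKET;
    }
    freeaddrinfo(res);
    if (s == INVALID_SOCKET) return "";

    const std::string request = "GET " + path + " HTTP/1.0\r\nHost: " + host +
                                "\r\nConnection: close\r\n\r\n";
    // send() may write only part of the buffer, so loop until everything is out.
    std::string::size_type sent = 0;
    while (sent < request.size()) {
        const int n = static_cast<int>(
            send(s, request.data() + sent, static_cast<int>(request.size() - sent), 0));
        if (n <= 0) { CLOSE_SOCKET(s); return ""; }
        sent += static_cast<std::string::size_type>(n);
    }

    // Read until recv() returns 0 (server closed) or a negative value (error).
    std::string response;
    char buf[4096];
    for (;;) {
        const int n = static_cast<int>(recv(s, buf, static_cast<int>(sizeof buf), 0));
        if (n <= 0) break;
        response.append(buf, static_cast<std::string::size_type>(n));
    }
    CLOSE_SOCKET(s);
    return response;
}
```

On Windows, initialize Winsock once with WSAStartup(MAKEWORD(2, 2), &wsaData) before the first http_get call, call WSACleanup() at exit, and link against ws2_32. The returned string still contains the headers, so printing it before splitting off the body at the first blank line is an easy way to see the header exchange described above.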
