Sagar_Indurkhya

getting source of webpages


Hi. I am working on a program that needs to download the source of web pages from the internet, given a specific page URL. I am using the Google Web API from Java. I was leaning towards MS C++ .NET, but then people couldn't run it on Linux. Any pointers would be great, because I am not very good at web programming. :)

I don't know about the Google Web API, but it's extremely easy with a bit of socket programming. Here's what I use (it's in C++, but it's the concept that matters, not the code):

// Connect to the web server on the standard HTTP port (80)
if(!m_theSock.Connect("www.google.com",80))
{
    m_strError = m_theSock.GetError();
    return;
}

// Request the file. "Connection: Close" asks the server to close
// the socket once the whole response has been sent.
strBuff = "GET /index.html HTTP/1.1\r\n"
          "User-Agent: Bleh\r\n"
          "Host: www.google.com\r\n"
          "Connection: Close\r\n\r\n";
if(!m_theSock.Send(strBuff))
{
    m_strError = m_theSock.GetError();
    m_theSock.Disconnect();
    return;
}

// Wait for the reply: keep receiving until the server closes the connection
while(m_theSock.IsValid())
{
    m_theSock.Receive();
}
m_theSock.Disconnect();
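
The snippet above relies on a socket wrapper class (m_theSock) that isn't shown. For anyone who wants something they can compile, here is a rough sketch of the same connect / send / receive sequence written directly against POSIX sockets. The function names build_request and fetch_raw are made up for this example, it assumes a Linux or other POSIX system (Winsock needs different headers and setup), and it only speaks plain HTTP on port 80, so sites that insist on HTTPS will answer with a redirect rather than the page.

```cpp
#include <string>
#include <stdexcept>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Build a minimal HTTP/1.1 GET request. "Connection: close" asks the server
// to close the socket after the response, which marks the end of the data.
std::string build_request(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "User-Agent: Bleh\r\n"
           "Connection: close\r\n\r\n";
}

// Fetch the raw response (headers + body) over plain HTTP on port 80.
// Hypothetical helper for illustration; throws std::runtime_error on failure.
std::string fetch_raw(const std::string& host, const std::string& path) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), "80", &hints, &res) != 0)
        throw std::runtime_error("could not resolve host");

    // Try each resolved address until one accepts the connection.
    int fd = -1;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0) throw std::runtime_error("could not connect");

    // send() may write only part of the buffer, so loop until all is sent.
    const std::string req = build_request(host, path);
    size_t sent = 0;
    while (sent < req.size()) {
        ssize_t n = send(fd, req.data() + sent, req.size() - sent, 0);
        if (n <= 0) { close(fd); throw std::runtime_error("send failed"); }
        sent += static_cast<size_t>(n);
    }

    // Read until recv() returns 0, meaning the server closed the connection.
    std::string response;
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        response.append(buf, static_cast<size_t>(n));
    close(fd);
    if (n < 0) throw std::runtime_error("recv failed");
    return response;
}
```

Calling fetch_raw("www.google.com", "/index.html") would return the whole reply as one string, headers included. Unlike the loop in my snippet, this version keeps the received bytes instead of leaving them inside the socket object.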



Basically, you connect to the server, send an HTTP request, and then read what it returns. You'll need to ignore everything before the first "\r\n\r\n" string, since that's the HTTP header. You should really parse the header to check things like the HTTP response code (it should be 200 for "OK"). Because the request asks for "Connection: Close", the server will close the connection once it has sent all of the data.
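
To make the header handling concrete, here is a small sketch of splitting a raw reply at the first "\r\n\r\n" and pulling the status code out of the first line. The HttpResponse struct and split_response function are names invented for this example, and it assumes the whole reply is already sitting in one string.

```cpp
#include <string>
#include <cstdlib>

struct HttpResponse {
    int status = 0;       // e.g. 200; stays 0 if the status line can't be parsed
    std::string headers;  // status line and header fields, without the blank line
    std::string body;     // everything after the first "\r\n\r\n"
};

// Split a raw HTTP reply into headers and body and read the status code.
HttpResponse split_response(const std::string& raw) {
    HttpResponse out;
    const std::string sep = "\r\n\r\n";
    const std::size_t end = raw.find(sep);
    if (end == std::string::npos) {
        out.headers = raw;  // no blank line seen: treat it all as headers
    } else {
        out.headers = raw.substr(0, end);
        out.body = raw.substr(end + sep.size());
    }
    // The status line looks like "HTTP/1.1 200 OK": the code follows the first space.
    const std::size_t sp = out.headers.find(' ');
    if (out.headers.compare(0, 5, "HTTP/") == 0 && sp != std::string::npos)
        out.status = std::atoi(out.headers.c_str() + sp + 1);
    return out;
}
```

One caveat: an HTTP/1.1 server is allowed to send the body with "Transfer-Encoding: chunked", in which case the body returned here still contains the chunk-size lines. This sketch doesn't decode that. Sending the request as HTTP/1.0 is a common way to avoid chunked replies.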

I wrote a Java library that handles the functionality that you are looking for. You can download it from my site at:
codeforge.org

I have only tested it under Linux, but since it's Java it should work anywhere a JVM is available.

Dave

Guest Anonymous Poster
WinInet is very simple to use and works well. It can't do a lot, but getting a web page's source is very easy with it, and there are tutorials for it online.
