Sfpiano

Connecting to a website to obtain its HTML


I'm not quite sure how to get html from a website. I tried:
WORD sockVersion;
	WSADATA wsaData;
	int nret;

	sockVersion = MAKEWORD(1, 1);

	WSAStartup(sockVersion, &wsaData);

	LPHOSTENT hostEntry;

	hostEntry = gethostbyname("www.google.com");

	SOCKET theSocket;

	theSocket = socket(AF_INET,
		SOCK_STREAM,	
		IPPROTO_TCP);

	SOCKADDR_IN serverInfo;

	serverInfo.sin_family = AF_INET;

	serverInfo.sin_addr = *((LPIN_ADDR)*hostEntry->h_addr_list);
	serverInfo.sin_port = htons(80);

	connect(theSocket,
		(LPSOCKADDR)&serverInfo,
		sizeof(struct sockaddr));

char buffer[1024];

ZeroMemory(buffer, 1024);
strcpy(buffer, "GET /robots.txt HTTP/1.1");

send(theSocket,
	buffer,
	strlen(buffer),
	0);

...

ZeroMemory(buffer, 1024);
recv(theSocket, buffer, 1024, 0);
But my program hangs on the receive call. (I have error checks in my code, but I removed them to save space.) [Edited by - Sfpiano on August 8, 2005 1:12:31 AM]

You need a cr/lf at the end of your request string. That may be the problem.

Also, make sure your whole request is actually sent. send isn't guaranteed to send everything you pass it in a single call.

[Edited by - wendigo23 on August 8, 2005 1:16:10 AM]

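send and recv return the number of bytes they sent or received. Make sure it matches what you expect; if send returns less than you asked for, send the remainder.

int total = (int)strlen(buffer);
int sent = 0;
while (sent < total) {
	// send may accept only part of the data, so resume from where it stopped
	int ret = send(theSocket, buffer + sent, total - sent, 0);
	if (ret == SOCKET_ERROR) {
		// error! WSAGetLastError() gives the reason
		break;
	}
	sent += ret;
}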


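Try something like:
strcpy(buffer, "GET /robots.txt HTTP/1.1\r\n");
HTTP lines end with a carriage return (CR) followed by a line feed (LF), written \r\n in a C string. A bare \n is only the line feed, and it is not expanded to CR/LF when you write it to a socket, even on Windows.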

Still hangs on the recv call. Could it be that my socket is still blocking after the send call for some reason?

I forgot that you need two line endings at the end of the request. There may be multiple lines in a request, so the empty line tells the server you're done.

"GET / HTTP/1.0\r\n\r\n"
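To make the blank-line rule concrete, here is a minimal sketch of building a complete request in a buffer before sending it. The helper name build_get_request is made up for illustration and does not come from this thread. It assumes you want HTTP/1.1, which requires a Host: header, and it adds Connection: close so the server closes the socket once the response is done.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): writes a minimal GET request
   into out. Every line ends in CR LF, and the final empty line marks the
   end of the request. Returns the request length in bytes, or -1 if the
   request does not fit in cap bytes. */
int build_get_request(char *out, size_t cap, const char *host, const char *path)
{
    int n;

    assert(out != NULL && host != NULL && path != NULL);

    n = snprintf(out, cap,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Connection: close\r\n"
                 "\r\n",
                 path, host);

    /* snprintf reports the length it wanted to write; reject truncation. */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

You would call it as build_get_request(buffer, sizeof buffer, "www.google.com", "/robots.txt") and pass the returned length to your send loop instead of calling strlen again.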

How do you keep reading until you've got everything you need? Like:

while( not end of file ) {
read;
if( file done )
break;
}

How do you determine where the end of the file is?

I saw that, but his code is not the easiest to follow. Would it be satisfactory to go until I reach </html>?

You need to send a well-formed HTTP request, which typically means adding some headers (HTTP/1.1, for instance, requires a Host: header). When the response arrives, you typically have to read and parse its headers (which may report an error, or may describe the data), and then read the amount of data the headers describe. Look for Content-Length: and parse out the number of non-header bytes to read.

I'm sorry you found my code hard to follow; if there's something you don't understand, please ask.
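As a rough sketch of the Content-Length approach described above (this is not the code referred to earlier in the thread): the functions below work on whatever bytes recv has accumulated so far and report whether the whole response has arrived. All three function names are hypothetical. The sketch assumes the server sends a Content-Length: header. A response that uses chunked transfer encoding, or that simply ends when the server closes the connection, has no such header and is reported as "cannot tell".

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns the offset of the first body byte (just past the blank line
   that ends the headers), or -1 if the headers are not complete yet. */
long find_body_start(const char *buf, size_t len)
{
    size_t i;
    for (i = 0; i + 4 <= len; ++i)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    return -1;
}

/* Case-insensitive comparison of the first n bytes of a and b. */
static int ieq(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Scans the header bytes [0, header_len) line by line and returns the
   Content-Length value, or -1 if the header is missing or not a number. */
long parse_content_length(const char *buf, size_t header_len)
{
    static const char name[] = "content-length:";
    const size_t name_len = sizeof name - 1;
    size_t pos = 0;

    while (pos < header_len) {
        const char *eol = (const char *)memchr(buf + pos, '\n', header_len - pos);
        size_t line_len = eol ? (size_t)(eol - (buf + pos)) : header_len - pos;

        if (line_len >= name_len && ieq(buf + pos, name, name_len)) {
            char tmp[32];
            char *end;
            long value;
            size_t vlen = line_len - name_len;

            if (vlen >= sizeof tmp)
                return -1;
            /* Copy the value so strtol sees a NUL-terminated string. */
            memcpy(tmp, buf + pos + name_len, vlen);
            tmp[vlen] = '\0';
            value = strtol(tmp, &end, 10);
            return (end == tmp) ? -1 : value;
        }
        pos += line_len + 1;
    }
    return -1;
}

/* Returns 1 if the headers and the full body are in buf, 0 if more data
   is needed, and -1 if there is no usable Content-Length to decide with. */
int response_complete(const char *buf, size_t len)
{
    long body = find_body_start(buf, len);
    long clen;

    if (body < 0)
        return 0;
    clen = parse_content_length(buf, (size_t)body);
    if (clen < 0)
        return -1;
    return (len - (size_t)body) >= (size_t)clen;
}
```

The idea is to keep calling recv, append what it returns to your buffer, and stop when response_complete returns 1 or recv returns 0 (the server closed the connection). Stopping at </html> is less reliable, since not every response is HTML and the tag may be missing or repeated.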
