vlzvl

download HTML via Winsock


Hello, I'm the author of the Jolt3D! engine (jolt-3d.sf.net). I'm trying to create an alternative to the 'InternetReadFile()' function of WININET.DLL for reading HTML pages from C programs. My goal is to get rid of that DLL and use only the Winsock DLL, but I hit a problem: with the following Winsock code I'm not getting correct results, or maybe I don't know the "GET" syntax well. The (easy) code is:
	// address
	IN_ADDR		iaHost;
	LPHOSTENT	lpHostEntry;
	iaHost.s_addr = inet_addr(Servername);
	if (iaHost.s_addr == INADDR_NONE) // Wasn't an IP address string, assume it is a name		
		lpHostEntry = gethostbyname(Servername);
	else // It was a valid IP address string
		lpHostEntry = gethostbyaddr((const char *)&iaHost, sizeof(struct in_addr), AF_INET);
	if (lpHostEntry == NULL)
	{
		J_event("error A");
		return;
	}
	// socket
	SOCKET	Socket;	
	Socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (Socket == INVALID_SOCKET)
	{
		J_event("error B");
		return;
	}
	// port
	LPSERVENT lpServEnt;
	SOCKADDR_IN saServer;
	memset(&saServer, 0, sizeof(saServer)); // clear unused fields such as sin_zero
	lpServEnt = getservbyname("http", "tcp");
	if (lpServEnt == NULL)
		saServer.sin_port = htons(80);
	else
		saServer.sin_port = lpServEnt->s_port;
	// fill rest
	saServer.sin_family = AF_INET;
	saServer.sin_addr = *((LPIN_ADDR)*lpHostEntry->h_addr_list);
	// connect
	int nRet = connect(Socket, (LPSOCKADDR)&saServer, sizeof(SOCKADDR_IN));
	if (nRet == SOCKET_ERROR)
	{
		J_event("error C");
		closesocket(Socket); // do not leak the socket on failure
		return;
	}
	// build the HTTP request
	char szBuffer[1024];
	sprintf(szBuffer, "GET %s\n", Filename);
	nRet = send(Socket, szBuffer, strlen(szBuffer), 0);
	if (nRet == SOCKET_ERROR)
	{
		J_event("error D");
		closesocket(Socket);
		return;
	}
	// receive the file contents and write them to local 'index.html'
	FILE *f = fopen("index.html", "wb");
	if (f == NULL)
	{
		J_event("error F");
		closesocket(Socket);
		return;
	}
	while(1)
	{
		nRet = recv(Socket, szBuffer, sizeof(szBuffer), 0);
		if (nRet == SOCKET_ERROR)
		{
			J_event("error E");
			break;
		}
		else if (nRet == 0) // server closed the connection: end of response
			break;
		fwrite(szBuffer, 1, nRet, f); // write the received bytes to file
	}
	closesocket(Socket);	
	fclose(f);
...where 'Servername' is either an IP address or a domain name, e.g. www.google.com, and 'Filename' is a specific HTML file with its directory, e.g. /files/index.html. As you can see, I'm using the simplest GET syntax: "GET %s\r\n", with no HTTP/1.1, no Host:, or anything else. This syntax works for www.google.com/index.html, but not for:

http://www.nba.com/games/20071030/scoreboard.html // request timed out
http://jolt-3d.sf.net/index.htm // error 400, bad URI ???

My questions are:
1) Do these errors have to do with bad parameters after GET?
2) Why does the system halt when I use "GET %s HTTP/1.1\r\n" (or any other version)?
3) My internet connection is not ADSL but GPRS / 3G. Could Winsock be confused by this?

P.S. I found something odd when comparing Explorer with my code. I tried to download a file (that didn't exist) from my site (on SourceForge) both ways. Explorer returned the familiar SourceForge error page, which shows the error code, the server and the URL, all filled in normally. With my Winsock code, the server name was NOT filled in, and the URL was formatted with %1/%%3 etc.; more specifically, it was /home/groups/%1%%2/htdocs/ff.htm instead of the correct /home/groups/j/jo/jolt-3d/htdocs/ff.htm. What is going on?

If anyone has some time, please check this code. I think it will help anyone who wants to download an HTML page "freely" without getting their hands into commercial products. Thanks.

Nice library.
Although it doesn't handle redirection and
other things, it is far better than my code :)
Thanks.

Yes, it's somewhat minimal :-)

However, it should not be too hard to put redirect parsing, cookies, and whatever else you need on top of what's there. The networking and request/response part works fairly well.

...I suppose that is a library of yours (I saw the ~hplus directory :) ).
Really nice work! Just one more question: is there a way to bypass
the header-like text before the actual HTML page?
I'm using a number of HTML pages from my C programs in real time and parsing them byte by byte, so I know (and need) the same byte offsets for several of these pages; but with the header, things (and offsets :) ) are changing...
Must I start looking for where the <HTML> starts, or is there an easier way?
My thanks.

HTTP headers are a fixed part of the protocol. You can send a minimal subset of request headers, but the request needs to conform to the specification.
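
As a rough illustration of a minimal request that conforms to the specification, here is a sketch in C. The helper name build_get_request and the example host and path are hypothetical, not something from this thread, and the code assumes a C99 snprintf (older MSVC versions only provide _snprintf). HTTP/1.1 requires a Host header, and the header block must end with an empty line; a request line sent on its own leaves the server waiting for the rest of the headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a minimal HTTP/1.1 GET request into out.
   Returns the request length, or -1 if it does not fit in cap bytes. */
int build_get_request(char *out, size_t cap, const char *host, const char *path)
{
    int n;
    assert(out != NULL && host != NULL && path != NULL);
    n = snprintf(out, cap,
                 "GET %s HTTP/1.1\r\n"     /* request line */
                 "Host: %s\r\n"            /* mandatory in HTTP/1.1 */
                 "Connection: close\r\n"   /* ask the server to close when done */
                 "\r\n",                   /* empty line ends the headers */
                 path, host);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

The resulting buffer could be passed to send() in place of the bare "GET %s" string. "Connection: close" is included so that a receive loop which stops when recv() returns 0 still terminates, since HTTP/1.1 connections otherwise stay open by default.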

HTTP supports partial GET requests. They need to be supported by the server; some servers do not support them, and some deliberately disable them.
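
A partial GET is an ordinary GET with a Range header. The sketch below is only an illustration: build_range_request and parse_status_code are hypothetical names, and a C99 snprintf is assumed. A server that honors the range answers with status 206; one that ignores it typically answers 200 with the whole page, so the status code is worth checking. Note that the response still begins with headers either way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: GET request asking only for bytes first..last
   (inclusive) of the resource. Returns length, or -1 if it does not fit. */
int build_range_request(char *out, size_t cap, const char *host,
                        const char *path, unsigned long first,
                        unsigned long last)
{
    int n;
    assert(out != NULL && host != NULL && path != NULL);
    n = snprintf(out, cap,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Range: bytes=%lu-%lu\r\n"
                 "Connection: close\r\n"
                 "\r\n",
                 path, host, first, last);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}

/* Hypothetical helper: extracts the numeric status code from the start of
   a response such as "HTTP/1.1 206 Partial Content". Returns -1 if the
   status line is malformed. */
int parse_status_code(const char *response)
{
    int major, minor, code;
    assert(response != NULL);
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```

If parse_status_code() returns 206, the body holds just the requested byte range; if it returns 200, the server sent the full page and the offsets have to be applied on the client side.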

The headers end after the character sequence "\r\n\r\n" (CR, LF, CR, LF). That character sequence cannot be part of the header. Thus, you can just look for that sequence, and when you find it, you know that the data starts with the very next byte. That may or may not be "<HTML>", by the way: it could be "<html>", "<?xml>", "<!DOCTYPE>", or some extra blanks inserted by whoever generated the page.
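
The search described above can be sketched in C as follows. The function name find_body_offset is hypothetical, and the sketch assumes the response bytes received so far have been collected into one buffer. memcmp is used rather than strstr because the received data is not guaranteed to be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first body byte in buf,
   or -1 if the "\r\n\r\n" terminator is not present in the len bytes
   received so far. */
long find_body_offset(const char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL);
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);   /* body starts right after the blank line */
    }
    return -1;
}
```

Because recv() can return the response in arbitrary pieces, the terminator may be split across two calls. One way to handle that is to keep appending to a single buffer and call find_body_offset() on the whole accumulated data until it stops returning -1; everything from that offset onward can then be written to the file.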
