download HTML via Winsock

Hello, I'm the author of the Jolt3D! engine (jolt-3d.sf.net). I'm trying to create an alternative to the 'InternetReadFile()' function of WININET.DLL for reading HTML pages from C programs. My goal is to get rid of that DLL and use only the Winsock DLL, but I hit a problem: with the following Winsock code I'm not getting correct results, or perhaps I don't know the "GET" syntax well. The (easy) code is:
// address
IN_ADDR   iaHost;
LPHOSTENT lpHostEntry;
iaHost.s_addr = inet_addr(Servername);
if (iaHost.s_addr == INADDR_NONE) // wasn't an IP address string, assume it is a name
    lpHostEntry = gethostbyname(Servername);
else // it was a valid IP address string
    lpHostEntry = gethostbyaddr((const char *)&iaHost, sizeof(struct in_addr), AF_INET);
if (lpHostEntry == NULL)
{
    J_event("error A");
    return;
}

// socket
SOCKET Socket;
Socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (Socket == INVALID_SOCKET)
{
    J_event("error B");
    return;
}

// port
LPSERVENT lpServEnt;
SOCKADDR_IN saServer = {0}; // zero the struct, including the sin_zero padding
lpServEnt = getservbyname("http", "tcp");
if (lpServEnt == NULL)
    saServer.sin_port = htons(80);
else
    saServer.sin_port = lpServEnt->s_port; // already in network byte order

// fill rest
saServer.sin_family = AF_INET;
saServer.sin_addr = *((LPIN_ADDR)*lpHostEntry->h_addr_list);

// connect
int nRet = connect(Socket, (LPSOCKADDR)&saServer, sizeof(SOCKADDR_IN));
if (nRet == SOCKET_ERROR)
{
    J_event("error C");
    closesocket(Socket); // don't leak the socket on failure
    return;
}

// build the HTTP request
char szBuffer[1024];
sprintf(szBuffer, "GET %s\n", Filename);
nRet = send(Socket, szBuffer, strlen(szBuffer), 0);
if (nRet == SOCKET_ERROR)
{
    J_event("error D");
    closesocket(Socket);
    return;
}

// receive the file contents and write them to local 'index.html'
FILE *f = fopen("index.html", "wb");
if (f == NULL)
{
    J_event("error F");
    closesocket(Socket);
    return;
}
while (1)
{
    nRet = recv(Socket, szBuffer, sizeof(szBuffer), 0);
    if (nRet == SOCKET_ERROR)
    {
        J_event("error E");
        break;
    }
    else if (nRet == 0) // 0 means the server closed the connection
        break;
    fwrite(szBuffer, 1, nRet, f); // write nRet bytes to file
}
closesocket(Socket);
fclose(f);

...where 'Servername' is either an IP address or a domain name, e.g. www.google.com, and 'Filename' is a specific HTML file with its directory, e.g. /files/index.html. As you can see, I'm using the simplest GET syntax: "GET %s\r\n", with no HTTP/1.1, no Host:, or anything else.

This syntax works for www.google.com/index.html, but not for:

http://www.nba.com/games/20071030/scoreboard.html // request timed out
http://jolt-3d.sf.net/index.htm // error 400, bad URI ???

My questions are:

1) Do these errors have to do with bad parameters after GET?
2) Why does the system halt when I use "GET %s HTTP/1.1\r\n" (or any other version)?
3) My internet connection is not ADSL but GPRS / 3G. Could Winsock be confused by this?

P.S. I found something odd when comparing Explorer with my code. I tried to download a file that doesn't exist from my site (on SourceForge) both ways. Explorer returned the familiar SourceForge error page, which shows the error code, the server and the URL, all filled in normally. With my Winsock code the server name was NOT filled in, and the URL contained placeholders like %1/%%3; more specifically, it was /home/groups/%1%%2/htdocs/ff.htm instead of the correct /home/groups/j/jo/jolt-3d/htdocs/ff.htm. What is going on?

If anyone has some time, please check this code. I think it will help anyone who wants to download an HTML page "freely" without getting their hands into commercial products.

thanx
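
For reference, here is a minimal sketch of a request that follows the HTTP/1.x format. An HTTP/1.x request ends its header block with an empty line ("\r\n\r\n"), so a server that receives only "GET %s HTTP/1.1\r\n" is probably still waiting for the rest of the request, which would look like a hang. A Host: line is mandatory in HTTP/1.1 and is how servers hosting many sites tell them apart, which may be what the unfilled placeholders on the SourceForge error page hint at. The helper name build_get_request is made up for illustration, and it assumes a C99 snprintf (older MSVC versions only offer _snprintf, which behaves differently on truncation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the original post: formats a minimal
   HTTP/1.0 GET request into buf. Returns the request length in bytes,
   or -1 if buf is too small to hold it. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);

    n = snprintf(buf, size,
                 "GET %s HTTP/1.0\r\n"    /* request line with a version */
                 "Host: %s\r\n"           /* lets shared servers pick the site */
                 "Connection: close\r\n"  /* ask the server to close when done */
                 "\r\n",                  /* empty line ends the header block */
                 path, host);

    if (n < 0 || (size_t)n >= size)
        return -1;                        /* formatting error or truncated */
    return n;
}
```

The returned length can be passed straight to send() in place of strlen(szBuffer). Because the request asks for the connection to be closed, the recv() loop above should end when recv() returns 0.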

You can try the HTTP-GET library, that does exactly that.

Nice library. Although it doesn't handle redirection and other things, it's far better than my code :)
thanx

Yes, it's somewhat minimal :-)

However, it should not be too hard to put redirect parsing, cookies, and whatever else you need on top of what's there. The networking and request/response part works fairly well.
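
As a rough starting point for redirect handling, the sketch below reads the status code from the first line of a response and checks whether it is one of the usual redirect codes. It is not the HTTP-GET library's API; parse_status_code and is_redirect are hypothetical names used only for illustration. A real client would go on to read the Location: header to find the new URL, which this sketch does not do.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts the numeric status code from a response
   that starts with a status line such as "HTTP/1.1 301 Moved Permanently".
   Returns the code, or -1 if the text does not start with a status line. */
int parse_status_code(const char *response)
{
    int major, minor, code;

    assert(response != NULL);

    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Hypothetical helper: nonzero for the status codes that normally carry
   a Location: header pointing at the new address. */
int is_redirect(int code)
{
    return code == 301 || code == 302 || code == 303 ||
           code == 307 || code == 308;
}
```

The response passed in must be NUL-terminated, so a buffer filled by recv() needs a terminator appended before it is parsed this way.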

...I suppose that's a library of yours (I saw the ~hplus directory :) ). Really nice work! Just one more question: is there a way to bypass the header-like text before the actual HTML page?
I'm using a number of HTML pages from my C programs in real time and parsing them byte by byte, so I know (and need) the same byte offsets for several of these pages; but with the header, things (and offsets :) ) change...
Must I start looking for where the <HTML> starts, or is there an easier way?
my thanx

Quote:
 Original post by vlzvl
 ...i suppose that is a library of yours (i saw the ~hplus directory :) ) really nice work ! Just one more question: is there a way to bypass the header-like text before the actual html page ? Im using a number of html pages from my c programs in real-time & doing parsing byte-2-byte, so i know (and need) the same byte-offsets for several of these pages; but with the header things (and offsets :) ) are changing... I must start thinking where the <HTML> starts or is there an easier way ? my thanx

HTTP headers are fixed by the protocol. You can send a minimal subset, but the request still needs to conform to the specification.

HTTP also supports partial GET requests (the Range request header). The server needs to support them, though: some servers don't, and some deliberately disable them.
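
A partial GET might be built as in the sketch below, which asks for an inclusive byte range through the Range header. The helper name build_range_request is hypothetical, and a C99 snprintf is assumed. A server that honors the range answers with status 206 (Partial Content); one that ignores it answers 200 with the whole page, so the status code has to be checked either way. Since this is an HTTP/1.1 request, the response may also arrive with chunked transfer encoding, which this sketch makes no attempt to decode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a GET request for bytes first..last
   (inclusive) of path. Returns the request length, or -1 if buf is
   too small. */
int build_range_request(char *buf, size_t size, const char *host,
                        const char *path, unsigned long first,
                        unsigned long last)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);
    assert(first <= last);

    n = snprintf(buf, size,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Range: bytes=%lu-%lu\r\n"  /* inclusive byte offsets */
                 "Connection: close\r\n"
                 "\r\n",
                 path, host, first, last);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Note that the byte offsets in a range count from the start of the body, not from the start of the headers.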

The headers end after the character sequence "\r\n\r\n" (CR, LF, CR, LF). That character sequence cannot be part of the header. Thus, you can just look for that sequence, and when you find it, you know that the data starts with the very next byte. That may or may not be "<HTML>", by the way: it could be "<html>", "<?xml>", "<!DOCTYPE>", or some extra blanks inserted by whoever generated the page.
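
The search described above can be sketched in a few lines of C. The function name find_body_offset is hypothetical. It scans the bytes received so far and returns the offset of the first body byte, or -1 if the terminator has not arrived yet. Because recv() can split the terminator across two calls, the scan should run over the accumulated response rather than over each chunk separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte after the
   "\r\n\r\n" that ends the HTTP headers, or -1 if that sequence does
   not occur in the first len bytes of buf. */
long find_body_offset(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);

    for (i = 0; i + 4 <= len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n' &&
            buf[i + 2] == '\r' && buf[i + 3] == '\n')
            return (long)(i + 4);   /* body starts right after CR LF CR LF */
    }
    return -1;                      /* headers not complete yet */
}
```

Byte offsets into the page can then be measured from the returned position, which keeps them stable even when the header length changes between requests.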

That's the info I wanted :) Thanks to both of you.
