// address
IN_ADDR iaHost;
LPHOSTENT lpHostEntry;
iaHost.s_addr = inet_addr(Servername);
if (iaHost.s_addr == INADDR_NONE) // Wasn't an IP address string, assume it is a name
    lpHostEntry = gethostbyname(Servername);
else // It was a valid IP address string
    lpHostEntry = gethostbyaddr((const char *)&iaHost, sizeof(struct in_addr), AF_INET);
if (lpHostEntry == NULL)
{
    J_event("error A");
    return;
}
// socket
SOCKET Socket;
Socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (Socket == INVALID_SOCKET)
{
    J_event("error B");
    return;
}
// port
LPSERVENT lpServEnt;
SOCKADDR_IN saServer;
lpServEnt = getservbyname("http", "tcp");
if (lpServEnt == NULL)
    saServer.sin_port = htons(80);
else
    saServer.sin_port = lpServEnt->s_port;
// fill rest
saServer.sin_family = AF_INET;
saServer.sin_addr = *((LPIN_ADDR)*lpHostEntry->h_addr_list);
// connect
int nRet = connect(Socket, (LPSOCKADDR)&saServer, sizeof(SOCKADDR_IN));
if (nRet == SOCKET_ERROR)
{
    J_event("error C");
    closesocket(Socket); // release the socket, as the send error path below does
    return;
}
// build the HTTP request
char szBuffer[1024];
sprintf(szBuffer, "GET %s\n", Filename);
nRet = send(Socket, szBuffer, strlen(szBuffer), 0);
if (nRet == SOCKET_ERROR)
{
    J_event("error D");
    closesocket(Socket);
    return;
}
// receive the file contents and print to local 'index.html'
FILE *f = fopen("index.html", "wb");
while (1)
{
    nRet = recv(Socket, szBuffer, sizeof(szBuffer), 0);
    if (nRet == SOCKET_ERROR)
    {
        J_event("error E");
        break;
    }
    else if (nRet == 0) // server closed the connection? (or there just aren't any bytes to read)
        break;
    fwrite(szBuffer, nRet, 1, f); // write to file
}
closesocket(Socket);
fclose(f);
...where 'Servername' is either an IP address or a domain name, e.g. www.google.com,
and 'Filename' is a specific HTML file with its directory, e.g. /files/index.html
As you can see, I'm using the simplest GET syntax: "GET %s\r\n",
with no HTTP/1.1, no Host:, or anything else...
This syntax works for www.google.com/index.html, but not with:
http://www.nba.com/games/20071030/scoreboard.html // request timed out
http://jolt-3d.sf.net/index.htm // error 400, bad URI ???
My questions are:
1) Do those errors have to do with bad parameters after GET?
2) Why does the system halt when I use "GET %s HTTP/1.1\r\n"? (or any other version)
3) My internet connection is not ADSL, but GPRS / 3G. Maybe Winsock is
somehow confused by this?
P.S. I found something odd when comparing Explorer with my code. I tried to download a file (one that doesn't exist) from my site (on SourceForge) both ways. Explorer returned the familiar SourceForge error page, which shows the error code, the server and the URL, all filled in normally. With my Winsock code, the server name was NOT filled in, and the URL was formatted with %1/%%3 etc. More specifically, it
was:
/home/groups/%1%%2/htdocs/ff.htm, instead of the correct
/home/groups/j/jo/jolt-3d/htdocs/ff.htm
What is going on?
If anyone has some time, please check this code. I think it will help anyone who
wants to download an HTML page "freely" without getting their hands into commercial products. Thanks
download HTML via Winsock
Hello,
I'm the author of the Jolt3D! engine (jolt-3d.sf.net).
I'm trying to create an alternative to the 'InternetReadFile()' function of WININET.DLL, to read HTML pages from C programs.
My goal is to get rid of that DLL and just use the wsock DLL, but
I hit a problem. With the following Winsock code
I'm not getting correct results, or I don't know the "GET" syntax well.
The (easy) code is:
Nice library.
Although it doesn't handle redirection &
other things, it is far better than my code :)
Thanks
Yes, it's somewhat minimal :-)
However, it should not be too hard to put redirect parsing, cookies, and whatever else you need on top of what's there. The networking and request/response part works fairly well.
...I suppose that is a library of yours (I saw the ~hplus directory :) )
Really nice work! Just one more question: is there a way to bypass
the header-like text before the actual HTML page?
I'm using a number of HTML pages from my C programs in real time and parsing them byte by byte, so I know (and need) the same byte offsets for several of these pages; but with the header, things (and offsets :) ) are changing...
Must I start looking for where the <HTML> starts, or is there an easier way?
My thanks
Quote: Original post by vlzvl
...I suppose that is a library of yours (I saw the ~hplus directory :) )
Really nice work! Just one more question: is there a way to bypass
the header-like text before the actual HTML page?
I'm using a number of HTML pages from my C programs in real time and parsing them byte by byte, so I know (and need) the same byte offsets for several of these pages; but with the header, things (and offsets :) ) are changing...
Must I start looking for where the <HTML> starts, or is there an easier way?
My thanks
The format of HTTP headers is fixed. You can send a minimal subset, but the request needs to conform to the specification.
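As an illustration of such a minimal subset, here is a sketch of a request that should conform to HTTP/1.1. The helper name build_get_request is hypothetical (it is not from the original post), and snprintf assumes a C99 runtime (older MSVC versions offer _snprintf instead). Unlike a bare "GET %s HTTP/1.1\r\n", the request ends with an empty line, without which a server generally keeps waiting for more header lines. It also carries a Host: header, which HTTP/1.1 requires and which a server hosting many sites under one address presumably needs in order to pick the right one. "Connection: close" asks the server to close the socket after the response, so a recv() loop that waits for a return value of 0 can terminate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a minimal HTTP/1.1 GET request into buf.
   Returns the length of the request, or -1 if it does not fit. */
static int build_get_request(char *buf, size_t size,
                             const char *host, const char *path)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);

    n = snprintf(buf, size,
                 "GET %s HTTP/1.1\r\n"    /* request line */
                 "Host: %s\r\n"           /* mandatory in HTTP/1.1 */
                 "Connection: close\r\n"  /* no keep-alive */
                 "\r\n",                  /* empty line ends the headers */
                 path, host);

    if (n < 0 || (size_t)n >= size)
        return -1;                        /* error or truncated output */
    return n;
}
```

With the variables from the original post, this could be called as build_get_request(szBuffer, sizeof(szBuffer), Servername, Filename), passing the returned length to send() instead of strlen(szBuffer).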
HTTP supports partial GET requests (the Range header), but the server needs to support them. Some servers do not, and some deliberately disable them.
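A partial GET is expressed with a Range header. The following is only a sketch: the helper name build_range_request is hypothetical, and the byte offsets are whatever the caller chooses. A server that honours the range answers with status 206 and only the requested bytes of the body; one that ignores it answers with 200 and the whole page, so the status line still has to be checked. Either way the response starts with headers, so a range does not remove the need to skip them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a GET request for bytes first..last
   (inclusive) of the resource. Returns the length, or -1 if buf is too small. */
static int build_range_request(char *buf, size_t size,
                               const char *host, const char *path,
                               unsigned long first, unsigned long last)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);
    assert(first <= last);

    n = snprintf(buf, size,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Range: bytes=%lu-%lu\r\n"  /* inclusive byte range */
                 "Connection: close\r\n"
                 "\r\n",
                 path, host, first, last);

    if (n < 0 || (size_t)n >= size)
        return -1;                           /* error or truncated output */
    return n;
}
```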
The headers end after the character sequence "\r\n\r\n" (CR, LF, CR, LF). That character sequence cannot be part of the header. Thus, you can just look for that sequence, and when you find it, you know that the data starts with the very next byte. That may or may not be "<HTML>" by the way -- it could be "<html>," or "<?xml>," or "<!DOCTYPE>," or "<!-->," or some extra blanks inserted by whoever generated the page.
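The search described above can be sketched as follows. The function name find_body_offset is hypothetical. It works on a length-delimited buffer rather than a C string, because data returned by recv() is not null-terminated and the body may contain zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first body byte,
   i.e. the byte right after the first "\r\n\r\n" in data[0..len),
   or -1 if the end of the headers has not been received yet. */
static long find_body_offset(const char *data, size_t len)
{
    size_t i;

    assert(data != NULL);

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(data + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

Because TCP delivers a byte stream, the four-byte sequence may be split across two recv() calls. One way to handle that is to append each chunk to a single buffer and run the search over the accumulated data until it returns a non-negative offset; from then on, everything past that offset is page content.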