HTTP Client
Maybe I'm overthinking this, but it seems like a valid question to me. OK, I have a working socket: it can connect to a website and get the response. Now comes the tricky part: how do you know when the response ends?
The server I'm connecting to is nothing more than a simple Apache server. It doesn't always include a Content-Length header, and it doesn't seem to automatically close the connection, which I was under the impression it should. I even came across other websites on different servers that still don't close the connection.
I was looking through examples, and they seem to stop trying to receive after the first time they get nothing. That doesn't seem like a good solution to me. I know connections are getting better and better, but what if someone has a poor WiFi or satellite connection and it takes a few seconds to get a response? Do they just have to reload the page? And if that's the case, how does my program know the response didn't come all the way through?
Oh, and I'm using C++ and Windows sockets (WinSock2.h). Sorry.
The Winsock documentation goes into detail about return codes from calls, such as when the end of stream was reached or something else happened.
HTTP documentation specifies how connections behave.
The trivial case means simply reading from socket until it returns 0 or an error.
In practice, properly handling this is a surprisingly tedious task. I know that Firefox still hangs from time to time if something weird happens on the network.
It seems kind of weird that there's no specific end marker for the HTTP stream; that seems like a must to me for the kind of content it's serving.
Where can I find more information about this "end of stream"? I did some quick Google searches with very little result; maybe it's just because I don't know the exact wording.
In HTTP/1.0, the default action is Connection: close - that is, the server disconnects after all content is sent. In that sense, the Content-Length header is not required, unless you explicitly say Connection: keep-alive.
In HTTP/1.1, it's the other way around. The default is Connection: keep-alive (meaning the server has to tell you the body length some other way, normally a Content-Length header or chunked transfer encoding) but the server can explicitly turn that off with Connection: close (then Content-Length is no longer required).
I don't know if there are any servers that respond with neither Connection nor Content-Length in HTTP/1.1, but if there are, then they are non-compliant with the standard. Unfortunately, non-compliant implementations are just part of life in the world of the web...
If I were designing a client today, I would simply have a relatively short (say 5 seconds) timeout once you've started receiving content. If you don't get a Connection or Content-Length header, and you don't receive anything for > 5 seconds, drop the connection (though in reality I'd probably skip that and just use libcurl :p).
That worked for the most part; I just changed it from HTTP/1.1 to 1.0.
I'm also going to add some timeouts just in case. You say you would use curl; how do they handle it? I know they are pretty well documented; do they themselves mention this issue somewhere?
Thank you
There's also for example Chunked transfer encoding, which might be why you're not receiving a Content-Length header.
I think the problem was caused by it sending chunked encoding. I saw on the Wikipedia link that you can at least see when it ends with the 0 chunk; that's good news. So there are several cases I would have to handle.
Thank you all
Quote:I would simply have a relatively short (say 5 seconds) timeout once you've started receiving content.
Writing "robust" code that attempts to "handle" non-conforming cases is one of the worst scourges of software engineering today. Such code is problematic for several reasons:
1) You'll have a hard time testing it, unless you possess an example of a non-conforming server with that exact behavior.
2) The implementation, if it "works," will work 95% of the time, but in various cases, there may be a lag spike on the network larger than 5 seconds; in that case, your implementation will terminate the transfer prematurely without knowing there's a problem.
3) Someone else may use your implementation as reference, and thus perpetuate a really bad design decision.
You should implement the spec, as strictly as you can. If, after implementing the spec, and verifying that you are in conformance, you find some case where it doesn't work, then you should analyze that case in detail, and implement a work-around that is tailored to that particular problem case. Make sure you identify the problem case as narrowly as possible, so that it doesn't cause a general loosening of your implementation.
Recommending "lenient" implementations and general hacks not targeted to a specific problem is always a bad idea, and is always bad software engineering, in my experience (which after 25 years in the business is fairly extensive).
Quote:Original post by jeff8j
I think the problem was caused by it sending chunked encoding. I saw on the Wikipedia link that you can at least see when it ends with the 0 chunk; that's good news. So there are several cases I would have to handle.
Since you seem to have looked at libcurl documentation and wiki, why not go straight to the authoritative source?
HTTP specification, part 1, chapter 7 at minimum.
Winsock reference, at least the recv(), send() and any other function you might end up using.
That really is all, and from the horse's mouth, as the saying goes.
Even though the above two are well written, they are somewhat extensive and a rather dry read, which is why effectively nobody (perhaps WebKit does) implements them in their entirety, or at least in a fully conforming way.
Well, I was able to get the HTTP client working so it wasn't being held up on sites. Now I've run into another problem: binary files. When it tries to get a binary file, all the null bytes kill the string. I wrote this all with string functions, thinking that was better since it was the C++ way to go instead of char*.
So is there a way to hold the data in a string, or do I have to rewrite everything using char*? What options do I have?