Clicking a Link - What Happens?

2 comments, last by MustEatYemen 18 years, 11 months ago
Hi guys,

Let's say I visit a web page and click a link leading to another web site. What exactly happens? Something like this:

1) my browser opens a socket connection to the remote host's port 80
2) my browser receives a response saying the connection is OK
3) my browser then sends a request for an HTML document
4) my browser gets the document and displays it

Did I miss anything?

Some guys I know are making a hit tracker (for a Top 100 list) and are trying to make it cheat-proof. They don't know much about lower-level socket stuff, so I'm trying to help them out. I'm not a pro either, but I can probably suggest some ways it could be cheated. First I need to know exactly which packets go in and out; then I'll have a better understanding of what's going on. Any help would be appreciated. Thanks.
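
The four steps above map fairly directly onto a plain TCP socket. The following is only a minimal sketch using Python's standard `socket` module, assuming plain HTTP on port 80 (no SSL) and a server that honours `Connection: close`; `build_request` and `fetch_raw` are hypothetical helper names, not part of any library. Step 2 has no code of its own: the TCP handshake happens inside `socket.create_connection`, which raises `OSError` if it fails.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # the Host header is mandatory in HTTP/1.1
        "Connection: close",    # ask the server to close the socket when done
        "",                     # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host: str, port: int = 80, path: str = "/") -> bytes:
    """Connect, send a GET request, and read the raw reply until EOF."""
    # Steps 1 and 2: create_connection() performs the TCP handshake and
    # raises OSError if the server cannot be reached.
    with socket.create_connection((host, port), timeout=10) as sock:
        # Step 3: ask for the document.
        sock.sendall(build_request(host, path))
        # Step 4: collect everything the server sends back.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `fetch_raw("example.com")` would return the raw status line, headers and body as bytes (this needs network access). A real browser sends many more headers and usually keeps the connection open for follow-up requests.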
I suggest installing a copy of Ethereal on your machine and watching the traffic as you're surfing the web. It'll be much more illuminating than anything someone's likely to write in this forum.
enum Bool { True, False, FileNotFound };
You are pretty much correct there, although your list is missing a lot of details. hplus0603 is right that running Ethereal would probably be the best option; just filter the capture so you don't get a lot of extra garbage ("tcp port 80" should do fine as a capture filter).

Basically:

  1. Establish a TCP connection to the server (port 80 for plain HTTP).

  2. Send a GET request, for example:
    GET /view.php3 HTTP/1.1
    Host: www.penny-arcade.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) Gecko/20050414 Firefox/1.0.3
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
    Referer: http://www.penny-arcade.com/
    Cookie: phpbb2mysql_data=a%3A0%3A%7B%7D; phpbb2chatforum_data=a%3A0%3A%7B%7D
    If-Modified-Since: Wed, 11 May 2005 17:13:14 GMT


  3. The server then sends a response back. If the page was found, the status code will typically be 200 (OK), followed by the page data. The browser then makes further requests for things such as the images in the page. Nowadays the data is often sent back in chunks (chunked transfer encoding, a feature of HTTP/1.1).
    HTTP/1.1 200 OK
    Date: Wed, 11 May 2005 17:14:29 GMT
    Transfer-Encoding: chunked
    Content-type: text/html
    Expires: Mon, 26 Jul 1997 05:00:00 GMT
    Last-Modified: Wed, 11 May 2005 17:14:29 GMT
    Cache-Control: no-cache, must-revalidate, max_age=0
    Pragma: no-cache
    Server: MySQL sucks-ass/4.0 blowme-lvs/2.4
    ... chunked data follows ...
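
A response like this is just a status line, header lines separated by CRLF, a blank line, and then the body. A minimal parser for that layout might look like the sketch below; `parse_response` is a hypothetical name, it assumes the whole response is already in memory, and it keeps only the last value of a repeated header (such as Set-Cookie), which a real client would not do. Header names are lower-cased because HTTP treats them case-insensitively (note the capture's `Content-type`).

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status, reason, headers, body).

    Hypothetical helper: header names are lower-cased, and a repeated
    header keeps only its last value.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line found: header block is incomplete")
    # ISO-8859-1 maps every byte to a character, so decoding never fails.
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)          # e.g. ["HTTP/1.1", "200", "OK"]
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, reason, headers, body
```

For the response captured above, `headers["transfer-encoding"]` would be `"chunked"`, so the body would still need to be de-chunked before use.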

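With `Transfer-Encoding: chunked`, each chunk is a hexadecimal length line, a CRLF, that many bytes of data, and another CRLF; a chunk of length 0 ends the body. A minimal decoder could look like this, assuming the complete body is already in memory and ignoring any trailer headers after the last chunk (`decode_chunked` is a hypothetical name; a truncated body makes `bytes.index` raise `ValueError`):

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a body sent with Transfer-Encoding: chunked (hypothetical helper)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)                # end of the chunk-size line
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop any chunk extension
        size = int(size_field.strip(), 16)                 # chunk sizes are hexadecimal
        pos = line_end + 2
        if size == 0:                                      # last chunk: body is complete
            return bytes(out)
        out += body[pos:pos + size]
        pos += size + 2                                    # skip the data and its CRLF
```

For example, `decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` returns `b"Wikipedia"`.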

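To issue the follow-up requests mentioned in step 3, a client has to parse the returned HTML and resolve each relative link against the page's URL. A rough sketch using Python's standard `html.parser` and `urllib.parse.urljoin`, looking only at `<img src>` (real browsers also fetch stylesheets, scripts and more); `ImageCollector` and `image_urls` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute URLs of <img src=...> tags (hypothetical helper)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative paths against the page's own URL.
                    self.urls.append(urljoin(self.base_url, value))


def image_urls(html: str, base_url: str) -> list:
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Each URL returned would then become another GET request like the one in step 2.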

That's about it. For a top 100 list, there really isn't any way to completely "prevent" cheating, but you can check the Referer header, among other things. Keep in mind that the browser is what sends the Referer, so a cheater can forge it.
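
As a sketch of the Referer check, a counter could only accept an incoming hit if the Referer points at the member site's registered domain. This assumes headers arrive as a dict with lower-cased names; `referer_matches` is hypothetical, and treating a missing Referer as "not counted" is a design choice, not a rule. Since the header is client-supplied, this filters casual abuse only.

```python
from urllib.parse import urlsplit


def referer_matches(headers: dict, allowed_host: str) -> bool:
    """True if the Referer points at allowed_host or one of its subdomains.

    Hypothetical helper: `headers` maps lower-cased header names to values.
    The Referer is chosen by the client, so this only filters casual abuse.
    """
    referer = headers.get("referer")  # the header name really is spelled this way
    if not referer:
        return False                  # no Referer: do not count the hit
    host = (urlsplit(referer).hostname or "").lower()
    allowed = allowed_host.lower()
    # Compare whole labels so "evil-penny-arcade.com" does not match.
    return host == allowed or host.endswith("." + allowed)
```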

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

If you have a bit more time, you could look at the HTTP protocol in depth.
I recently wrote my own bare-bones HTTP client (no display), which was interesting and not all that difficult.

Some links with more information about HTTP/1.1:
http://en.wikipedia.org/wiki/Http
http://www.jmarshall.com/easy/http/
And of course, what would any technical discussion be without the RFC/FIPS? Here is RFC 2616:
http://www.w3.org/Protocols/HTTP/1.1/rfc2616.pdf

As for making a cheat-proof hit counter, that involves looking closely at the incoming data on the server side to see which IPs are connecting (who requests document X; note them as having "voted", increment the vote count, etc.).
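
One way to sketch the "note them as having voted" idea is to count at most one vote per (site, IP) pair within a time window. Everything here is an assumption for illustration: `VoteCounter` is hypothetical, the 24-hour window is arbitrary, and the state is an in-memory dict. IPs are a weak identity anyway: many users behind one NAT or proxy share an address, and a cheater can rotate through proxies, so this raises the cost of cheating rather than stopping it.

```python
import time
from typing import Dict, Optional, Tuple


class VoteCounter:
    """Count at most one vote per (site, client IP) within a time window.

    Hypothetical sketch: everything lives in an in-memory dict, so it is lost
    on restart and grows without bound.
    """

    def __init__(self, window_seconds: float = 24 * 60 * 60):
        self.window = window_seconds
        self.last_vote: Dict[Tuple[str, str], float] = {}  # (site, ip) -> last counted time
        self.totals: Dict[str, int] = {}                    # site -> counted votes

    def record(self, site_id: str, ip: str, now: Optional[float] = None) -> bool:
        """Return True if this hit counts as a vote, False if it is a repeat."""
        now = time.time() if now is None else now
        key = (site_id, ip)
        previous = self.last_vote.get(key)
        if previous is not None and now - previous < self.window:
            return False  # same IP voted for this site too recently
        self.last_vote[key] = now
        self.totals[site_id] = self.totals.get(site_id, 0) + 1
        return True
```

In a real server the IP would come from the connection itself, for example the address part of what `socket.accept()` returns, rather than from any header the client sends.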

HTTP is stateless, so tracking users, even logged-in ones, is something of a hack built on cookies and similar mechanisms.
-Scott
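
Because HTTP is stateless, the usual workaround is a cookie: the server hands out an ID in a Set-Cookie header, and the browser echoes it back in the Cookie header on later requests. A minimal sketch with Python's standard `http.cookies` module follows; the cookie name `visitor_id`, the one-year lifetime and `get_or_assign_visitor` are all hypothetical choices. A cheater can simply delete the cookie, so this complements an IP check rather than replacing it.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple


def get_or_assign_visitor(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, set_cookie_value_or_None).

    'visitor_id' is a hypothetical cookie name. If the browser already sent
    it, reuse it; otherwise mint a random ID and return a Set-Cookie value.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")           # parse the incoming Cookie header
    if "visitor_id" in jar:
        return jar["visitor_id"].value, None
    new = SimpleCookie()
    new["visitor_id"] = secrets.token_hex(16)  # 32 random hex characters
    new["visitor_id"]["max-age"] = 365 * 24 * 60 * 60
    new["visitor_id"]["path"] = "/"
    # OutputString() gives the header value without the "Set-Cookie:" prefix.
    return new["visitor_id"].value, new["visitor_id"].OutputString()
```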

