michael879

http files

OK, I'm sure this is simple; I just can't find anything on it. When I do a Google search for "http", it comes up with nothing useful because every single page has "http" in it. There is a website I'm trying to access from a program; it has a database full of stuff. It's accessible through a URL like https://www.blah.com/?gameCode=P4&number=1101171271, where number can vary. I am trying to collect a group of these numbers and save them to a file. However, ofstream doesn't work. I'm guessing it doesn't support remote files. Does anyone know an easy way to do this?
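For context, the standard file streams only open paths on the local filesystem, so the download and the save have to be two separate steps. Here is a rough sketch of the local half only, assuming the URL pattern from the post; `build_url` and `save_numbers` are made-up helper names, and the actual fetch is left out.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Build the query URL for one record. The host and gameCode value come from
// the post above; only `number` varies. (Hypothetical helper.)
std::string build_url(long long number) {
    return "https://www.blah.com/?gameCode=P4&number=" + std::to_string(number);
}

// Write the collected numbers to a local file, one per line.
// std::ofstream works here because `path` is a local path, not a URL.
// (Hypothetical helper.)
bool save_numbers(const std::string& path, const std::vector<long long>& numbers) {
    std::ofstream out(path);
    if (!out) return false;  // could not open the file
    for (long long n : numbers) out << n << '\n';
    out.flush();
    return static_cast<bool>(out);  // false if any write failed
}
```

Something like `save_numbers("numbers.txt", {1101171271})` would then store the results once the pages have been fetched by other means.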

You probably want the original HTTP RFCs (Requests for Comments), which are linked from the W3C's protocols page; they should describe the protocol fairly accurately :)

http://www.w3.org/Protocols/

Quote:
Original post by michael879
I don't see anything relevant on that page


Then you didn't read it. It has the specifications of the HTTP protocol, which is what you'll have to use to talk to a web server.

If you don't want to be bothered with that, have a look at libcurl. The web server appears to be down at the moment though (or it's extremely slow). The Sourceforge page is here.

To access files stored on a website you need to use TCP/IP (Winsock if you're using Windows) and HTTP. I would recommend you use the library, since HTTPS means you need to be able to handle SSL (Secure Sockets Layer) in your app. libcurl appears to support it.
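To give an idea of what the HTTP part involves, a request is just text sent over the TCP connection. Below is a minimal sketch that only formats a GET request; `build_get_request` is a made-up name, and the socket (and, for HTTPS, the SSL layer) is deliberately left out, which is the part a library would handle for you.

```cpp
#include <string>

// Format a minimal HTTP/1.1 GET request. Each line ends in CRLF and an
// empty line terminates the header section. (Hypothetical helper; it does
// not open a socket or send anything.)
std::string build_get_request(const std::string& host, const std::string& target) {
    std::string req;
    req += "GET " + target + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";      // Host is mandatory in HTTP/1.1
    req += "Connection: close\r\n";       // ask the server to close after replying
    req += "\r\n";                        // blank line: end of headers
    return req;
}
```

For the URL in the first post, that would be `build_get_request("www.blah.com", "/?gameCode=P4&number=1101171271")`, written to a connection on port 80 for plain HTTP.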

Good luck

Turns out the URL works with http too. Anyway, I got everything working perfectly. However, after a while it stopped working. It's as if the website blocked me or something. Instead of getting the HTML page, I would get a short message saying it's not available. The weird part is that I can still access the page in Firefox. Does anyone know what could cause this?

If you hit the web server 100 times a second, it's very likely to ignore your requests. Heck, it could even get you in legal trouble.

My advice? Wait until you aren't banned anymore, and add a pause to your program between two hits. Don't try to fetch more than 1 page a second, and I'm guessing you should be fine.
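A pause like that could be sketched roughly as follows. This assumes the page download is wrapped in some callback; `fetch` and `fetch_all_throttled` are made-up names, and the one-second default is just the figure suggested above, not a limit the site is known to enforce.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Call `fetch` once per URL, sleeping `pause` between consecutive calls.
// `fetch` is a hypothetical callback standing in for the real HTTP request.
std::vector<std::string> fetch_all_throttled(
    const std::vector<std::string>& urls,
    const std::function<std::string(const std::string&)>& fetch,
    std::chrono::milliseconds pause = std::chrono::milliseconds(1000)) {
    std::vector<std::string> pages;
    pages.reserve(urls.size());
    for (std::size_t i = 0; i < urls.size(); ++i) {
        if (i > 0) std::this_thread::sleep_for(pause);  // no wait before the first hit
        pages.push_back(fetch(urls[i]));
    }
    return pages;
}
```

The pause is added on top of however long each request takes, so the real rate ends up somewhat below one page per second.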

You'll still look very suspicious in logs though, and I hope what you're doing is legal: companies typically don't like people leeching their copyrighted data.

It's public data. I know selling it is questionable (the site shut down someone who was selling it), but I'm pretty sure just storing it privately is fine. Also, I wasn't hitting it anywhere near 100 times per second; it took about 2 seconds to get the page and process it. I think the banning might have something to do with this program that they shut down, which did what I'm trying to do. I'm probably going to just give up, but I am curious how they can ban me from accessing a page through Firefox but not Internet Explorer. What is the difference? They both use port 80, right? Same IP...

Quote:
Original post by michael879
It's public data. I know selling it is questionable (the site shut down someone who was selling it), but I'm pretty sure just storing it privately is fine.


I just thought I'd warn you that some people might get touchy. I'm not the web police though, don't worry. [grin]


Quote:
Also, I wasn't hitting it anywhere near 100 times per second; it took about 2 seconds to get the page and process it. I think the banning might have something to do with this program that they shut down, which did what I'm trying to do. I'm probably going to just give up, but I am curious how they can ban me from accessing a page through Firefox but not Internet Explorer. What is the difference? They both use port 80, right? Same IP...


I guess it's the different User-Agent field in the HTTP request headers. You can try using some Firefox extension to "disguise" it as MSIE and see if that works.
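To illustrate why the same IP and port can still be told apart: every request carries its own header lines, and a server is free to inspect them. The following is only a guess at the kind of check involved, not the site's actual logic; `to_lower` and `user_agent_of` are made-up helper names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Lower-case a copy of `s` so header names compare case-insensitively.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the User-Agent value from raw HTTP request text, or "" if absent.
// (Hypothetical helper showing what a server could look at.)
std::string user_agent_of(const std::string& request) {
    std::istringstream in(request);
    std::string line;
    std::getline(in, line);  // skip the request line, e.g. "GET / HTTP/1.1"
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR of CRLF
        if (line.empty()) break;  // blank line ends the header section
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // not a "Name: value" line
        if (to_lower(line.substr(0, colon)) != "user-agent") continue;
        std::size_t start = line.find_first_not_of(" \t", colon + 1);
        return start == std::string::npos ? std::string() : line.substr(start);
    }
    return "";
}
```

Two clients on the same machine would produce different results here, which is enough for a server to treat them differently.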
