Reading through a web page in C/C++

Started by
9 comments, last by GameDev.net 17 years, 6 months ago
Just wondering how to open a webpage as a file. I didn't have this sort of trouble in PHP, but C++ is a whole different story, as I can't seem to get beyond the files on my computer. Point me to some docs or libs out there I can read through, if possible. Thanks in advance.
Quote:Original post by ilavos
Just wondering how to open a webpage as a file. I didn't have this sort of trouble in PHP, but C++ is a whole different story, as I can't seem to get beyond the files on my computer. Point me to some docs or libs out there I can read through, if possible. Thanks in advance.


1) Download the webpage from its URL using either native sockets or a wrapper such as libCURL or the MSIE components.

2) Either save the contents to a temporary file or work directly with the buffer.

How would I go about doing the second option?
I don't think the AP was giving you two options. I think s/he was suggesting you do 1), then 2). You can't open a file with standard C or C++ file operations with a web address - the file needs to be in a location that your computer sees as a drive, such as your local hard drive or a mapped network drive.

I think the AP is suggesting that you need to download the webpage onto such a drive and then deal with it as a normal file.
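
To illustrate the point about standard file operations, here is a minimal sketch. The helper name `can_open_as_file` and the example URL are made up for illustration. The stream simply treats the URL as a local path, which normally doesn't exist, so the open fails:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reports whether a name can be opened with the
// standard C++ file stream. std::ifstream knows nothing about HTTP, so a
// URL is just looked up as a (normally nonexistent) path on local storage.
bool can_open_as_file(const std::string& name) {
    std::ifstream in(name.c_str());
    return in.is_open();
}
```

Calling `can_open_as_file("http://example.com/index.html")` should return false unless a local path with that exact name happens to exist.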
That depends on how you solve the first part, i.e. what mechanism you use to retrieve the file.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I've done some website crawling code before. Basically you can do one of two things:

1) Using standard Winsock, you can create a simple HTTP client that connects to an IP address, receives the incoming data, and then does with that data what you like.
2) What I did was use the Microsoft IE OCX (ActiveX) control. I created an instance and used the "navigate" function to load a webpage. Done.

The second option gives you an advantage in that you can then use the IE control (I use it non-visually, but there are some quirks to that, so typically you can just make it 0 pixels wide and tall) to traverse the data as objects. IE will also let you pump in raw JavaScript, which is nice for form filling (of course, you can also use the actual HTML interface in the control to set the values manually).

I used the TWebBrowser component in Delphi (which is a wrapper around the IE OCX), but I'm very sure the same can be done in pure C++.
"Mommy, where do microprocessors come from?"
There are more efficient options available in C++, such as InternetReadFile, libCURL, and so on. In fact, using manual sockets and a trivial HTTP request implementation is probably easier, simply because accessing ActiveX/COM stuff from C++ is a bit of a nightmare.
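
As a rough idea of what a trivial HTTP request implementation involves, here is a sketch of the text-handling half only. The function names `build_get_request` and `extract_body` are hypothetical, and the socket work (connect, send the request, receive until the server closes the connection) is left out. It assumes HTTP/1.0 so that a conforming server should not answer with chunked transfer encoding:

```cpp
#include <string>

// Builds a minimal HTTP/1.0 GET request for the given host and path.
// With HTTP/1.0 and "Connection: close", the end of the response is
// signalled by the server closing the connection.
std::string build_get_request(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Returns the body of a raw HTTP response, i.e. everything after the blank
// line that ends the headers. Returns an empty string if no blank line
// is found.
std::string extract_body(const std::string& response) {
    const std::string separator = "\r\n\r\n";
    std::string::size_type pos = response.find(separator);
    if (pos == std::string::npos) return std::string();
    return response.substr(pos + separator.size());
}
```

You would send the string from `build_get_request("example.com", "/index.html")` over a connected socket, append everything you receive to one `std::string`, and hand that to `extract_body`. This ignores redirects, status codes and HTTPS, which is where libCURL or the wininet functions start to earn their keep.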


Quote:Original post by EasilyConfused
I don't think the AP was giving you two options. I think s/he was suggesting you do 1), then 2). You can't open a file with standard C or C++ file operations with a web address - the file needs to be in a location that your computer sees as a drive, such as your local hard drive or a mapped network drive.

I think the AP is suggesting that you need to download the webpage onto such a drive and then deal with it as a normal file.


The AP in fact suggested using whatever method the OP is familiar with to download the HTML page from the web (for example plain sockets, libCURL, MSIE or even ACE), and then either saving the contents to a temporary file or working DIRECTLY with the data in the buffer, if that's feasible. You don't necessarily have to save downloaded data to a file in order to process it further.
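
For what working directly with the buffer might look like: once the page is sitting in a `std::string`, you can wrap it in a `std::istringstream` and read it with the same calls you would use on a file stream. A minimal sketch, assuming a C++11 or later compiler (so not VC6); the helper name `read_lines` is made up:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits an in-memory buffer into lines, the same way
// you would read a file with std::getline, but without touching the disk.
std::vector<std::string> read_lines(const std::string& buffer) {
    std::istringstream in(buffer);  // the stream works on a copy of the buffer
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // Web content often uses CRLF line endings; drop the trailing '\r'.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

Any parsing code written against `std::istream&` will then accept either an `std::ifstream` (temporary file) or an `std::istringstream` (buffer), so you don't have to decide up front.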



So I went with ApochPiQ's suggestion of using the wininet.h functions. Now get this: there's a syntax error inside this header file! Is this a known problem or what? I'm using MSVC++ 6.0.
Where did you get your copy of the headers from? What is the error you're getting when you compile? Are you sure you didn't do something like include foo.h before wininet.h, and leave off a semicolon at the end of a class definition inside foo.h?


You should note that VC6 is extremely out of date, and using it to write new software is morally tantamount to cutting off your own legs. And arms. And maybe poking out an eye or two.

I would very highly recommend you switch to VC2005 (the Express Edition is free) if your project allows.


