v0dKA

Is it possible to obtain the html file of a web page in C++?

This topic is 5005 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I really don't want to get into internet programming (at least not yet), but I would like to obtain the HTML code of a web page at a given address and store it in local variables so my program can work with it. Is there an easy way out here? I was hoping for a simple mechanism (like ifstream for files on the computer, except I need it for the internet). Any insight on the aforementioned problem is appreciated [smile].

I always like to use sockets to do it. It's a lot simpler than you'd think:
  • Open a connection to the server (e.g. gamedev.net)
  • Send this:

    "GET "+strPage+" HTTP/1.1\r\n"
    "Host: "+strHost+"\r\n"
    "User-Agent: DruinkIM Avatar Download\r\n"
    "Connection: close\r\n"
    "\r\n";


    Where strHost is the host name (e.g. "www.gamedev.net"), and strPage is the page to fetch, prefixed with a forward slash (e.g. "/community/forums/topic.asp?topic_id=310133").

    Then you just read until the server closes the connection, or until you've read Content-Length bytes of page data (I know the ogre3d server won't close the connection for 30 seconds, so you have to close it yourself after reading enough bytes).
    The data you receive is the HTTP header first, then "\r\n\r\n", and then the actual page data.
    Here's an example HTTP header:
    Quote:
    HTTP/1.1 200 OK
    Date: Wed, 30 Mar 2005 05:16:53 GMT
    Server: Apache/2.0.50 (Win32) PHP/5.0.3
    X-Powered-By: PHP/5.0.3
    Connection: close
    Content-Length: 12345
    Content-Type: text/html; charset=ISO-8859-1
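    The steps above can be sketched roughly as follows. This is only an illustration, not the poster's code: it assumes POSIX sockets and an already-connected descriptor, and the helper names (build_request, read_until_close, split_response, content_length) are made up for this example. The User-Agent line is left out for brevity.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Build the request text described above. page must start with '/'.
std::string build_request(const std::string &host, const std::string &page) {
    return "GET " + page + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Read from an already-connected socket until the peer closes it
// (recv returns 0) or an error occurs (recv returns -1).
std::string read_until_close(int fd) {
    std::string data;
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n <= 0) break;
        data.append(buf, static_cast<std::string::size_type>(n));
    }
    return data;
}

// Split a raw response at the first "\r\n\r\n". Returns false if the
// end of the header has not been received.
bool split_response(const std::string &raw, std::string &header, std::string &body) {
    std::string::size_type pos = raw.find("\r\n\r\n");
    if (pos == std::string::npos) return false;
    header = raw.substr(0, pos);
    body = raw.substr(pos + 4);
    return true;
}

// Return the Content-Length value, or -1 if the header is absent.
// Header names are case-insensitive, so the search is done in lower case.
long content_length(const std::string &header) {
    std::string lower(header);
    for (std::string::size_type i = 0; i < lower.size(); ++i)
        lower[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(lower[i])));
    const std::string key = "\r\ncontent-length:";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) return -1;
    return std::strtol(header.c_str() + pos + key.size(), 0, 10);
}
```

    You would send build_request(host, page) on the socket, call read_until_close, and then hand the result to split_response. If the server keeps the connection open, stop reading once the body has reached content_length(header) bytes instead of waiting for the close.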

    I doodled this up while trying to learn some of the iostreams and other STL goodies. It turned out to be rather limited in that it only works with text. It's not really complete, but it does everything you want. It should compile on Windows or Unix, and should give you ideas whether you want to start from scratch or expand it!

    Included is a web example.

    testnetwork.h

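    #include <cstdlib>   // atoi, malloc, free
    #include <cstring>   // memset
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <list>

    #ifdef _WIN32

    #include <winsock.h>
    #define socklen_t int
    #define bzero(p,n) memset((p),0,(n))   // winsock has no bzero

    #else

    #include <strings.h>   // bzero
    #include <fcntl.h>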
    #include <sys/socket.h>
    #include <netdb.h>
    #include <sys/types.h>
    #include <netinet/in.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <arpa/inet.h>

    #endif

    #define TESTNETWORK_DEFAULT_PORT 9411

    class networking;
    using namespace std;


    class connection{
    protected:
    int fd;
    int ready;
    string addr;
    string rbuf;
    string sbuf;
    virtual int read(networking&)=0;
    virtual int send(networking&)=0;
    connection (int,string&);
    public:
    string getline();
    void read(string&);
    void undoread(string&);
    void send(string&);
    void send(char *);
    friend ostream& operator<<(ostream &o, const connection &c);
    friend class networking;
    virtual int mtu(){return(1536);}
    };

    class tcp_connection:
    public connection{
    protected:
    int read(networking&);
    int send(networking&);
    tcp_connection(int infd, string& inaddr):connection(infd,inaddr){}
    friend class tcp_listen_connection;
    friend class networking;
    };

    class tcp_listen_connection:
    public connection{
    protected:
    int read(networking&);
    int send(networking&){return(0);} // a listening socket never sends
    void accept(networking&);
    tcp_listen_connection(int, string&, string &);
    friend class networking;
    };



    class networking{
    private:
    // TODO: ban list.
    list<connection *> connections;
    fd_set master;
    void add(connection *);
    void remove(connection *);
    struct timeval *tv;
    int maxfd;
    void (*on_accept)(tcp_connection *);
    void (*on_disconnect)(tcp_connection *);

    friend void tcp_listen_connection::accept(networking&);
    friend class tcp_connection; // read()/send() call the private on_disconnect
    public:
    networking ();
    connection *fdtocon (int);
    connection *addrtocon (int);
    connection *connect (string);
    void *disconnect (connection *);
    friend ostream& operator<<(ostream &o, const networking &n);
    void read();
    void send();
    int listen(int,string&,string&);
    int listen(int);
    bool is_banned(string){return(0);}
    struct timeval timeout();
    void timeout(struct timeval &);
    void set_accept(void (*in)(tcp_connection *)){on_accept=in;}
    void set_disconnect(void (*in)(tcp_connection *)){on_disconnect=in;}

    };






    testnetwork.cc

    #include "testnetwork.h"


    int networking::listen(int queue,string &listen_addr,string &port)
    //
    // cut/paste, cut/paste. Listener setup.
    //
    {

    unsigned long addr;
    int sockfd;
    int yes=1;
    struct sockaddr_in my_addr;
    tcp_listen_connection *listener;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    my_addr.sin_family = AF_INET; /* host byte order */
    my_addr.sin_port = htons(atoi(port.c_str())); /* short, network byte order */
    //my_addr.sin_addr.s_addr = INADDR_ANY;
    my_addr.sin_addr.s_addr = inet_addr(listen_addr.c_str());
    bzero(&(my_addr.sin_zero), 8); /* zero the rest of the struct */

    setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, (char *)&yes, sizeof(int)) ;
    /* don't forget your error checking for bind(): */
    if (-1==bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr))){
    // TODO: throw an error, or pass to object logger.
    // either way, return.
    return(0);
    }
    if (-1==::listen (sockfd,queue)){
    // TODO: throw and quit.
    return(0);
    }
    // TODO: pass a logger to object
    //printf ("Server started at %s, listening at %i\n",inet_ntoa(my_addr.sin_addr),sockfd);
    // TODO-DONE?: create listen object.
    // Pass sockfd, addr.
    listener=new tcp_listen_connection(sockfd,port,listen_addr);
    add(listener);
    return(sockfd);
    }


    int networking::listen(int queue){
    //
    //
    //
    string a,b;
    stringstream ss;
    a="0.0.0.0";
    ss << TESTNETWORK_DEFAULT_PORT;
    b=ss.str();
    return(listen(queue,a,b));
    }


    connection *networking::connect(string inaddr){
    //
    // Connect to inaddr.
    //


    // Assume TCP for now.

    // TODO: split this to elsewhere.
    string::size_type x;
    string caddr;
    string cport;

    x=inaddr.rfind(":");
    if (x!=inaddr.npos && x!=inaddr.length()-1 && x!=0){
    cport.assign(inaddr,x+1,inaddr.length());
    caddr.assign(inaddr,0,x);
    }else{
    x=inaddr.rfind(" ");
    if (x!=inaddr.npos && x!=inaddr.length()-1 && x!=0){
    cport.assign(inaddr,x+1,inaddr.length());
    caddr.assign(inaddr,0,x);
    }else{
    caddr=inaddr;
    // Format the default port as text; assigning the int directly
    // would store a single character instead of "9411".
    stringstream ss;
    ss << TESTNETWORK_DEFAULT_PORT;
    cport=ss.str();
    }
    }

    //cout << "debug connect:\nAddr: " << caddr << "\nport: " <<cport <<"\n";


    int sockfd;
    int rtn;
    struct sockaddr_in addy;
    struct hostent *hostvar;
    connection *c;

    sockfd=socket(AF_INET,SOCK_STREAM,0);
    // TODO: dns lookups.
    addy.sin_family = AF_INET; /* host byte order */
    addy.sin_port = htons(atoi(cport.c_str())); /* short, network byte order */
    addy.sin_addr.s_addr = inet_addr(caddr.c_str());
    bzero(&(addy.sin_zero), 8); /* zero the rest of the struct */
    rtn=::connect(sockfd, (struct sockaddr *)&addy,sizeof(struct sockaddr));
    if (rtn==-1){
    // Error!
    // Throw a connect fail
    return(0);
    }
    c=new tcp_connection(sockfd,inaddr);
    add(c);
    return(c);




    }



    void networking_nothing(tcp_connection *in){}



    networking::networking(){
    //
    // Networking constructor.
    //
    int rtn;

    // Set tv to defaults.
    // default to non-blocking.
    tv=new timeval();
    tv->tv_sec=0;
    tv->tv_usec=0;

    // Set maxfd and start with an empty descriptor set
    maxfd=0;
    FD_ZERO(&master);

    #ifdef _WIN32
    WSADATA wsaData;
    rtn=WSAStartup(MAKEWORD(1,1),&wsaData);
    if (rtn!=0){
    // Post error!?!
    }
    #endif

    on_accept=networking_nothing;
    on_disconnect=networking_nothing;


    }

    void networking::add(connection *c){
    //
    // Add connection to list.
    //
    // TODO: retrieve dupes, and remove old?

    connections.push_back(c);
    FD_SET(c->fd,&master);
    if (c->fd>maxfd){maxfd=c->fd;}
    }

    void networking::read(){
    //
    // Run select, set ready, call reads.
    //
    fd_set tmp;
    int rtn;
    connection *c;
    list<connection *>::iterator cit;
    list<connection *>::iterator e=connections.end();

    if (connections.empty()){return;}
    tmp=master;
    select(maxfd+1,&tmp,0,0,tv);
    for (cit=connections.begin();cit!=e;++cit){
    c=*cit;
    // TODO: change this if c is a windows stdin.
    if (c->ready!=-1){
    if (FD_ISSET(c->fd,&tmp)){
    c->ready=1;
    }else{
    c->ready=0;
    }
    c->read(*this);
    }
    }
    }


    void networking::send(){
    //
    // Eerily similar to read...
    //
    fd_set tmp;
    int rtn;
    connection *c;
    list<connection *>::iterator cit;
    list<connection *>::iterator e=connections.end();

    if (connections.empty()){return;}
    tmp=master;
    select(maxfd+1,0,&tmp,0,tv);
    for (cit=connections.begin();cit!=e;++cit){
    c=*cit;
    // TODO: change this if c is a windows stdin.
    if (c->ready!=-1){
    if (FD_ISSET(c->fd,&tmp)){
    c->ready=1;
    }else{
    c->ready=0;
    }
    c->send(*this);
    }
    }
    }




    connection::connection(int sock,string &inaddr){
    //
    //
    //
    ready=0;
    fd=sock;
    addr=inaddr;
    //sbuf="moo";
    }


    string connection::getline(){
    //
    //
    //
    string::size_type x;
    string rtn;

    if (rbuf.empty()){return(rtn);}
    x=rbuf.find_first_of("\r\n");
    rtn=rbuf.substr(0,x);
    if (x!=rbuf.length()){
    x=rbuf.find_first_not_of("\r\n",x);
    if (x!=rbuf.npos){
    rbuf=rbuf.substr(x,rbuf.length());
    }else{
    rbuf.clear();
    }
    }else{
    rbuf.clear();
    }
    return(rtn);
    }


    void connection::send(string &in){
    //
    //
    //
    sbuf=sbuf+in;
    }


    void connection::send(char *in){
    //
    //
    //
    string tmp=in;
    send(tmp);
    }


    void connection::read(string &out){
    //
    // Read 1 line into string
    //
    out=connection::getline();
    }

    void connection::undoread(string &in){
    //
    // Undo a read();
    //
    if (in.empty() || in[in.length()-1]!='\n'){
    in=in+"\n";
    }
    rbuf=in+rbuf;
    }



    int tcp_connection::read(networking &n){
    //
    // Read data from socket if ready
    //
    string s;
    char *buf;
    int len=mtu();
    int rtn;
    if (ready!=1){
    return(0);
    }
    buf=(char *)malloc(len);
    bzero(buf,len);
    rtn=recv(fd,buf,len,0);
    if (rtn<1){
    // disconnection!
    // TODO: toss some sort of on-d/c
    // but for now, just unready.
    n.on_disconnect(this);
    ready=-1;
    free(buf);
    return(rtn);
    }
    // TODO: see if string has a pre-pend
    // Append exactly rtn bytes; buf is not null-terminated when recv fills it.
    rbuf.append(buf,rtn);
    //cout << "in tcp::read(): " << fd << ":" << addr << ":" << rtn << ": " << buf << " -> " << rbuf << "\n";
    free(buf);
    return(rtn);
    }


    int tcp_connection::send(networking &n){
    //
    // Send data from socket if ready
    //
    int rtn;
    int len=mtu();
    int reallen;

    if (ready!=1){
    return(0);
    }
    reallen=sbuf.length();
    if (reallen>len){
    reallen=len;
    }
    rtn=::send(fd,sbuf.c_str(),reallen,0);
    if (rtn<0){
    // disconnection!
    // TODO: toss a d/c.
    n.on_disconnect(this);
    ready=-1;
    return(rtn);
    }
    if (rtn){
    sbuf.erase(0,rtn);
    }
    return(rtn);

    }


    tcp_listen_connection::tcp_listen_connection(int sock,string &port, string &inaddr):connection(sock,inaddr){
    //
    // Create listening socket connection object.
    //
    addr=inaddr +":"+port;
    }


    int tcp_listen_connection::read(networking &n){
    //
    // Check if ready. If so, send to accept!
    //
    if (ready==1){
    accept(n);
    ready=0;
    }
    return(0);
    }



    void tcp_listen_connection::accept(networking &n){
    //
    // Yay, new connection. Accept it, and add to networking.
    //
    tcp_connection *tcpc;
    int newfd;
    struct sockaddr_in insock;
    socklen_t sinsize=sizeof(insock); // accept() needs the buffer size on entry
    string tmpaddr;

    newfd=::accept(fd,(struct sockaddr *)&insock,&sinsize);
    if (newfd<0){ // accept() returns -1 on failure, not 0
    // Error.
    // Report it.
    return;
    }
    tmpaddr=inet_ntoa(insock.sin_addr);
    if (n.is_banned(tmpaddr)){
    // IP banned.
    // TODO: post message to client?
    #ifndef _WIN32
    close(newfd);
    #else
    closesocket(newfd);
    #endif
    return;
    }
    // New connection.
    // TODO: make a mechanism to post "new player" somewhere.
    tcpc=new tcp_connection(newfd,tmpaddr);
    n.add(tcpc);
    n.on_accept(tcpc);
    }




    httptest.cc

    #include "testnetwork.h"

    networking network;
    int main(){

    connection *c;
    string str;

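    string request="GET / HTTP/1.0\r\n\r\n";

    c=network.connect("216.185.96.234:80");
    if (!c){ // connect() returns 0 on failure
    cerr << "connect failed" << endl;
    return(1);
    }
    c->send(request);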
    while(1){
    network.read();
    network.send();
    str=c->getline();
    if (!str.empty()){
    cout << str << endl;
    }

    }
    }





    The above example should connect to gamedev, and print the root file.

    [edit: now it really should. IIS is stricter about the newlines passed to it. -grumble-]
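
    The example connects to a hard-coded IP because connect() passes the address straight to inet_addr() and still has a "TODO: dns lookups". One way to fill that gap, sketched here under the assumption of a POSIX system (getaddrinfo and inet_ntop), is to resolve the name to a dotted-quad string first. resolve_ipv4 is a hypothetical helper, not part of the code above.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Resolve a host name (or a dotted-quad string) to its first IPv4 address,
// returned in dotted-quad form. Returns an empty string on failure.
std::string resolve_ipv4(const std::string &host) {
    struct addrinfo hints = {};
    hints.ai_family = AF_INET;        // IPv4 only, to match sockaddr_in above
    hints.ai_socktype = SOCK_STREAM;  // TCP

    struct addrinfo *res = 0;
    if (getaddrinfo(host.c_str(), 0, &hints, &res) != 0 || res == 0) {
        return std::string();
    }

    char text[INET_ADDRSTRLEN] = "";
    const struct sockaddr_in *sin =
        reinterpret_cast<const struct sockaddr_in *>(res->ai_addr);
    const char *ok = inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text);
    freeaddrinfo(res);
    return ok ? std::string(text) : std::string();
}
```

    With that in place, the test program could call something like network.connect(resolve_ipv4("www.gamedev.net") + ":80") after checking that the result is not empty.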
