reading text from a webpage

4 comments, last by cozman 19 years, 1 month ago
Is it possible to read text from a webpage in C++ over the internet? If so, how would I go about doing this? I want a program that takes text from the site and manipulates it. Mainly for fantasy sports, because I'm tired of adding everything together in my head. =] Thx, StickManX
It's very possible. Quite simple, in fact. All you need is to instantiate a socket (you will either have a POSIX sockets library or a platform-specific one, but virtually all of them expose a BSD socket API), connect to the host named in the URL (port 80 for plain HTTP) and begin communicating in the appropriate protocol.
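To make the "socket, connect, speak the protocol" steps concrete, here is a minimal sketch. It is written in Python rather than C++ only for brevity; Python's socket module is a thin wrapper over the same BSD socket calls. The function names (build_get_request, split_response, fetch) are made up for this illustration, and it assumes a plain HTTP server on port 80 (HTTPS needs a TLS layer on top).

```python
import socket

def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "Connection: close",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def split_response(raw):
    """Split a raw response into (status_line, body) at the first blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return status_line, body

def fetch(host, path="/", port=80):
    """Connect over TCP, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read means the server closed the connection
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))
```

Calling fetch("example.com", "/") would return the status line and the raw HTML bytes. The same three steps (connect, send, recv in a loop) are what you would write against the BSD socket API in C++.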

Unfortunately, "speaking" network protocols is tedious. Fortunately, the Internet is littered with networking abstraction libraries. Two highly-rated, game-oriented options for C++ are HawkNL and RakNet.

Find the Windows Sockets 2 reference section of MSDN here.
I'd suggest a language other than C++. While it is possible, you'd have to deal with the relevant parts of HTTP yourself, whereas a language like Python or Java has that ability built in. IMHO it would be easier to learn a little Java/Python/something else and do this than to learn all the required C++, and I believe the payoff from learning a language like Python would be better.

PS. Once you get the data from the site, parsing the HTML would be easier in a language like Python as well.
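As a rough illustration of that parsing step, the sketch below pulls the visible text out of a page and adds up whatever pieces look like numbers, which is about what the fantasy sports use case needs. It uses Python 3's standard html.parser module; the names TextExtractor, text_from_html and sum_numbers are invented for this example, and a real stats page would need its own logic to pick the right table cells.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # keep only non-empty text that is not inside script/style
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def text_from_html(html):
    """Return the page's text fragments in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts

def sum_numbers(parts):
    """Add up every fragment that parses as a number; ignore the rest."""
    total = 0.0
    for part in parts:
        try:
            total += float(part)
        except ValueError:
            pass
    return total
```

For a table with the cells Smith, 12, Jones and 7.5, text_from_html returns those four strings and sum_numbers gives 19.5.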
Using a bit of socket programming you can connect to a server and then issue HTTP commands to retrieve a webpage. It's not really difficult and you can learn the basics in a few days.

But of course what you get back is the raw HTML markup, since the server treats you like a browser, so you'll have to pick the text out of the tags yourself.
______________________________________________________________________________________________________
[AirBash.com]
I agree, I would definitely use Python if I were you. You can write a program that downloads a page off the internet in Python in less than 5 lines of code.

If you insist on using C++, I recommend using HTTP GET. It's a very small, simple library written by hplus (moderator of the networking forum) which lets you download files from a website. It doesn't get simpler than that in C++.
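For reference, a download along those lines might look like this in current Python 3, where the call lives in urllib.request. The helper name download and the UTF-8 fallback are assumptions for this sketch, not anything from the thread.

```python
from urllib.request import urlopen

def download(url, default_encoding="utf-8"):
    """Fetch a URL and return its body decoded to text."""
    with urlopen(url, timeout=10) as response:
        # use the charset from the Content-Type header if the server sent one
        charset = response.headers.get_content_charset() or default_encoding
        return response.read().decode(charset, errors="replace")
```

Something like download("http://example.com/") returns the page's HTML as a string, ready to be handed to a parser.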
FTA, my 2D futuristic action MMORPG
A quick Python example (mainly because I'd never tried):

import urllib
from htmllib import HTMLParser
from formatter import AbstractFormatter
from formatter import NullWriter

class LinkFinder(HTMLParser):
    """a simple class for finding links in webpages, inherits from HTMLParser"""

    def start_a(self, attrs):
        """ action to take when an anchor HTML tag is found"""

        # loop through attributes
        for attr in attrs:
            # if the attribute is the href, show the location
            if attr[0] == "href":
                print attr[1]

# here is the main part of the class
# read in google's data (3 lines!)
google = urllib.urlopen("http://google.com")
data = google.read()
google.close()

# create an instance of the LinkFinder HTMLParser class
lf = LinkFinder(AbstractFormatter(NullWriter()))
lf.feed(data)   # feed the parser the data, it'll print the links
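The example above is Python 2 and relies on htmllib, which no longer exists in Python 3. A rough Python 3 equivalent, sketched with the standard html.parser module, could look like the following. It collects the links in a list instead of printing them; the find_links helper is an addition for this sketch.

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collect the href of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_links(html):
    """Return all link targets found in an HTML string."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links
```

Feeding it the text of a downloaded page, for example via urllib.request.urlopen, gives the same list of links the original printed one by one.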

This topic is closed to new replies.
