StickManX

reading text from a webpage


Is it possible to read text from a webpage in C++ over the internet? If so, how would I go about doing this? I want a program that takes text from a site and manipulates it, mainly for fantasy sports, because I'm tired of adding everything together in my head. =] Thanks, StickManX

It's very possible, and fairly simple. All you need to do is create a socket (you will have either a POSIX sockets library or a platform-specific one, but virtually all of them expose a BSD-style socket API), connect to the host that serves the page (port 80 for HTTP), and then talk to it in the appropriate protocol.
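
For what it's worth, here is a minimal sketch of that socket approach, written in Python only because that is the language of the example further down the thread; the BSD socket calls map closely onto their C/C++ counterparts. It assumes plain HTTP on port 80 and uses HTTP/1.0 so the server closes the connection when it is done. `build_request` and `fetch_raw` are names made up for illustration.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path="/", port=80, timeout=10.0):
    """Connect, send the request, and read until the server closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty read means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch_raw("example.com")` (a placeholder host, and it needs network access) should return the status line, headers, and page body as one byte string.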

Unfortunately, "speaking" network protocols by hand is tedious. Fortunately, the Internet is littered with networking abstraction libraries. Two highly rated, game-oriented options for C++ are HawkNL and RakNet.

On Windows, see the Windows Sockets 2 reference section of MSDN.

I'd suggest a language other than C++. While it is possible in C++, you'd have to deal with the relevant parts of HTTP yourself, whereas a language like Python or Java has that ability built in. IMHO it would be easier to learn a little Java, Python, or something similar and do this there than to learn all the required C++, and I believe learning a language like Python would pay off better.

PS. Once you have the data from the site, parsing the HTML would be easier in a language like Python as well.
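
As a rough illustration of the parsing side, the sketch below pulls the text out of every table cell with Python's standard `html.parser` module and adds up the cells that look like numbers. The class name `CellCollector`, the helper `total_points`, and the sample HTML are all made up; a real fantasy sports page will be laid out differently.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text inside each <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def total_points(html):
    """Sum every table cell whose text parses as a number."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    total = 0.0
    for cell in parser.cells:
        try:
            total += float(cell.strip())
        except ValueError:
            pass  # not a number (a player name, say), so skip it
    return total


# Invented sample data standing in for a downloaded page.
sample = (
    "<table>"
    "<tr><td>Smith</td><td>12.5</td></tr>"
    "<tr><td>Jones</td><td>7</td></tr>"
    "</table>"
)
print(total_points(sample))  # 19.5
```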

Using a bit of socket programming you can connect to a server and then issue HTTP commands to retrieve a webpage. It's not really difficult, and you can learn the basics in a few days.

But of course what you get back is the raw HTML, since the server treats you just like a browser, so you still have to pick the text out of the markup yourself.
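
To make that concrete: what comes back over the socket is a status line, some headers, a blank line, and then the HTML. Below is a small sketch of separating those pieces, in Python like the example later in the thread. `split_response` is a hypothetical helper and the sample response is invented; it ignores chunked transfer encoding and compression, which a real client would have to handle.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.0 200 OK
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# Invented response standing in for what a server might send.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body><p>Hello</p></body></html>"
)
status, headers, body = split_response(raw)
print(status, headers["content-type"])
print(body.decode("utf-8"))
```

The body is still HTML at this point, so the tags have to be parsed or stripped before the text is usable.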

I agree; I would definitely use Python if I were you. You can write a Python program that downloads a page off the internet in fewer than 5 lines of code.
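
For reference, a sketch of what that can look like with the standard library in current Python 3 (`urllib.request`). The `download` wrapper is just for illustration, and falling back to UTF-8 is an assumption for pages that do not declare a charset.

```python
from urllib.request import urlopen


def download(url):
    """Fetch a URL and return its body as text."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header when there is one.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like `print(download("http://example.com/"))` would then print the page source; the URL is a placeholder.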

If you insist on using C++, I recommend HTTP GET. It's a very small, simple library written by hplus (moderator of the networking forum) that lets you download files from a website. It doesn't get simpler than that in C++.

A quick Python example (mainly because I'd never tried):


# Python 2 only: the htmllib module does not exist in Python 3
import urllib
from htmllib import HTMLParser
from formatter import AbstractFormatter
from formatter import NullWriter


class LinkFinder(HTMLParser):
    """a simple class for finding links in webpages, inherits from HTMLParser"""

    def start_a(self, attrs):
        """action to take when an anchor HTML tag is found"""

        # loop through attributes, each one a (name, value) pair
        for attr in attrs:
            # if the attribute is the href, show the location
            if attr[0] == "href":
                print attr[1]

# here is the main part of the script

# read in google's data (3 lines!)
google = urllib.urlopen("http://google.com")
data = google.read()
google.close()

# create an instance of the LinkFinder HTMLParser class
lf = LinkFinder(AbstractFormatter(NullWriter()))
lf.feed(data)  # feed the parser the data, it'll print the links
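
The `htmllib` module used above exists only in Python 2, so the script will not run on Python 3 as posted. Here is a rough Python 3 equivalent, assuming the standard `html.parser` module as the replacement. It collects the links in a list rather than printing them as it goes; `find_links` and `links_on_page` are helper names introduced here, and decoding the page as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkFinder(HTMLParser):
    """Collect the href of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def find_links(html):
    """Return the href values of all anchors in an HTML string."""
    finder = LinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


def links_on_page(url):
    """Download a page and return the links found in it."""
    with urlopen(url) as page:
        html = page.read().decode("utf-8", errors="replace")
    return find_links(html)
```

With network access, `for link in links_on_page("http://google.com"): print(link)` would mirror what the original script prints.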


