What Knowledge will I need to start this project?

This topic is 3599 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


EDIT: I didn't see the General Programming forum, as I usually view the forums through active topics and the beginners forum is the only one I ever stop by. If someone with the ability to do so happens to read this, could they move it over there? I feel I may get a better response. Thank you.

I spend a bit of my time helping around an indexing site. I recently realized I was doing a lot of repetitive work to keep the site up to date. A large amount of my time is spent finding new resources and adding them to the site. The program I'm hoping to develop to help me with this would need to open all the desired web pages, retrieve the desired information, and store it for another part of the program to use later.

I'll describe the function of the program in bullet points, as I find it easier than using paragraphs of text.

1) The program will require the user to input the range of pages to be worked on, e.g. siteX/page=1 to siteX/page=XX.
2) The program will ask the user what elements of the page they are looking to work with. For example, I want the program to list all <a href> entries on the page.
3) The program will open a web page, or use a web browser, to retrieve the source code.
4) Then it will produce a list of all the desired elements.
5) Move to the next page and repeat, adding any new entries to the list.
6) The program now returns control to the user, asking what they wish to do with the data. These will be editing options where the user can choose to ignore or remove certain entries from the list. Other options will include how to manipulate the current naming convention into the one you will use later, e.g. <a href=" ">linkA</a>: linkA_Z will be tagged as link1_26 for use later.
7) Now the list of data needs to be entered into a website's input area. I want the program to auto-fill the desired boxes. For example, fill 3 text entry boxes: the first with LINK, the second with 1 and the third with 26.
8) Then it would be nice if the program could hit the submit button for me and cycle through all the links to be entered.

I don't expect people to write the program for me; I'm just a bit stuck on what I need to know.

My original thought was to save all the source files to the local machine and then work on the sources from there: take the source from the site where I need to enter the data, modify it to contain default values set by my program, then open the source in my browser and hit the submit button. This has the downside of requiring me to obtain the source for each site the material is discovered on and then submit each entry one at a time, which is not much of an improvement over the current manual system.

Thank you for any help.
Ramearess

[Edited by - ramearess on February 8, 2008 8:33:17 AM]
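Steps 1 to 5 above could be sketched roughly as follows in Python, using only the standard library. This is an illustration under assumptions, not a finished tool: the `siteX/page={}` template, the helper names (`page_urls`, `extract_links`, `collect_links`) and the UTF-8 decoding are all made up for the example, and the download function is passed in so the logic can be tried without touching a real site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Steps 2 and 4: return every <a href> value in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def page_urls(template, first, last):
    """Step 1: expand a page range (the template format is an assumption)."""
    return [template.format(n) for n in range(first, last + 1)]


def fetch(url):
    """Step 3: download a page; UTF-8 is assumed, bad bytes are replaced."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_links(urls, fetch_page=fetch):
    """Step 5: visit each page and keep only links not already in the list."""
    seen = set()
    ordered = []
    for url in urls:
        for link in extract_links(fetch_page(url)):
            if link not in seen:
                seen.add(link)
                ordered.append(link)
    return ordered
```

Calling `collect_links(page_urls("http://siteX/page={}", 1, 5))` would use the real downloader; passing a different `fetch_page` (for example one that reads saved files) keeps the rest unchanged.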

There used to be a Compaq project called WebL which allowed for flexible HTML parsing. It doesn't appear to be available any longer.

As far as such tasks are concerned, you should find Perl to be quite suitable.

Are you really sure about the amount of work such a project involves? Is it worth the effort?

If yes, you can pass your HTML source code through a proxy (written in Perl, Java, whatever) and manipulate the links within so they point to a counting script or something else.

Or you could overlay the data-mined web page with a simple HTML control panel.

That is the simplest solution I can think of. Alternatively (and it is only a thought), you could rewrite a rendering engine like Gecko.

Thanks for the replies, although I have no idea what either of you are suggesting. Perhaps I should have explained what knowledge I possess. I have used C++ to write a few console applications and am able to use OpenGL to render to the screen. I have a minimal understanding of the Win32 API, just enough to set up a window for OpenGL to render to.

Now that I've spent some time looking into possible solutions, I believe there may be two that are within my grasp. One would be to open an HTML/JavaScript document that I write locally and give it all the desired functions. For example, three frames: one containing the user interface from my document, and the other two being pages opened by the user which are to be manipulated.

The second route I considered was to follow this tutorial, at the end of which I would have picked up the syntax of a new language, most likely C#, and programmed an RSS feed reader. Would there be much of a difference in writing a program to access the internet and grab a web page instead of an RSS feed?

Quote:
Original post by ramearess

The second route I considered was to follow this tutorial, at the end of which I would have picked up the syntax of a new language, most likely C#, and programmed an RSS feed reader. Would there be much of a difference in writing a program to access the internet and grab a web page instead of an RSS feed?


Or... you could learn Perl and regular expressions. Even sed or similar tools might be enough for what you need.

RSS feed and HTML are unrelated.
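To give an idea of what the regular expression route looks like, here is a small sketch. It is written in Python rather than Perl only to keep one language for the examples in this thread; the pattern itself would carry over almost unchanged. The pattern and the `find_hrefs` name are illustrative assumptions, and a regex like this is deliberately simple: it only handles quoted attribute values and will happily match links inside commented-out markup.

```python
import re

# Matches href="..." or href='...' inside an <a ...> tag.
# Group 1 is the quote character, group 2 is the link target.
HREF_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def find_hrefs(html):
    """Return the quoted href values of all <a> tags found by the regex."""
    return [match.group(2) for match in HREF_RE.finditer(html)]
```

For pages you don't control, a real HTML parser tends to be more forgiving of odd markup than a pattern like this.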

I'd use Python for this. It has decent libraries for reading web pages and parsing HTML/XML. Stages 1 to 5 should take no time at all. 6 is a bit more tricky. 7 and 8 shouldn't be difficult once you get the hang of making remote web requests.
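As a rough idea of what stage 6 might involve, the sketch below drops ignored entries and turns names such as linkA to linkZ into the three form values mentioned in the original post (LINK, 1, 26). The naming rule is a guess at what was meant: it assumes every name is a shared prefix followed by a single uppercase letter, with A to Z mapped to 1 to 26. The function names `split_name` and `summarise` are hypothetical.

```python
import re

# Assumed convention: some prefix followed by exactly one uppercase letter.
NAME_RE = re.compile(r"^(?P<prefix>.*?)(?P<letter>[A-Z])$")


def split_name(name):
    """Split 'linkA' into ('link', 1); the suffix letter A-Z maps to 1-26."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not end in an uppercase letter: {name!r}")
    return match.group("prefix"), ord(match.group("letter")) - ord("A") + 1


def summarise(names, ignored=()):
    """Drop ignored entries, then return (PREFIX, lowest, highest)."""
    kept = [split_name(name) for name in names if name not in ignored]
    prefixes = {prefix for prefix, _ in kept}
    if len(prefixes) != 1:
        raise ValueError("expected exactly one shared prefix")
    numbers = [number for _, number in kept]
    return prefixes.pop().upper(), min(numbers), max(numbers)
```

The tuple returned by `summarise` lines up with the three text boxes described in stage 7, so it could be handed straight to whatever code fills in the form.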

I finally settled on what I'll be using for this project: C# and .NET. The reason for this is how much code Visual Studio C# can write for you. It seems really simple to set up a Windows application and drag and drop all of the UI; all that's needed code-wise is how to handle each event.

I managed to complete the web page retrieval part using the System.Net library. I then use the response from the URL as an input stream, which I have yet to manipulate, but I believe I can do this (I can in C++, so why not in C#).

What I am currently having problems with is combining a SQL database into my project. Can anyone recommend some C# data binding tutorials?
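This is not C# data binding, but as a language-neutral illustration of how a small SQL table could hold the collected links, here is a sketch using SQLite from Python. The table layout (`url`, `label`, `submitted`) and the helper names are assumptions made for the example; the same schema and queries would work from C# through any SQLite or SQL Server client.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the links table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        " url TEXT PRIMARY KEY,"
        " label TEXT,"
        " submitted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_links(conn, rows):
    """Insert (url, label) pairs; a url that is already stored is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO links (url, label) VALUES (?, ?)", rows
        )


def pending(conn):
    """Return the (url, label) rows that have not been submitted yet."""
    query = "SELECT url, label FROM links WHERE submitted = 0 ORDER BY url"
    return conn.execute(query).fetchall()


def mark_submitted(conn, url):
    """Flag one link as done so it is not submitted twice."""
    with conn:
        conn.execute("UPDATE links SET submitted = 1 WHERE url = ?", (url,))
```

Using the URL as the primary key means re-running the scraper over the same pages does not create duplicate rows.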

The final part, submitting the data, will probably use the System.Net library again, but I would appreciate it if anyone knows of a document that explains how to submit data in the same format as an HTML form on a page.
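For reference, a plain HTML form submitted with the POST method normally sends its fields as a URL-encoded body (`name=value&name=value`) with the content type `application/x-www-form-urlencoded`. The sketch below shows that format in Python; the same body and header apply when sending the request from C#. The URL and the field names (`name`, `first`, `last`) are placeholders: the real ones have to be read from the `action` attribute of the target form and the `name` attributes of its inputs. Forms that upload files use a different encoding and are not covered here.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(action_url, fields):
    """Build a POST request encoded the way a simple HTML form would be."""
    body = urlencode(fields).encode("ascii")  # e.g. b"name=LINK&first=1"
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit(request):
    """Send the request and return the HTTP status code and response body."""
    with urlopen(request) as response:
        return response.status, response.read()
```

Building the request separately from sending it makes it easy to inspect exactly what would be posted before pointing the program at the live site.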

