[java] Search program

ahmedkl    100
I have been assigned a project to make a program that searches through documents; basically, it will be like a mini search engine. First I have to parse an HTML page (removing all the tags and so on) and save the text in a .txt file [DONE]. Now, how do I search through these files? For example, if two .txt files contain some of the same information, how do I make the program display that data from both files? Which data structure should I use, and what can I do to make the search fast?

Fenrisulvur    186
Hmm, ever heard of a KWIC (Key Word In Context) index?

Actually, you probably don't even need to bother with that. There are apparently only a few hundred thousand words in the English vocabulary, and your index will probably only see a small subset of them, so a sorted word list of that size is easy to binary search.
So, create an alphabetical index of the unique words that appear in the files, and associate each word with the file, line and column number of every occurrence. You'll need some sort of secondary storage for this.
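A minimal sketch of that alphabetical index, assuming the stripped .txt contents are already loaded into a `Map` of file name to text. The class name `SortedWordIndex`, the `Posting` record and the tokenizing rule (runs of letters/digits, lowercased) are all hypothetical choices for illustration. It keeps everything in memory; writing the index to secondary storage is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortedWordIndex {
    // One occurrence of a word: file name plus 1-based line and column.
    record Posting(String file, int line, int column) {}

    private final String[] words;                 // unique words, sorted alphabetically
    private final List<List<Posting>> postings;   // postings.get(i) belongs to words[i]

    // Builds the index from file name -> plain text (HTML already stripped).
    public SortedWordIndex(Map<String, String> files) {
        Map<String, List<Posting>> tmp = new HashMap<>();
        for (Map.Entry<String, String> e : files.entrySet()) {
            String[] lines = e.getValue().split("\n", -1);
            for (int ln = 0; ln < lines.length; ln++) {
                String text = lines[ln];
                int col = 0;
                while (col < text.length()) {
                    // Skip separators to reach the start of the next word.
                    while (col < text.length() && !Character.isLetterOrDigit(text.charAt(col))) col++;
                    int start = col;
                    while (col < text.length() && Character.isLetterOrDigit(text.charAt(col))) col++;
                    if (col > start) {
                        String word = text.substring(start, col).toLowerCase();
                        tmp.computeIfAbsent(word, k -> new ArrayList<>())
                           .add(new Posting(e.getKey(), ln + 1, start + 1));
                    }
                }
            }
        }
        words = tmp.keySet().toArray(new String[0]);
        Arrays.sort(words); // alphabetical order is what makes binary search possible
        postings = new ArrayList<>();
        for (String w : words) postings.add(tmp.get(w));
    }

    // O(log n) lookup: binary search over the sorted vocabulary.
    public List<Posting> lookup(String word) {
        int i = Arrays.binarySearch(words, word.toLowerCase());
        return i >= 0 ? postings.get(i) : List.of();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "Java is fun.\nSearch engines index words.");
        files.put("b.txt", "An index maps words to files.");
        SortedWordIndex idx = new SortedWordIndex(files);
        for (Posting p : idx.lookup("index")) {
            System.out.println(p.file() + " line " + p.line() + ", col " + p.column());
        }
        // prints:
        // a.txt line 2, col 16
        // b.txt line 1, col 4
    }
}
```

Because every posting carries its file name, a word that appears in two files comes back with hits from both, which is the "display that data from both files" case in the question.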

You could also use some sort of hash table for this index instead; anything that can be searched with better time complexity than a linear search of the files themselves will do, really.
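A rough sketch of the hash table variant using Java's `HashMap`, which gives expected constant-time lookups. To keep it short, this hypothetical `HashWordIndex` only records which files contain each word (no line/column), and it assumes words are split on anything that is not a letter or digit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class HashWordIndex {
    // word -> names of the files containing it (TreeSet keeps the names sorted for display)
    private final Map<String, Set<String>> index = new HashMap<>();

    // Splits the text on runs of non-letter/non-digit characters and records the file per word.
    public void addFile(String fileName, String text) {
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(fileName);
            }
        }
    }

    // Average O(1) lookup instead of rescanning every file on each query.
    public Set<String> filesContaining(String word) {
        return index.getOrDefault(word.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        HashWordIndex idx = new HashWordIndex();
        idx.addFile("a.txt", "Binary search needs sorted data.");
        idx.addFile("b.txt", "A hash table needs no sorting, but search is fast.");
        System.out.println(idx.filesContaining("search")); // [a.txt, b.txt]
        System.out.println(idx.filesContaining("hash"));   // [b.txt]
    }
}
```

The trade-off versus the sorted list: a hash table cannot answer prefix or alphabetical-range queries cheaply, whereas a sorted index can.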

You'll then need to define the semantics of your search system (what a multi-word query means and what counts as a match), and combine those with your index to build the system's logic.
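As one possible example of search semantics, here is a sketch of AND queries (a file must contain every word) versus OR queries (any word is enough), assuming an index shaped like word -> set of file names. The class and method names are hypothetical, and whitespace-separated queries are an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class QuerySemantics {
    // AND semantics: a file matches only if it contains every query word (set intersection).
    static Set<String> searchAll(Map<String, Set<String>> index, String query) {
        Set<String> result = null;
        for (String word : query.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            Set<String> files = index.getOrDefault(word, Set.of());
            if (result == null) result = new TreeSet<>(files);
            else result.retainAll(files);
        }
        return result == null ? Set.of() : result;
    }

    // OR semantics: a file matches if it contains at least one query word (set union).
    static Set<String> searchAny(Map<String, Set<String>> index, String query) {
        Set<String> result = new TreeSet<>();
        for (String word : query.toLowerCase().split("\\s+")) {
            result.addAll(index.getOrDefault(word, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("java", Set.of("a.txt", "b.txt"));
        index.put("search", Set.of("a.txt"));
        index.put("engine", Set.of("a.txt", "c.txt"));
        System.out.println(searchAll(index, "java search"));   // [a.txt]
        System.out.println(searchAny(index, "search engine")); // [a.txt, c.txt]
    }
}
```

Phrase queries or ranking results by number of hits would need the line/column positions as well, not just file names.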

