Sign in to follow this  
Kenneth Godwin

String Searching Algorithms and practical performance...

Recommended Posts

While other algorithms that I know of (e.g. Apostolico Giancarlo, Boyer Moore) they don't beat Horspool for purposes of searching large blocks of text for small (five or less character) patterns. Does anyone know a algorithm that is not theoretically faster, but actually is when dealing with small patterns? The following is a couple 'test results' for the same set of data. 48297 words, 433017 characters, two sets of tags per word. The IndexOf Method took 451.3765785 seconds to process. The BoyerMoore Method took 252.5482641 seconds to process. The Horspool Method took 115.2101059 seconds to process. The TurboBoyerMoore Method took 241.7624532 seconds to process. The ApostolicoGiancarloMatch Method took 241.9990094 seconds to process. 6000 words, 53792 characters, two sets of tags per word. The IndexOf Method took 5.5662522 seconds to process. The BoyerMoore Method took 2.1252076 seconds to process. The Horspool Method took 1.004881 seconds to process. The TurboBoyerMoore Method took 2.1488536 seconds to process. The ApostolicoGiancarloMatch Method took 2.1401635 seconds to process.

Share this post


Link to post
Share on other sites
What is the usage pattern/characteristics of your search needs? Will the large text be changing frequently or relatively static: Would preprocessing the large text be benificial or would it have to be done every search anyways?

Share this post


Link to post
Share on other sites
Quote:
Original post by Extrarius
What is the usage pattern/characteristics of your search needs? Will the large text be changing frequently or relatively static: Would preprocessing the large text be benificial or would it have to be done every search anyways?


I'm just looking for the fastest algorithm available for dealing with small patterns (5 characters or less).

Preprocessing won't help since the end result of the search is saved atm. (e.g. If an identical search occurs and the source text is the same, it just re-displays the same result rather than attempting another search). The source text is also modified frequently so unless the preprocessing would save the time over the course of 2-3 searches, it wouldn't help.

At this point, I'm pretty sure Horspool is the only one that is practical for what I'm doing. I just figured I'd ask GameDev in case someone would go 'Oh this is a better idea'.





Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this