
Good Idea or Bad Idea?


So I had this idea for a "really fast" strupr function: use MMX (8 bytes in parallel) or SSE2 (16 bytes in parallel) SIMD instructions to operate on 8 or 16 characters at a time, with branch-free conditional logic deciding which bytes to change. (Of course this would only work on strings whose length is known up front.) My question: would this be significantly faster than the usual char-by-char method, given the extra instructions needed for branch-free conditional logic and the cost of unaligned memory accesses? (Just how likely is it that your strings will be aligned on 16-byte boundaries?)

Is turning a string to uppercase normally something you need to do very rapidly?

Not to be overly pragmatic, but even though it might be a neat, quick implementation, it would have some drawbacks, and the speed you'd be gaining wouldn't be worth much.

Case-insensitive hash algorithms were what I had in mind. The hashing itself doesn't lend itself to parallelism (usually), but maybe the conversion can.

Does it work with non-ASCII character sets and accented characters, like á -> Á and ω -> Ω?

In the "real world", text data gets internationalized and has to use Unicode, possibly including characters beyond the Basic Multilingual Plane. Doing this with "branch-free logic" then means, at the very least, encoding in UCS-4 (UTF-16 won't cut it because of surrogate pairs) and using a lookup table with 2^21 entries to cover the whole Unicode code space. For the time being you could probably get away with less, since only about 100k code points have been assigned, but I don't think they are the lowest 100k either. After all, you need to handle lots of weird cases.

But really proper uppercasing can change the length of the string, so doing it properly this way wouldn't be possible at all. For example, the German ß (eszett, which looks a lot like a Greek beta) is properly uppercased as "SS", two characters. (And woe to the author who types the look-alike Greek letter β instead!)
