Jump to content
Site Stability Read more... ×
  • Advertisement
Sign in to follow this  
  • entries
  • comments
  • views

One of those days

Sign in to follow this  


It's been one of those days. I've spent the last hour trying to track down a bug that didn't exist. I wrote the code *perfectly* the first time around, but I screwed up the test case in such a subtle manner that it seemed like the class invariants got borked.

In any case, the current item being folded into the library is a specialized string class, the sortable_string. Basically it's a string class that exists for one reason only: to live happily in a std::set or std::map. And by live happily I mean that the class is optimized for operator<() when used in a std::set/map with a large number of elements.

Basically, when optimizing access in a binary search tree, the most significant factor is memory access. (Which, incidently, is often the most significant factor in many optimization problems.) So, one step in the right direction is to minimize the size of the string class. Under MSVC 7.1 this doesn't take much, as the std::string implementation it ships with weighs in at 28 bytes. The smallest you can make a string class on current systems would be 4 bytes, the size of a pointer. However, this isn't too effective either, because then every comparison operation hits two cache lines for the string, one for the main body of the string, and one for the buffer that the string proper lives in. Actually four lines because there's two strings involved in every comparison.

So the implementation of sortable_string uses eight bytes. One part is a key, another part is a pointer to the string buffer. The key is basically the first four bytes of the string arranged in an integer. When doing comparison between two sortable_strings the keys are compared, and only if the keys are different then the strings in the buffers are compared. This means that in the common case only body of the string needs to be accessed, which reduces total memory access.

This technique can be generalized to other data types, but in general, I find that usually switching to a hash based container is more effective. It's also more effective with strings, but it's unfortunately common that I need to get a range of strings (like iterators that define a range for all strings from "apple" to "bird") hence why I'm putting the sortable_string class in my code library.
Sign in to follow this  


Recommended Comments

I'm surprised it's even 28 bytes large on MSVC 7.1. Does that include a local buffer for short strings?

Share this comment

Link to comment
Yes, it's 16 bytes for the small string optimization local buffer (unioned with the pointer to an external string buffer if it's past the local buffer size), 4 bytes for the string size, 4 bytes for the reserved data size and 4 bytes to store the allocator by value as a member variable of a base class. From the code structure it seems like the last part is used for exception safety reasons.

Share this comment

Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Advertisement

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!