Also, what about people writing single-threaded code (or properly scheduled multi-threaded code)? Why should their use of shared_ptr pay for atomic reference counting, just in case someone needs a "thread safety" guarantee that is useless to them? Well-written multi-threaded code doesn't need shared variables, so enforcing atomic counters forces a bad design choice onto your users...
Fortunately, Boost has options for removing the thread-safety overhead if you're writing single-threaded code. Defining BOOST_DISABLE_THREADS disables thread support for all of Boost (commonly done in boost/config/user.hpp), while BOOST_SP_DISABLE_THREADS disables it for shared_ptr only.