- Windows CriticalSections and Linux futexes are usually the best option you have. If contention is low, they burn a minimal number of CPU cycles: if no other thread is in the CriticalSection/futex, entering it costs about as much as a function call. Otherwise, the thread will typically first run a short busy loop, repeatedly checking whether the CriticalSection/futex can be entered, and only then will it be put to sleep. Sleeping is extremely expensive by comparison, since the thread has to wait for the thread scheduler to allocate it another time slice once the CriticalSection/futex becomes free again.
I would recommend using std::mutex if you can (Visual Studio 2012 RC, GCC 4.6+). It's portable and provides its own RAII scopes (std::lock_guard).
- There's also the possibility of writing lock-free data structures. These are mostly based on an operation called "compare and exchange" which all CPUs in the last 10 years have supported as an atomic operation (one that can't be preempted by another thread in the middle). There is a golden window of contention where lock-free data structures are much faster than CriticalSections/futexes - they're slightly slower at zero contention and tend to completely mess up under very high contention. They're also incredibly difficult to write even after years of experience with instruction reordering, cache lines and compiler behaviors. And they're a patent minefield.
- Thread-local storage is the equivalent of copying your data to each thread. You seem to have one variable, but each thread reading or writing it is in fact reading a variable of its own. Sometimes useful, often confusing.
- Smart pointers cannot help you with threading in any way. They cannot do any synchronization, simply because when you call a method on the object the smart pointer is referencing, the smart pointer's member access operator (operator->) is invoked to return the address of the object. It could enter a CriticalSection/futex there, but there's no place where it could leave it again.
If a smart pointer is thread-safe, that means it won't blow up if you, for example, copy it while it's being destroyed (a naive smart pointer would, for example, grab the wrapped pointer, then increment the reference count - which might just have been decremented to zero between the two operations by another thread that is now destroying the object behind the wrapped pointer). Hint: boost::shared_ptr and std::shared_ptr are not thread-safe in that sense. Boost's page on shared_ptr thread safety makes it sound a bit as if they were, but it only promises that any number of threads can read the same shared_ptr instance (dereference it) - which holds true for any C++ object - and that separate shared_ptr instances can be written from different threads even if they share a reference count. A write (assignment, reset) to an instance must never happen while another thread is accessing that same instance.
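To make the first three options in the list above more concrete, here is a minimal sketch that counts the same thing three ways: under a std::mutex with std::lock_guard, with a compare-and-exchange loop on a std::atomic (compare_exchange_weak is the standard library's spelling of that operation), and in a thread_local variable. The names CounterTotals, countInParallel and perThreadCounter are made up for this illustration. For a plain counter, fetch_add would be the simpler choice; the loop is only there to show the retry pattern that lock-free structures are built on.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// All names in this sketch are hypothetical and exist only for illustration.
struct CounterTotals {
  int locked;                  // total counted while holding the std::mutex
  int lockFree;                // total counted via compare-and-exchange
  std::vector<int> perThread;  // what each thread saw in its thread_local copy
};

thread_local int perThreadCounter = 0;  // each thread gets its own instance

CounterTotals countInParallel(int threadCount, int iterations) {
  assert(threadCount > 0 && iterations >= 0);

  int lockedCounter = 0;  // shared, only touched while holding counterMutex
  std::mutex counterMutex;
  std::atomic<int> lockFreeCounter{0};  // shared, updated without a lock
  std::vector<int> perThread(threadCount, 0);

  auto worker = [&](int threadIndex) {
    for (int i = 0; i < iterations; ++i) {
      {
        // RAII scope: the mutex is released when 'lock' is destroyed.
        std::lock_guard<std::mutex> lock(counterMutex);
        ++lockedCounter;
      }

      // Compare-and-exchange loop: the store only succeeds if nobody changed
      // the value since we read it. On failure, 'expected' is refreshed with
      // the current value and we try again.
      int expected = lockFreeCounter.load();
      while (!lockFreeCounter.compare_exchange_weak(expected, expected + 1)) {
      }

      ++perThreadCounter;  // no synchronization: other threads can't see it
    }
    // Each thread writes only its own slot, so no lock is needed here.
    perThread[threadIndex] = perThreadCounter;
  };

  std::vector<std::thread> threads;
  for (int t = 0; t < threadCount; ++t) threads.emplace_back(worker, t);
  for (std::thread &thread : threads) thread.join();

  return CounterTotals{lockedCounter, lockFreeCounter.load(), perThread};
}
```

Calling countInParallel(4, 10000) should give 40000 for both shared counters, while every entry of perThread is 10000 and the calling thread's own perThreadCounter is still 0, which is exactly the "one variable, many copies" effect described for thread-local storage.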
If you have the chance to use C++11, check out std::future to parallelize tasks in a simple way. Boost, Win32 and WinRT also offer thread pools (here's a small code snippet using the Win32 thread pool API: WindowsThreadPool.cpp), which are great if you can partition work into equal chunks (number of chunks = std::thread::hardware_concurrency ideally). Depending on the specific thread pool implementation, you can even do blocking tasks in those threads (the Windows thread pool will vastly overcommit the CPU based on some heuristics, and it has a flag through which you can hint that you plan to block one of its threads for a longer time).
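As a rough sketch of the std::future approach, the following splits a vector into one chunk per hardware thread and sums each chunk in its own std::async task. parallelSum is a hypothetical helper written for this example. Note that the standard does not promise that std::async is backed by a thread pool; std::launch::async only requests that each task runs asynchronously instead of lazily.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums 'values' by splitting them into roughly equal
// chunks, one per hardware thread, and summing each chunk in its own task.
long long parallelSum(const std::vector<int> &values) {
  // hardware_concurrency() may return 0 if the value can't be determined.
  std::size_t chunkCount = std::max(1u, std::thread::hardware_concurrency());
  std::size_t chunkSize = (values.size() + chunkCount - 1) / chunkCount;

  std::vector<std::future<long long>> partialSums;
  for (std::size_t begin = 0; begin < values.size(); begin += chunkSize) {
    std::size_t end = std::min(begin + chunkSize, values.size());
    assert(end > begin);  // every task gets at least one element
    partialSums.push_back(std::async(std::launch::async, [&values, begin, end] {
      return std::accumulate(values.begin() + begin, values.begin() + end, 0LL);
    }));
  }

  long long total = 0;
  for (std::future<long long> &partial : partialSums) {
    total += partial.get();  // blocks until that chunk has finished
  }
  return total;
}
```

The tasks only read from the shared vector and each one returns its own result through its future, so no mutex is needed anywhere in this example.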