Multithreading is often misunderstood, even among devs. Multithreading is primarily used for parallelism, not to speed things up. For example, in games multithreading is ideal for keeping your game responsive while the game is loading some resources (for the next area), the user is providing input, or the AI is calculating (re)actions.
Yes, you can achieve speed-ups with MT, and MT is often used for speed-ups, for example the rendering in suites like 3ds or Maya. But your problem must be suited to running in parallel, and in most cases the speed-up is far from linear. With a perfect linear speed-up you could potentially gain 300% performance on a quad-core, which seems huge. But a linear speed-up is unrealistic. You have to synchronize (Mutex, MVar, synchronize, STM) the different processes or threads at their meeting points, and that results in a slow-down. It's utopian to expect a whole game to gain a 300% speed-up; even +100% is far from reality. In most cases you will solve specific sub-problems with MT or, and this is the most common way, you decouple sub-systems from each other so that each runs in parallel on its own processing unit.
I couldn't disagree with this more. This may be true for typical GUI-tools, but not games. Games are (soft-) realtime applications meaning you've got to hit a fixed time budget per frame, consistently.
When you're making a GUI-tool, you need the GUI part to remain at an "interactive" level of responsiveness (not real-time), while you do some heavy processing over a long period of time in the background. Threads are a very convenient way to achieve this -- if you put the GUI in one, and the heavy processing in another, then the OS will ensure that each of them obtains some amount of CPU time every so often (by default on Windows: one 15ms time slice at least once every 5 seconds).
Using this same approach in a real-time application is harmful. For example, say that we're on a single-core CPU, and when we load a file into RAM we've then got to run an LZMA decompression step on the loaded data, which takes a total of 1 second. We don't want this to affect the progress of the game's 'main thread' and impact the frame-rate.
Approach 1) We put the decompression code into a separate background thread, which sleeps unless it has work to do. When it does have work to do, we're relying on the OS's thread scheduler to choose which thread is running on the single CPU core. By default on Windows, the scheduler granularity is 15ms, so the decompression thread will require 67 time-slices to complete its 1 second task. If our main thread is attempting to run at a fixed real-time frame-rate of 60Hz (a limit of 16.6ms per frame), then during the time that the decompression thread is awake, this is now impossible (unless your 'main thread' only has 1.6ms of work to do per frame). From time to time (unpredictably), the main thread will be put to sleep for an entire 15ms time-slice (or maybe multiple time-slices).
That kind of unpredictability is simply not acceptable to a real-time application.
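The arithmetic behind Approach 1 can be checked with a few lines of C++. This is only a sketch using the numbers quoted above (15ms slices, a 1 second task, a 60Hz frame); the function and constant names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Numbers taken from the discussion above; names are illustrative only.
constexpr double kSliceMs = 15.0;                 // quoted default scheduler granularity
constexpr double kFrameBudgetMs = 1000.0 / 60.0;  // 60Hz target, roughly 16.6ms

// Whole time-slices a background task needs before it is finished.
inline int slicesNeeded(double taskMs, double sliceMs) {
    assert(sliceMs > 0.0);
    return static_cast<int>(std::ceil(taskMs / sliceMs));
}

// Time left for the main thread in a frame in which it lost `lostSlices` slices.
// A negative result means the frame deadline is already missed.
inline double remainingBudgetMs(double frameBudgetMs, double sliceMs, int lostSlices) {
    return frameBudgetMs - sliceMs * lostSlices;
}
```

`slicesNeeded(1000.0, kSliceMs)` gives the 67 slices mentioned above, and losing a single slice leaves the main thread with well under 2ms for the frame; losing two slices misses the frame outright.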
Approach 2) We manually time-slice the decompression code, so that after it's run for ~1ms (or some other chosen threshold), it stores its state and returns/yields -- a.k.a. cooperative multi-tasking. We run the decompression code on the "main thread" every frame, knowing that the biggest interruption that this task can have is a very predictable 1ms per frame.
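A minimal sketch of what Approach 2 might look like in C++. `SlicedTask` is a hypothetical stand-in, not a real decompressor: it only copies bytes, but it keeps its progress between calls the way a resumable LZMA decoder would have to keep its state. The chunk size and the class interface are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical resumable task: processes its input a chunk at a time and
// remembers where it stopped, so it can be resumed on the next frame.
class SlicedTask {
public:
    explicit SlicedTask(std::vector<std::uint8_t> input) : input_(std::move(input)) {
        output_.reserve(input_.size());
    }

    // Works until roughly `budget` has elapsed, then returns to the caller.
    // Always processes at least one chunk, so every call makes progress.
    // Returns true once all input has been consumed.
    bool step(std::chrono::microseconds budget) {
        using clock = std::chrono::steady_clock;
        const auto deadline = clock::now() + budget;
        constexpr std::size_t kChunk = 4096;  // work done between clock checks
        while (cursor_ < input_.size()) {
            const std::size_t n = std::min(kChunk, input_.size() - cursor_);
            const auto first = input_.begin() + static_cast<std::ptrdiff_t>(cursor_);
            output_.insert(output_.end(), first, first + static_cast<std::ptrdiff_t>(n));
            cursor_ += n;  // this is the "stored state"
            if (clock::now() >= deadline) break;  // yield back to the frame
        }
        return done();
    }

    bool done() const { return cursor_ == input_.size(); }
    const std::vector<std::uint8_t>& output() const { return output_; }

private:
    std::vector<std::uint8_t> input_;
    std::vector<std::uint8_t> output_;
    std::size_t cursor_ = 0;
};
```

The main loop would then call something like `task.step(std::chrono::milliseconds(1))` once per frame, between its update and render steps, until `step` returns true. The overshoot past the budget is bounded by the time one chunk takes, so the chunk size has to be tuned to keep that small.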
As swiftcoder mentioned above, many "scripting" languages only provide these kinds of "cooperative multi-tasking threads" (often called Fibers in C++), instead of OS-level threads, and their entire purpose is to allow for concurrency of tasks.
On the other hand, OS-level threads should only be used in order to take advantage of hardware-level threads, which is only useful for gaining extra computational power. Using OS-threads for anything other than gaining access to extra hardware, in a real-time application, is an abuse of them. The exception to this is when interacting with legacy APIs that have long-blocking functions, which force you to put them into a thread.
n.b. file loading and user input aren't in this category -- your OS provides (non-blocking) asynchronous methods for these.
Post-load resource processing, and AI processing can both be time-sliced, but may also be multi-threaded if they're processor intensive.
MT is often a trade-off. MT will make your project much more complex. More complexity will make your project more error-prone and will slow down the whole project's progress. Your code-base is more fragile and "uglified". What's the benefit? More responsiveness, that's fine! 10%-30% "speed up", maybe not worth it.
That entirely depends on the MT strategy that you choose. Many job-based strategies end up producing code that's simpler than typical C++ OOP code...
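As a rough illustration of the job-based style, here is a sketch of a `parallelFor` helper and a particle update written on top of it. All names here (`parallelFor`, `Particle`, `integrate`) are hypothetical, and for brevity the helper spawns and joins its threads on every call; a real engine would more likely keep a persistent pool of one worker per hardware thread and feed it jobs.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Splits [0, count) into one contiguous job per worker and runs the jobs in
// parallel. `fn(begin, end)` must only touch data inside its own range.
inline void parallelFor(std::size_t count,
                        const std::function<void(std::size_t, std::size_t)>& fn,
                        unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;  // hardware_concurrency() may report 0
    const std::size_t chunk = (count + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, count);
        pool.emplace_back([&fn, begin, end] { fn(begin, end); });
    }
    for (auto& t : pool) t.join();  // all jobs finish before we return
}

struct Particle { float x; float v; };

// The calling code stays a plain loop body: no locks, no shared mutable state.
inline void integrate(std::vector<Particle>& ps, float dt, unsigned workers) {
    parallelFor(ps.size(), [&ps, dt](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) ps[i].x += ps[i].v * dt;
    }, workers);
}
```

Because each job owns a disjoint slice of the array, no mutexes are needed, and the game code reads much like its single-threaded version.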