Is POD assignment atomic?

This topic is 3895 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi. Is the assignment of POD types an atomic operation? If one thread writes a 32-bit integer and another one reads it, could it happen that the reader thread gets garbage because the writer thread managed to update only one or two bytes out of four before a thread switch happened? And what about 64-bit types like doubles and long long integers, on 32-bit CPUs compared to 64-bit CPUs? Is their assignment atomic on 64-bit architectures?

There are no threads in the C or C++ language. Because of this, the languages themselves provide no guarantee (or even concept) of atomicity for any of their semantics. So, no, POD assignment is not atomic. Not even one-byte assignment is atomic.

However, you can perform atomic assignment using:
  • Your hardware. Your assembly language quite possibly describes which operations are guaranteed to be atomic, and which aren't. For instance, ld.global on a GeForce 8800 GTX is atomic and works for up to 128 bits. Use these. On a specific compiler and hardware C or C++ code may compile to atomic operations, but this is not portable behaviour.
  • Your software. Your threading library certainly has primitives for atomic operations, along with synchronization and mutex primitives to ensure atomicity of sets of operations.

ToohrVyk is right; however, in practice, assignment of primitive types is atomic on common CPU architectures and compilers, assuming the right conditions are met. The tricky bit is, of course, those conditions. The standard one is that the data has to be properly aligned.

There's another gotcha for playing fast and loose like this on multi-processor machines. If you don't play the game by the right rules, you can write some memory but the other processor won't detect that you've done so, and thus will continue to return the old value from its cache back to the program. This may or may not be a problem depending on how your program is designed.

When in doubt, use a synchronization primitive. On Windows, critical sections are pretty lightweight. For single values, the Interlocked operations are very lightweight.

Quote:
ToohrVyk is right; however, in practice, assignment of primitive types is atomic on common CPU architectures and compilers, assuming the right conditions are met. The tricky bit is, of course, those conditions. The standard one is that the data has to be properly aligned.

There's another gotcha for playing fast and loose like this on multi-processor machines. If you don't play the game by the right rules, you can write some memory but the other processor won't detect that you've done so, and thus will continue to return the old value from its cache back to the program. This may or may not be a problem depending on how your program is designed.


Assumption is the mother of all screw-ups.

After all, it's 1970, nobody will be using our air traffic control software in 30 years...

Concurrent programming is annoying and frustrating to debug, even when everything goes right. Unless you have a guarantee, either in the language (Java has such guarantees, yet still had certain flaws with concurrency) or from some platform API call, that a particular part is re-entrant, or in this case atomic, assume it's not, and treat it as such.

As always, synchronization isn't as expensive as it seems at first. If it is a major bottleneck, that's usually a design fault. But in concurrent applications you rarely have the luxury of your application crashing or faulting. It'll simply emit an invalid value once every 20 hours, possibly giving you weeks of headaches tracking the issue down.

Just assume the worst from the start.

Also, before foregoing various safety checks, make sure you understand the platform you're working on in detail. Various basic operations will indeed be atomic in practice, but can still cause problems on multi-core systems, as mentioned above. It's the same as C++'s undefined behaviour: some cases will behave the same across several compilers and platforms, but there are no guarantees that they won't break horribly tomorrow.

Quote:
Original post by Anon Mike
ToohrVyk is right; however, in practice, assignment of primitive types is atomic on common CPU architectures and compilers

No it isn't.
An assignment might consist of three operations (load into a register, update the register value, store to memory). The load part might not be necessary if you're *only* doing an assignment, but that still leaves two potential operations. The store part is atomic on common single-core systems, yes, but changing the value and then storing it to memory (which is what you're usually doing in an assignment) may not be.

True, the CPU might be able to combine multiple operations into one instruction, and then it *might* again be atomic, but... you don't know.

Quote:
Original post by Spoonbender
Quote:
Original post by Anon Mike
ToohrVyk is right; however, in practice, assignment of primitive types is atomic on common CPU architectures and compilers

No it isn't.
An assignment might consist of three operations (load into a register, update the register value, store to memory). The load part might not be necessary if you're *only* doing an assignment, but that still leaves two potential operations. The store part is atomic on common single-core systems, yes, but changing the value and then storing it to memory (which is what you're usually doing in an assignment) may not be.

True, the CPU might be able to combine multiple operations into one instruction, and then it *might* again be atomic, but... you don't know.
Worse than that, no significant processor takes memory barriers on loads and stores unless explicitly requested. That means that the multiprocessor caching issue that Anon Mike mentioned comes into play. So any code that assumes that assigning processor words will be thread safe is completely broken.

A small expansion to the question:
Say I have a bool that indicates whether a worker thread should stop. The worker thread periodically checks the bool, which is initialized to false at the beginning. Then, at some point, the main thread assigns true to that boolean.

Now that wouldn't have to use locks (critical sections), right?

Quote:
Original post by DaBono
Now that wouldn't have to use locks (critical sections), right?


In C and C++, the worker thread might read the boolean halfway through the write, resulting in undefined behaviour (because the value is undefined). This kind of approach is unreliable.

Other practical problems (even when you're lucky enough to have hardware with atomic assignment of booleans, and a compiler which uses this assignment model) include caching policies preventing the modification from reaching the worker thread for a few seconds or even minutes.

Lock-free concurrent programming requires atomicity guarantees, typically in the form of an atomic CAS operation. If your concurrent code uses neither locks nor an atomic CAS, the probability is extremely high that it will break in a subtle yet annoying manner.

I've really enjoyed the discussion so far. However, I found this nice passage in the Platform SDK docs (under "Interlocked Variable Access"):

Quote:

Simple reads and writes to properly-aligned 32-bit variables are atomic. In other words, when one thread is updating a 32-bit variable, you will not end up with only one portion of the variable updated; all 32 bits are updated in an atomic fashion. However, access is not guaranteed to be synchronized. If two threads are reading and writing from the same variable, you cannot determine if one thread will perform its read operation before the other performs its write operation.


I'm no stranger to synchronization methods and implementation, but low-level CPU behavior is not my expertise. However, this paragraph from the Platform SDK docs seems to contradict some of what I've been reading here, and maybe someone who knows more can enlighten us. In the case of a simple bool used to flag thread termination, the above paragraph seems to indicate that you can never get an "undefined" value. I believe the bool data type is implemented as a 32-bit quantity in MSVC, and assuming it was properly aligned, changing it from true to false or vice versa should be atomic without any special care taken.

As stated above, simple reads and writes, although atomic, do not guarantee sequence. A critical section wouldn't guarantee sequence either, just that only one thread could read or write the value at a time. Given the above paragraph, it seems that on 32-bit Intel chips a critical section around a simple bool for flagging thread termination is redundant and unnecessary.

Anyway, just curious what you experts think about the above quote and how it relates to the current discussion.

Quote:
Original post by strtok
However, this paragraph from the Platform SDK docs seems to contradict some of what I've been reading here, and maybe someone who knows more can enlighten us. In the case of a simple bool to flag a thread termination, the above paragraph seems to indicate that you can never get an "undefined" value.


Never is a little bit extreme. As I've said above, the C and C++ standards themselves provide no atomicity guarantees, so you must rely on external guarantees provided by external software (compiler, library) or hardware (processor). The Platform SDK provides one such guarantee (on x86 Windows using Microsoft Visual C++), but it does not apply to other hardware, software and compiler combinations unless explicitly stated. For instance, g++ on x86 Windows might not make that guarantee and might perform optimizations which would break it; or it might honor it. You simply don't know.

The bottom line is, as always, to seek external guarantees, because there are no internal ones, and then:
  • Thoroughly document and code-assert the existence of these guarantees.
  • Provide an alternate code path to be used when those guarantees are not available.
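"Code-assert the existence of these guarantees" can be done at compile time, so the build fails rather than the customer's program; a minimal sketch, assuming a C++11 toolchain:

```cpp
#include <atomic>
#include <cstdint>

// Fail the build, not the user, if an assumed property does not
// hold on the target we are being compiled for.
static_assert(sizeof(std::int32_t) == 4,
              "wire/shared-memory code assumes 4-byte int32_t");
static_assert(alignof(std::int32_t) >= 4,
              "code assumes naturally aligned 32-bit values");
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "code assumes always-lock-free int atomics");

int main() { return 0; }  // nothing to do at runtime; the checks ran already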

Assume the worst and hope for the best. That's what it comes down to.
Best not to try and skimp on thread-safety.

Quote:
Original post by strtok
Anyway, just curious what you experts think about the above quote and how it relates to the current discussion.


I think the quote was written before the days when multicore CPUs were the norm.

Sure, in a simple multithreaded environment on a single CPU such operations are atomic but with undefined sequencing. On a modern machine, each CPU has its own cache and if a thread running on one CPU writes to memory it doesn't mean another thread with its own cache will ever read that value from memory. Not without appropriate cache invalidation. That's where all thwm memory barriers and interlocking instructions and stuff come in to play.

If you're writing software now, by the time it gets released into the wild you'll only be able to find single-CPU systems in museums and garage sales. Future-proof your software.

bool is typically a byte (definitely on MSVC), and I'm not aware of any CPUs which allow a read halfway through a write to a byte; it's generally impossible, since the write happens to all bits simultaneously. On most, if not all, desktop CPUs the same goes for correctly aligned words too. So you can't get an undefined value unless you do unaligned access (where supported).

The real issue is synchronisation, as the quote from MSDN indicates. If you write a byte on one CPU, it may not be flushed from the cache for a significant amount of time. That's typically not what you want when trying to sync up threads.

Quote:
Original post by ToohrVyk
The Platform SDK provides one such guarantee (on x86 Windows using Microsoft Visual C++), but it does not apply to other hardware, software and compiler combinations unless explicitly stated. For instance, g++ on x86 Windows might not make that guarantee and perform some optimizations which would break it, or it might.


The Platform SDK doesn't assume MSVC. Also, the property that write operations are atomic is guaranteed by the x86 processor (as long as it's a 486 or newer), so it works with any compiler and OS combination.

A partial answer to the OP's 64-bit question: Pentium and newer x86 processors guarantee atomic writes to aligned 64-bit memory locations.

Quote:
Original post by Jerax
The platform SDK doesn't assume MSVC. Also the property that write operations are atomic is guaranteed by the x86 processor (as long as it's a 486 or newer), so it works with any compiler and OS combination.

How do you know that the optimizer won't alter the code so that it isn't simply a write instruction? As long as there are no guarantees in the actual standard, we can't be sure that it generates the assembly we expect it to. These kinds of bugs are rare, but also very hard to spot and fix.

Quote:
Original post by Jerax
The platform SDK doesn't assume MSVC. Also the property that write operations are atomic is guaranteed by the x86 processor (as long as it's a 486 or newer), so it works with any compiler and OS combination.


I can write a perfectly Standard-abiding C++ compiler which performs non-atomic write operations on an x86 processor (if only by doing two partial atomic writes).

The Platform SDK only discusses the interaction between MSVC and the x86 processor. It has no control or knowledge over what machine code other C++ compilers generate.

I agree that a guarantee should always be used, and that a variable should always be kept from being accessed simultaneously by two different processors, for example; however, this cache some people have talked about seems strange to me.

Firstly, I think that in a loop while (var == 0), where var is initialized to 0, none of this matters, since the only time var == 0 will be false is when var is changed by some other code; and if that only ever happens when the loop is supposed to exit, none of this should matter unless there is the cache problem mentioned.
I still agree this shouldn't be counted on, and a critical section should be used.

However, if there is a problem with the cache, I don't see how this can be solved with critical sections.
Entering a critical section doesn't necessarily mean the cache is flushed, does it?
If a non-local variable is changed, it should never be kept in the cache; if it is, things will break no matter how much you protect it with critical sections, won't they?
If one thread is constantly doing:

while (!exit) {
    EnterCriticalSection(&cs);
    if (globalVar != 0) exit = true;
    LeaveCriticalSection(&cs);
}

and another does

EnterCriticalSection(&cs);
globalVar = 1;
LeaveCriticalSection(&cs);

this will always work, because the variable will never be accessed by both at the same time; however, only because globalVar is 'volatile' or whatever it's called. The critical section can't change the 'volatility', can it?
So any cache miss in

while (!exit) {
    if (globalVar != 0) exit = true;
}

with another thread doing

globalVar = 1;

should also be present in the example with critical sections?

The cache can't possibly work that way for volatile variables; they must be guaranteed to be written and read uncached.

/Erik

Quote:
Original post by ToohrVyk
Quote:
Original post by Jerax
The platform SDK doesn't assume MSVC. Also the property that write operations are atomic is guaranteed by the x86 processor (as long as it's a 486 or newer), so it works with any compiler and OS combination.


I can write a perfectly Standard-abiding C++ compiler which performs non-atomic write operations on an x86 processor (if only by doing two partial atomic writes).

The Platform SDK only discusses the interaction between MSVC and the x86 processor. It has no control or knowledge over what machine code other C++ compilers generate.


You could, but why would you? Besides, with caching, your two writes would likely be combined into a single atomic write in any event.

Quote:
However, if there is a problem with the cache, I don't see how this can be solved with critical sections.
Entering a critical section doesn't necessarily mean the cache is flushed, does it?

It can't, and it doesn't. That's why you need to use volatile and a memory barrier.

Quote:
Original post by Jerax
bool is typically a byte (definitely on MSVC), and I'm not aware of any CPUs which allow a read halfway through a write to a byte; it's generally impossible, since the write happens to all bits simultaneously. On most, if not all, desktop CPUs the same goes for correctly aligned words too. So you can't get an undefined value unless you do unaligned access (where supported).


Just the other month, I was debugging an OSS project where one could set, but not unset, configuration options under OS X.

Turns out, a semi-recent change of settings had resulted in a 4-byte bool. Some serialization code had made the same assumption you had (that bool was 1 byte), and this caused an error that was hidden on my little-endian Windows box (since only the least significant byte was tweaked, I suppose), but apparent on my big-endian, PPC, OS X box.

Endian issues. With a freakin' bool.



It's the little things like that which make you give up hope entirely on anything C++-related being sane, and avoid potentially undefined behavior like the plague.

Quote:
Original post by MaulingMonkey

Just the other month, I was debugging an OSS project where one could set -- but not unset -- configuration options, under OS X.

Turns out, a semi-recent change of settings had caused a 4-byte bool. Some serialization code had made the same assumption you had -- that bool was 1 byte -- and this caused an error that was hidden on my little-endian windows box (since only the least significant byte was tweaked, I suppose), but apparent on my big-endian, PPC, OS X box.

Endian issues. With a freakin' bool.



It's... the little things like that... which make you give up hope entirely on anything C++ related being sane, and avoid potentially undefined behavior like the plague.


Serialization in C++ isn't easy. And that's the easy part.

In general, for any kind of portable project that uses serialization, the types used for writing should be completely project-defined, their sizes enforced through compile-time checks, and reading done at the safest possible level.

Serialization also exposes some other problems. Reading binary data directly into variables can cause alignment problems (since serialized members will not match the memory alignment), causing direct memory-to-type conversions to fail.

But like I said before about assumptions: never assume anything about C++ data types. Not even signedness (signed char vs. char), let alone the more exotic types, such as int.

This is also an argument for coding at the highest possible level, avoiding direct access whenever possible.

Also, the tiny little hacks that "optimize" the code are mostly redundant these days. And if you do *need* them, make sure you understand everything that could possibly go wrong. Any kind of memory-offset juggling or similar should be banned by default.

C++ isn't *that* bad, even with templates, standard libraries and various newer features. I do, however, find that in order to really get comfortable in C++, writing portable code at -pedantic and /W4, possibly with static checkers, helps immensely in dealing with language-specific problems. It's also a great lesson in understanding the tiny details that go on under the hood.

But for someone unprepared, or coming from more comfortable languages, the difference is like going from doing 200 on a highway to tip-toeing through a minefield. IMHO, anyone seriously considering software development should get proficient with C++. If you survive that, you're good to go in just about any language, since I can't think of any issue that would appear in a higher-level language that one wouldn't need to deal with extensively in C++.

As for productivity... it depends. If the features of C++ offer that edge (and for some game development, or other high-performance applications, they really do), then that's the way to go. But for 90% of things today, there is no need anymore.

Quote:
Original post by strtok
I've really enjoyed the discussion so far. However, I found this nice passage in the Platform SDK docs (under "Interlocked Variable Access"):

Quote:

Simple reads and writes to properly-aligned 32-bit variables are atomic. In other words, when one thread is updating a 32-bit variable, you will not end up with only one portion of the variable updated; all 32 bits are updated in an atomic fashion. However, access is not guaranteed to be synchronized. If two threads are reading and writing from the same variable, you cannot determine if one thread will perform its read operation before the other performs its write operation.


I'm no stranger to synchronization methods and implementation, but low-level CPU behavior is not my expertise. However, this paragraph from the Platform SDK docs seems to contradict some of what I've been reading here


The confusion comes about because some people are talking about what is *guaranteed*, while others (including the quote above) are talking about what happens in practice, given certain common conditions.

e.g. if you have something like "int x" and then later on you do:

x = 0x12345678;

what commonly happens is that the whole 0x12345678 is written out in one shot. However, it is perfectly valid for a C/C++ compiler to decide it wants to write out the 0x1234 half first and then later write out the 0x5678 half. Since C/C++ doesn't acknowledge the concept of threads, everything is fine as far as the language is concerned, but now your app could catch things in the middle and read the 0x1234 half plus whatever garbage happened to be in the low half.

There isn't really a contradiction: everybody is saying to use synchronization objects to do your updating unless you're very, very sure you know what you're doing. Some people are just saying to use them even if you're sure you know what you're doing, because this stuff is hard even for experts, and even if you get everything 100% right today, changing hardware/software conditions could make your assumptions invalid in the future.
