volatile and pods (and atomics)

So, I was writing my own atomic&lt;T&gt; type and started having trouble with classes. I looked around online and realized that wasn't something I actually wanted. However, http://www.stdthread.co.uk/doc/headers/atomic/atomic.html says that the STL atomic type can work on PODs and built-in types. I continued with POD support, but still had trouble. If I do the following, I still get compile errors:

struct POD
{
    double x;
    double y;
};

volatile struct POD a;
struct POD b;

a = b; // error C2678
b = a; // error C2679

Should I drop POD support, or am I doing something wrong?
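For reference, the two errors come from the implicitly-declared copy assignment operator, which takes a const POD& and so can't be used when either side is volatile-qualified. A minimal sketch of a workaround looks like this (declaring operator= technically makes the type no longer strictly POD, and of course none of these assignments are atomic):

struct POD
{
    double x;
    double y;

    // plain b = c between non-volatile objects
    POD & operator=( const POD & rhs )
    {
        x = rhs.x;
        y = rhs.y;
        return *this;
    }

    // a = b; assigning into a volatile object
    volatile POD & operator=( const POD & rhs ) volatile
    {
        x = rhs.x;
        y = rhs.y;
        return *this;
    }

    // b = a; copying out of a volatile object
    POD & operator=( const volatile POD & rhs )
    {
        x = rhs.x;
        y = rhs.y;
        return *this;
    }
};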

By the way, I'm using volatile to force a memory read.

template< typename T >
T ForceRead( volatile T * x )
{
    T result = *x;
    return result;
}

template< typename T >
inline T AtomicLock< T >::Load( void ) const // an atomic type that uses a lock
{
    T result;

    mMutex.Lock();
    result = ForceRead( &mData );
    mMutex.Unlock();

    return result;
}
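The store side is just the mirror image; a sketch of what I'm pairing it with (ForceWrite is a hypothetical helper mirroring ForceRead, and for POD types it hits exactly the volatile assignment problem above):

template< typename T >
void ForceWrite( volatile T * x, const T & value )
{
    *x = value; // fine for built-in types; C2678 again for volatile PODs
}

template< typename T >
inline void AtomicLock< T >::Store( const T & value )
{
    mMutex.Lock();
    ForceWrite( &mData, value );
    mMutex.Unlock();
}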
By the way, my a = b example seems to work with all built-in types except __m128. Also, I was able to get the following to work, though it seems a bit of a hack...

template< typename T >
T ForceRead( volatile T * x )
{
    T result;

    char * out = reinterpret_cast< char * >( &result );
    volatile const char * in = reinterpret_cast< volatile const char * >( x );

    for( unsigned int i = 0; i < sizeof( T ); ++i )
    {
        out[ i ] = in[ i ];
    }

    return result;
}

Any thoughts?
So the general issue here is that what you are writing is not "atomics" in any form. This is "synchronized" access, which is a whole different ball of wax. Synchronized access is comparable to atomics, and you can do things that way without problems, but it is generally considerably slower and will fail if you ever actually use real atomics on any of the structure.

I highly suggest you read up a bit more on what atomic really means; it is very different from what you are showing here. I'd probably start by reading the documentation for Intel's TBB library to get a real understanding of atomics.
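To make the distinction concrete, here is a rough sketch of the two ideas side by side, using the Win32 primitives as stand-ins (illustrative only, not a drop-in for your class):

#include <windows.h>

// "Atomic": a single indivisible hardware operation on one machine word.
LONG AtomicIncrement( volatile LONG * value )
{
    return InterlockedIncrement( value );
}

// "Synchronized": mutual exclusion around an arbitrary critical section.
void SynchronizedIncrement( long * value, CRITICAL_SECTION * cs )
{
    EnterCriticalSection( cs );
    ++( *value );
    LeaveCriticalSection( cs );
}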
volatile means different things to different compilers. Even on those compilers where it does imply barriers, such as recent versions of Visual C++, they are inserted in a conservative fashion (I believe MS decided to implement what people had been incorrectly assuming about volatile in order to fix their code). Instead, insert the appropriate barriers to prevent the compiler and processor from reordering loads and stores. This is minimally sufficient. You will need to understand the memory model of your CPU in order to do this.

If you don't need POD types or anything beyond integers and pointers, don't bother with them. An atomic integer and an atomic pointer are likely all you'll need.
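For what it's worth, a bare-bones atomic integer along those lines is little more than a thin wrapper over the interlocked functions; a sketch, assuming Win32:

#include <windows.h>

class AtomicInt
{
public:
    explicit AtomicInt( long value = 0 ) : mValue( value ) {}

    // CAS against itself is a common way to get an atomic read through the same API.
    long Load() const        { return InterlockedCompareExchange( &mValue, 0, 0 ); }
    void Store( long value ) { InterlockedExchange( &mValue, value ); }
    long Increment()         { return InterlockedIncrement( &mValue ); }
    long CompareExchange( long exchange, long comparand )
    {
        return InterlockedCompareExchange( &mValue, exchange, comparand );
    }

private:
    mutable volatile long mValue;
};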
@AllEightUp

I'm fully aware that the "atomic" type I've presented here isn't truly atomic. It is only meant to act like an atomic for POD types. I use InterlockedCompareExchange for all my truly atomic calls. You also say it "will fail if you ever actually use real atomics on any of the structure". I'm not sure what you mean by that; could you clarify? If you are talking about locks within locks, that should not be a problem, but I have a feeling you're talking about something else.

@edd^2

Should the Mutex calls not provide the barriers you are asking for? I looked at the compiled code and it looked acceptable. I believe you are referring to instructions that use volatile being reordered to a place before the mutex is ever locked. Is that what you mean? I did notice that Boost puts a read/write barrier before the read in its "atomic read". However, I may take your advice not to support POD types, but at the moment I feel like I'm very close.
If the mutex calls provide the barriers (and they do, if used correctly), why are you using volatile at all?

I suggest you not bother with atomic<POD> because it will likely turn out to be useless. I suspect std::atomic<POD> exists for POD types that are the same size or smaller than those types for which the CPU can support "atomic" operations natively. You could of course simulate that, but given it's something you haven't realised thus far, I'm guessing you won't be needing it.

For larger PODs or classes in general, I don't think an atomic<> wrapper is appropriate. Instead, take each class or system of classes that need to be made "thread safe", examine them on a case-by-case basis, determine their invariants, and add the appropriate protection.
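In other words, something along the lines of this sketch, where the lock protects the whole invariant rather than pretending each member is individually atomic (the names here are made up for illustration):

#include <windows.h>

// Invariant: mX and mY always describe the same point; they are only updated together.
class Point2D
{
public:
    Point2D() : mX( 0 ), mY( 0 ) { InitializeCriticalSection( &mLock ); }
    ~Point2D() { DeleteCriticalSection( &mLock ); }

    void Set( double x, double y )
    {
        EnterCriticalSection( &mLock );
        mX = x;
        mY = y;
        LeaveCriticalSection( &mLock );
    }

    void Get( double * x, double * y ) const
    {
        EnterCriticalSection( &mLock );
        *x = mX;
        *y = mY;
        LeaveCriticalSection( &mLock );
    }

private:
    mutable CRITICAL_SECTION mLock;
    double mX;
    double mY;
};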

[quote]
@AllEightUp

I'm fully aware that the "atomic" type I've presented here isn't truly atomic. It is only meant to act like an atomic for POD types. I use InterlockedCompareExchange for all my truly atomic calls. You also say it "will fail if you ever actually use real atomics on any of the structure". I'm not sure what you mean by that; could you clarify? If you are talking about locks within locks, that should not be a problem, but I have a feeling you're talking about something else.
[/quote]

Take your example and consider how you might want to use it. Say, for instance, that many functions just want to read the x or y member as a real atomic fetch, using a true atomic. I.e.:


uint64_t Atomic::Load( uint64_t * value )
{
    uint64_t result;
    (void)value; // silences the unreferenced-parameter warning; the asm below uses it
    // 64-bit atomic load for IA-32 without using advanced instruction sets:
    // fild/fistp moves all 8 bytes in a single aligned FPU access.
    __asm
    {
        mov   eax, value;
        fild  qword ptr [ eax ];
        fistp qword ptr result;
    }
    return result;
}
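Used against the earlier POD, that would look something like this sketch; it assumes Atomic::Load above is a static member, that the doubles are 8-byte aligned, and it simply reinterprets their bits as 64-bit integers:

#include <stdint.h>
#include <string.h>

POD shared; // written by other threads with 8-byte atomic stores

double ReadX()
{
    uint64_t bits = Atomic::Load( reinterpret_cast< uint64_t * >( &shared.x ) );
    double result;
    memcpy( &result, &bits, sizeof( result ) ); // reinterpret the raw bits as a double
    return result;
}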


So you can read the x or y items all you want, and use other atomic operations such as store/xchg/cmpxchg to modify the values, and never have any issues. But as soon as you use the synchronized variation to read or write the whole structure, or even parts of it, all bets are off. The reason is that because your fetch works at the byte level, it could copy 2 bytes, an atomic store could then change the first 8 bytes, and the fetch would pick up the next 6 bytes; it now has 2 old bytes and 6 new bytes, and the double is nicely corrupted.

This is why I suggest being explicit about what you are doing. Rename the mutex version to "SynchronizedFetch" or similar so you remember never to access the same item with both the "Atomic" and "Synchronized" variants, because they simply won't work together nicely.

As to volatile, barriers and atomic operations, you'll notice I don't use the volatile keyword. Instead I use the proper barriers and control things that way. I.e. for Win32:

#include <intrin.h>
#pragma intrinsic( _ReadWriteBarrier )
#pragma intrinsic( _ReadBarrier )
#pragma intrinsic( _WriteBarrier )

This makes the various barriers available.

For gcc you can do the lazy solution:

#define READWRITEBARRIER __asm__ __volatile__( "\n" : : : "memory" )

Anyway, using the barriers correctly can speed up your code considerably, since volatile tends to turn off a lot of optimizations that don't need to be avoided. Worst case, just call READWRITEBARRIER before doing any atomic operation; it is still usually faster than using volatile.
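In practice that tends to look something like the following sketch on Win32/x86; the barrier is a one-off fence at exactly the point where ordering matters, instead of volatile pessimizing every access to the variable:

#include <windows.h>
#include <intrin.h>
#pragma intrinsic( _ReadWriteBarrier )

int  g_payload = 0;
long g_ready   = 0;

void Publish( int value )
{
    g_payload = value;
    _ReadWriteBarrier();                    // compiler: keep the payload store before the flag store
    InterlockedExchange( &g_ready, 1 );     // the actual atomic operation
}

bool TryConsume( int * out )
{
    if( InterlockedCompareExchange( &g_ready, 0, 0 ) == 0 )
        return false;
    _ReadWriteBarrier();                    // compiler: keep the payload load after the flag load
    *out = g_payload;
    return true;
}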

[quote]
The reason is that because your fetch works at the byte level, it could copy 2 bytes, an atomic store could then change the first 8 bytes, and the fetch would pick up the next 6 bytes; it now has 2 old bytes and 6 new bytes, and the double is nicely corrupted.
[/quote]

No modern CPU works at the byte level. They can't read or write anything smaller than a cache line, which means reading from and writing to a primitive type like double doesn't need any special synchronization, as long as it's not misaligned. You just need to make sure the compiler doesn't optimize away the memory reads by using registers, which is what volatile does.

Of course read-modify-write operations aren't thread safe on volatiles, and do need extra synchronization to make sure no other thread gets in between the read and the write. That can be done with the various atomic functions like InterlockedIncrement().
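I.e., roughly the difference between these two (a sketch):

#include <windows.h>

volatile LONG g_counter = 0;

void UnsafeIncrement()
{
    g_counter = g_counter + 1;          // separate read and write; another thread can slip in between
}

void SafeIncrement()
{
    InterlockedIncrement( &g_counter ); // single atomic read-modify-write
}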

[quote]
[quote name='AllEightUp' timestamp='1327545517' post='4906304']
The reason is that because your fetch works at the byte level, it could copy 2 bytes, an atomic store could then change the first 8 bytes, and the fetch would pick up the next 6 bytes; it now has 2 old bytes and 6 new bytes, and the double is nicely corrupted.
[/quote]
No modern CPU works at the byte level. They can't read or write anything smaller than a cache line, which means reading from and writing to a primitive type like double doesn't need any special synchronization, as long as it's not misaligned. You just need to make sure the compiler doesn't optimize away the memory reads by using registers, which is what volatile does.

Of course read-modify-write operations aren't thread safe on volatiles, and do need extra synchronization to make sure no other thread gets in between the read and the write. That can be done with the various atomic functions like InterlockedIncrement().
[/quote]

Correct, but not for the posted code; it will work the way I describe. I.e. the loop picks up the first 2 bytes, some other thread atomically writes to the data, the CPU updates the cache, and the next 6 bytes come from the updated data rather than the original data. So you have corrupted data from the read. The 2 bytes is just an arbitrary example; the atomic operation can corrupt the read anywhere between 1 and 15 bytes into the loop if they schedule simultaneously.

As to read-modify-write, that is the purpose of a compare-and-exchange atomic, but over arbitrary structures (>128 bits on modern hardware) you can't make it atomic, so synchronized behaviour over the entire structure is generally appropriate. And, before it comes up: that loop will likely end up being optimized into appropriately sized moves in release builds, but that just means you get 4 bytes of corruption instead of 1-15. It will be "less likely", but "less likely" in computers just means it may take 1/100th of a second instead of 1/100000th of a second before you get bad data.
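For the read-modify-write case on something the hardware actually can CAS, the pattern I mean looks roughly like this sketch, using the Win32 64-bit compare-exchange; anything wider than that has to fall back to the synchronized path:

#include <windows.h>

// Atomically add to a 64-bit value with a compare-and-exchange loop.
LONGLONG AtomicAdd64( volatile LONGLONG * value, LONGLONG amount )
{
    LONGLONG oldValue, newValue;
    do
    {
        oldValue = *value;              // the CAS below only succeeds if this still matches memory
        newValue = oldValue + amount;
    } while( InterlockedCompareExchange64( value, newValue, oldValue ) != oldValue );
    return newValue;
}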
Well, after listening to you guys and talking with my teammate, we decided against adding POD support even though it seemed to work.

As an interesting note, from the link in my first post you can see the standard will support PODs, and will most likely use mutexes to achieve that, since there is an is_lock_free function.
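With a C++11 compiler, or the just::thread implementation from that link, you can query this directly; a quick sketch, purely illustrative:

#include <atomic>
#include <iostream>

struct POD { double x; double y; };

int main()
{
    std::atomic< int > i;
    std::atomic< POD > p;

    // prints 1 where the hardware can do it natively, 0 where the library falls back to a lock
    std::cout << i.is_lock_free() << " " << p.is_lock_free() << std::endl;
    return 0;
}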

Also, I understand your argument, AllEightUp, but with the locks that should never happen.

Finally, @AllEightUp: you talk about using the read/write barriers instead of volatile, but the sole reason I was using volatile was to force the data to be read from memory. I thought read/write barriers just prevented compiler reordering, which I wasn't worried about, assuming the mutexes had their own read/write barriers. Anyway, that code is gone, but I think I may have problems elsewhere. Currently, for built-in types I use:

template< typename T >
inline T ForceMemoryRead( volatile T * mem )
{
    // FIX IT: trying to force a memory read from RAM, but I'm not sure what
    // I'm looking for in the generated assembly (I thought it was a c, but... not sure)
    const T result = *mem;
    _ReadWriteBarrier();

    return result;
}

Does anybody know what might be wrong? As you can see from the comment, I'm not seeing in the disassembly below what I've been reading I should see.

Atomic< float > temp;
temp.Store( 1.0f );
009119CE fld1
009119D0 push ebx
009119D1 push esi
009119D2 fstp dword ptr [esp+28h]
009119D6 mov eax,dword ptr [esp+28h]
009119DA push edi
009119DB push eax
009119DC lea ecx,[esp+30h]
009119E0 push ecx
009119E1 call dword ptr [__imp__InterlockedExchange@8 (916008h)]

float result = temp.Load();
009119E7 mov eax,dword ptr [esp+2Ch]

std::cout << result << std::endl;
009119EB mov edx,dword ptr [__imp_std::endl (9160B8h)]
009119F1 mov dword ptr [esp+28h],eax
009119F5 fld dword ptr [esp+28h]
009119F9 push edx
009119FA push ecx
009119FB mov ecx,dword ptr [__imp_std::cout (916090h)]
00911A01 fstp dword ptr [esp]
00911A04 call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (9160ACh)]
00911A0A mov ecx,eax
00911A0C call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (9160B0h)]
