allingm

volatile and pods (and atomics)


So, I was writing my own atomic<T> type and ran into trouble with classes. I looked it up online and realized that supporting arbitrary classes wasn't something I actually wanted. However, http://www.stdthread.co.uk/doc/headers/atomic/atomic.html says that the standard atomic type can work on PODs and built-in types, so I continued with POD support, but I still had trouble. Even the following fails to compile:

struct POD
{
    double x;
    double y;
};

volatile struct POD a;
struct POD b;

a = b; // error C2678: no operator= takes a volatile left-hand operand
b = a; // error C2679: no operator= takes a volatile right-hand operand

Should I drop POD support, or am I doing something wrong?
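For what it's worth, a volatile-qualified copy assignment (just a sketch, untested) makes both lines compile, though the member-wise copy still isn't atomic in any way:

struct POD
{
    double x;
    double y;

    // Illustrative only: allow assignment to and from volatile objects.
    // The two doubles are still copied one member at a time.
    volatile POD & operator=( const volatile POD & other ) volatile
    {
        x = other.x;
        y = other.y;
        return *this;
    }
};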

By the way, I'm using volatile to force a memory read:

template< typename T >
T ForceRead( volatile T * x )
{
    T result = *x; // volatile read that the compiler cannot optimize away
    return result;
}

template< typename T >
inline T AtomicLock< T >::Load( void ) const // an "atomic" type that uses a lock
{
    T result;

    mMutex.Lock();
    result = ForceRead( &mData );
    mMutex.Unlock();

    return result;
}
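(I know the manual Lock()/Unlock() pair isn't exception-safe; a small scope guard, sketched here against the same hypothetical Mutex interface, would take care of that.)

class ScopedLock
{
public:
    explicit ScopedLock( Mutex & mutex ) : mMutex( mutex ) { mMutex.Lock(); }
    ~ScopedLock() { mMutex.Unlock(); }

private:
    ScopedLock( const ScopedLock & );             // non-copyable
    ScopedLock & operator=( const ScopedLock & );

    Mutex & mMutex;
};

// Load() then shrinks to:
//     ScopedLock guard( mMutex );
//     return ForceRead( &mData );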

----

allingm (OP) adds:
By the way, my a/b example seems to work with all built-in types except __m128. Also, I was able to get the following to work, though it seems like a bit of a hack...

template< typename T >
T ForceRead( volatile T * x )
{
    T result;

    char * out = reinterpret_cast< char * >( &result );
    volatile const char * in = reinterpret_cast< volatile const char * >( x );

    // Copy the object one byte at a time through the volatile pointer.
    for( unsigned int i = 0; i < sizeof( T ); ++i )
    {
        out[ i ] = in[ i ];
    }

    return result;
}

Any thoughts?

----

AllEightUp replies:
So the general issue here is that what you are writing is not "atomics" in any form. This is "synchronized" access, which is a whole different ball of wax. Synchronized access can give comparable results, and you can do things that way without problems, but it is generally considerably slower, and it will fail if you ever actually use real atomics on any part of the structure.

I highly suggest you read up a bit more on what "atomic" really means; it is very different from what you are showing here. I'd probably start by reading the documentation for Intel's TBB library to get a real understanding of atomics.
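Roughly, the distinction looks like this (a Win32 sketch; the names are invented for illustration):

#include <windows.h>

volatile LONG counter = 0;
CRITICAL_SECTION lock; // assume InitializeCriticalSection was called at startup

// Atomic: one indivisible hardware operation (a lock xadd under the hood).
void AtomicIncrement()
{
    InterlockedIncrement( &counter );
}

// Synchronized: mutual exclusion around an ordinary read-modify-write.
// Comparable result, but considerably slower, and correct only if *every*
// access to counter goes through the same lock.
void SynchronizedIncrement()
{
    EnterCriticalSection( &lock );
    counter = counter + 1;
    LeaveCriticalSection( &lock );
}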

----

edd^2 replies:
volatile means different things on different compilers. Even on compilers where it does imply barriers, such as recent versions of Visual C++, they are inserted conservatively (I believe Microsoft decided to implement what people had been incorrectly assuming about volatile in order to fix their code). Instead, insert the appropriate barriers to prevent the compiler and the processor from reordering loads and stores; that is minimally sufficient. You will need to understand the memory model of your CPU in order to do this.

If you don't need POD types or anything beyond integers and pointers, don't bother with them. An atomic integer and an atomic pointer are likely all you'll need.
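For example, on Win32/MSVC (a sketch only; the function names are invented, and you should adapt the barriers to your compiler and CPU), _ReadWriteBarrier() constrains only the compiler, while MemoryBarrier() also constrains the processor:

#include <windows.h>

volatile LONG ready = 0;
int payload = 0;

void Publish( int value )
{
    payload = value;
    MemoryBarrier(); // CPU + compiler fence: payload is written before ready
    ready = 1;
}

int Consume()
{
    while( ready == 0 ) { /* spin */ }
    MemoryBarrier(); // keep the payload read from moving above the flag read
    return payload;
}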

----

allingm (OP) replies:
@AllEightUp

I'm fully aware that the "atomic" type I've presented here isn't truly atomic. It is only meant to act like an atomic for POD types; I use InterlockedCompareExchange for all my truly atomic calls. You also say it "will fail if you ever actually use real atomics on any of the structure". I'm not sure what you mean by that; could you clarify? If you are talking about locks within locks, that should not be a problem, but I have a feeling you're talking about something else.

@edd^2

Should the Mutex calls not provide the barriers you are asking for? I looked at the compiled code and it looked acceptable. I believe you are referring to instructions that use the volatile data being reordered to a point before the mutex is ever locked. Is that what you mean? I did notice that Boost puts a read/write barrier before the read in its "atomic read". I may yet take your advice not to support POD types, but at the moment I feel like I'm very close.

----

edd^2 replies:
If the mutex calls provide the barriers (and they do, if used correctly), why are you using volatile at all?

I suggest you not bother with atomic<POD>, because it will likely turn out to be useless. I suspect std::atomic<POD> exists for POD types that are the same size as, or smaller than, the types for which the CPU can support "atomic" operations natively. You could of course simulate that, but given that it's something you hadn't realised until now, I'm guessing you won't be needing it.

For larger PODs, or classes in general, I don't think an atomic<> wrapper is appropriate. Instead, take each class or system of classes that needs to be made "thread safe", examine it on a case-by-case basis, determine its invariants, and add the appropriate protection.
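For example (a sketch reusing the Mutex type from your earlier code; the class is invented for illustration), where the invariant is that x and y are always read and written together:

class SharedPoint
{
public:
    SharedPoint() : mX( 0.0 ), mY( 0.0 ) {}

    void Set( double x, double y )
    {
        mMutex.Lock();
        mX = x;
        mY = y;
        mMutex.Unlock();
    }

    void Get( double & x, double & y ) const
    {
        mMutex.Lock();
        x = mX;
        y = mY;
        mMutex.Unlock();
    }

private:
    mutable Mutex mMutex; // the Mutex type from the code above
    double mX;
    double mY;
};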

----

AllEightUp replies:

[quote name='allingm']
I'm fully aware that the "atomic" type I've presented here isn't truly atomic. It is only meant to act like an atomic for POD types; I use InterlockedCompareExchange for all my truly atomic calls. You also say it "will fail if you ever actually use real atomics on any of the structure". I'm not sure what you mean by that; could you clarify?
[/quote]

Take your example and consider how you might want to use it. Say, for instance, that many functions just want to read the x or y member as a real atomic fetch, using a true atomic, i.e.:


uint64_t Atomic::Load( uint64_t * value )
{
    uint64_t result;
    (void)value; // referenced only inside the inline assembly

    // 64-bit atomic load for IA-32 without using advanced instruction sets:
    // fild/fistp each move all 8 bytes in a single memory access.
    __asm
    {
        mov   eax, value
        fild  qword ptr [ eax ]
        fistp qword ptr result
    }
    return result;
}


So, you can read the x or y items all you want, and use other atomic operations such as store/xchg/cmpxchg to modify the values, and never have any issues. But as soon as you use the synchronized variation to read or write the whole structure, or even part of it, all bets are off. The reason is that, with your fetch working at the byte level, it could copy 2 bytes, an atomic store could then change the first 8 bytes, and the fetch would pick up the remaining 6 bytes; it now holds 2 old bytes and 6 new bytes, and the double is nicely corrupted.
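A sketch of the hazard (assuming the POD struct from the first post, that InterlockedExchange64 is available, and with invented names):

#include <windows.h>
#include <string.h>

struct POD { double x; double y; };
volatile POD shared;

// Thread B: a true atomic 64-bit store into shared.x.
void AtomicStoreX( double value )
{
    LONGLONG bits;
    memcpy( &bits, &value, sizeof( bits ) );
    InterlockedExchange64( (volatile LONGLONG *)&shared.x, bits );
}

// Thread A: the byte-wise ForceRead from the earlier post. If AtomicStoreX
// runs after that loop has copied, say, 2 of the 8 bytes of x, the other
// 6 bytes come from the new value and the assembled double is torn.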

This is why I suggest being explicit about what you are doing. Rename the mutex version to "SynchronizedFetch" or similar, so you remember never to access the same item with both "Atomic" and "Synchronized" operations; they simply won't work together nicely.

As to volatile, barriers, and atomic operations: you'll notice I don't use the volatile keyword. Instead I use the proper barriers and control things that way. E.g., for Win32:

#include <intrin.h>
#pragma intrinsic( _ReadWriteBarrier )
#pragma intrinsic( _ReadBarrier )
#pragma intrinsic( _WriteBarrier )

This makes the various barriers available.

For gcc you can use the lazy solution (a compiler-only fence):

#define READWRITEBARRIER __asm__ __volatile__( "\n" : : : "memory" )

Anyway, using the barriers correctly can speed up your code considerably, since volatile tends to turn off a lot of optimizations that don't need to be turned off. Worst case, just call READWRITEBARRIER before doing any atomic operation; that is still usually faster than using volatile.
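E.g., the worst-case pattern (a sketch with invented names; on x86 the interlocked call is itself a full barrier, so the extra fence only constrains the compiler):

#include <windows.h>
#include <intrin.h>

volatile LONG gValue = 0;

void StoreValue( LONG newValue )
{
    _ReadWriteBarrier();                      // compiler fence; no instruction emitted
    InterlockedExchange( &gValue, newValue ); // the atomic operation itself
}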

----

Another poster replies:

[quote name='AllEightUp']
The reason is that, with your fetch working at the byte level, it could copy 2 bytes, an atomic store could then change the first 8 bytes, and the fetch would pick up the remaining 6 bytes; it now holds 2 old bytes and 6 new bytes, and the double is nicely corrupted.
[/quote]

No modern CPU works at the byte level; they can't read or write anything smaller than a cache line. That means reading from or writing to a primitive type like a double doesn't need any special synchronization, as long as it isn't misaligned. You just need to make sure the compiler doesn't optimize the memory reads away into registers, which is what volatile does.

Of course, read-modify-write operations aren't thread-safe on volatiles, and they do need extra synchronization to make sure no other thread gets in between the read and the write. That can be done with the various atomic functions like InterlockedIncrement().
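For example (a sketch; the function names are invented):

#include <windows.h>

volatile LONG counter = 0;

// Not thread-safe: three separate steps (read, add, write) that another
// thread can interleave with, even though counter is volatile.
void BrokenIncrement()
{
    counter = counter + 1;
}

// Thread-safe: the hardware performs the read-modify-write indivisibly.
void SafeIncrement()
{
    InterlockedIncrement( &counter );
}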

----

AllEightUp replies:

[quote]
No modern CPU works at the byte level; they can't read or write anything smaller than a cache line. That means reading from or writing to a primitive type like a double doesn't need any special synchronization, as long as it isn't misaligned.
[/quote]

Correct, but not for the posted code; that will fail in exactly the way I described. I.e. the loop picks up the first 2 bytes, some other thread atomically writes to the data, the CPU updates the cache, and the next 6 bytes come from the updated data rather than the original, so the read returns corrupted data. The 2 bytes are just an arbitrary example; the atomic operation can corrupt the read anywhere from 1 to 15 bytes into the loop if they schedule simultaneously. As to read-modify-write, that is the purpose of a compare-and-exchange atomic, but over arbitrary structures (larger than 128 bits on modern hardware) you can't make it atomic, so synchronized access to the entire structure is generally appropriate. And, before it comes up: that loop will likely be optimized into appropriately sized moves in release builds, but that just means you get 4 bytes of corruption instead of 1-15. It will be "less likely", but "less likely" in computers just means it may take 1/100th of a second instead of 1/100000th before you get bad data.

----

allingm (OP) replies:
Well, after listening to you guys and talking with my teammate, we decided against adding POD support even though it seemed to work.

As an interesting note: from the link in my first post you can see that the standard will support PODs, and will most likely use mutexes internally to do so, given that there is an is_lock_free function.
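(For reference, the C++11 interface from that link looks roughly like this; whether a given specialization is lock-free is up to the implementation, so this is only a sketch:)

#include <atomic>
#include <iostream>

struct POD
{
    double x;
    double y;
};

int main()
{
    std::atomic< POD > a; // may be implemented with an internal lock
    POD b = { 1.0, 2.0 };

    a.store( b );         // whole-struct store
    POD c = a.load();     // whole-struct load

    std::cout << c.x << " " << a.is_lock_free() << std::endl;
    return 0;
}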

Also, I understand your argument, AllEightUp, but with the locks that should never happen.

Finally, @AllEightUp: you talk about using read/write barriers instead of volatile, but the sole reason I was using volatile was to force the data to be fetched from memory. I thought read/write barriers just prevented compiler reordering, which I wasn't worried about, assuming the mutexes had their own read/write barriers. Anyway, that code is gone, but I think I may have problems elsewhere. Currently, for built-in types, I use:

template< typename T >
inline T ForceMemoryRead( volatile T * mem )
{
    // FIX IT: trying to force a memory read from RAM, but I'm not sure what
    // I'm looking for in the assembly.
    const T result = *mem;
    _ReadWriteBarrier(); // compiler-only fence; it emits no instruction

    return result;
}

Does anybody know what might be wrong? As you can see from the comment, I'm not seeing what my reading suggests I should see.

Atomic< float > temp;
temp.Store( 1.0f );
009119CE fld1
009119D0 push ebx
009119D1 push esi
009119D2 fstp dword ptr [esp+28h]
009119D6 mov eax,dword ptr [esp+28h]
009119DA push edi
009119DB push eax
009119DC lea ecx,[esp+30h]
009119E0 push ecx
009119E1 call dword ptr [__imp__InterlockedExchange@8 (916008h)]

float result = temp.Load();
009119E7 mov eax,dword ptr [esp+2Ch]

std::cout << result << std::endl;
009119EB mov edx,dword ptr [__imp_std::endl (9160B8h)]
009119F1 mov dword ptr [esp+28h],eax
009119F5 fld dword ptr [esp+28h]
009119F9 push edx
009119FA push ecx
009119FB mov ecx,dword ptr [__imp_std::cout (916090h)]
00911A01 fstp dword ptr [esp]
00911A04 call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (9160ACh)]
00911A0A mov ecx,eax
00911A0C call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (9160B0h)]
