cache friendly lock free concept?

40 comments, last by King Mir 10 years, 5 months ago

I have suspicions that the following concept is not very cache friendly, but I'm no expert on the subject and would like some pointers.

Essentially, I have two threads: one for networking and one for the game simulation. I want to process the packets in the game simulation thread, rather than in the network thread, so that the simulation stays deterministic.

Here's the concept I came up with for switching in/out packets between the two threads:

volatile int LockState = UNLOCKED;
linklist<Packet*> *InPackets  = new linklist<Packet*>;   // filled by the network thread
linklist<Packet*> *OutPackets = new linklist<Packet*>;   // filled by the game thread
linklist<Packet*> *TempList   = new linklist<Packet*>;   // the list being handed across

void NetworkThread(){
    RetrievePendingPackets(InPackets);
    if(LockState==LOCK_REQUESTED){
        // hand the freshly received packets over to the game thread
        linklist<Packet*> *Temp = TempList;
        TempList = InPackets;
        InPackets = Temp;
        LockState = LOCKED;
    }
    if(LockState==UNLOCK_REQUESTED){
        // send whatever the game thread queued up, then release the handshake
        for(link<Packet*> *Lnk = TempList->GetFirst(); Lnk!=null; Lnk = Lnk->Next){
            SendPacket(Lnk->value);
        }
        TempList->Clear();
        LockState = UNLOCKED;
    }
    return;
}

void GameThread(){
    if(LockState==LOCKED){
        // consume the packets the network thread handed over
        for(link<Packet*> *Lnk = TempList->GetFirst(); Lnk!=null; Lnk = Lnk->Next){
            ProcessPacket(Lnk->value);
        }
        TempList->Clear();
        // give the outgoing packets back to the network thread
        linklist<Packet*> *Temp = TempList;
        TempList = OutPackets;
        OutPackets = Temp;
        LockState = UNLOCK_REQUESTED;
    }
    if(LockState==UNLOCKED) LockState = LOCK_REQUESTED;
    UpdateGames();  // Will push packets into the OutPackets list.
}
 

My thinking is that when the pointers are swapped by the thread on one core, the cache on the other core has to be updated before the other thread sees the new pointer values.

So, thoughts on it are much appreciated.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.
Well, for one, you could move the data closer together by using a vector instead of a linked list. You could move the data even closer by using the vector to store the packets themselves, though managing it would be a bit tricky:

#include <cstddef>
#include <vector>
using std::vector;

typedef unsigned char byte;
vector<byte> packet_store;

template<class T>
void write_memory( byte* at, const T& val ) {
    // assumes Packet<-T and T is POD
    *reinterpret_cast<T*>(at) = val;
}

template<class T>
void store_packet( vector<byte>& packet_store, const T& packet ) {
    size_t total_size = sizeof(size_t) + sizeof(T);

    size_t idx = packet_store.size();
    packet_store.resize(packet_store.size() + total_size); // assumes Packet<-T and T is POD

    byte* at = &packet_store[0] + idx;
    write_memory(at,total_size);                 // record header: total size of this entry
    write_memory(at+sizeof(size_t),packet);      // packet payload
}

template<class F>
void foreach_packet( vector<byte>& packet_store, F f ) {
    if ( packet_store.empty() ) return;
    byte* at = &packet_store[0];
    byte* end = at + packet_store.size();
    for ( size_t total_size = 0; at < end; at += total_size ) {
        total_size = *reinterpret_cast<size_t*>(at);
        Packet* p = reinterpret_cast<Packet*>(at+sizeof(size_t));

        f(p);
    }
}
Beware, sample code is not alignment-aware.
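
If alignment does matter on the target platform, one option (just a sketch, not part of the snippet above; align_up and store_packet_aligned are made-up names) is to round every record up to alignof(std::max_align_t) and copy with memcpy so nothing is ever read or written through a misaligned pointer:

#include <cstddef>
#include <cstring>
#include <vector>

// Round n up to the next multiple of the alignment a.
inline std::size_t align_up( std::size_t n, std::size_t a ) {
    return (n + a - 1) / a * a;
}

template<class T>
void store_packet_aligned( std::vector<unsigned char>& store, const T& packet ) {
    // Pad each record so the next record's size_t header stays aligned.
    std::size_t total_size = align_up( sizeof(std::size_t) + sizeof(T),
                                       alignof(std::max_align_t) );
    std::size_t idx = store.size();
    store.resize( idx + total_size );
    std::memcpy( &store[idx], &total_size, sizeof(std::size_t) );          // record header
    std::memcpy( &store[idx] + sizeof(std::size_t), &packet, sizeof(T) );  // POD payload
}

The read side would memcpy the size_t header (and the packet) back out the same way.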

... and would like some pointers.

Inb4 XKCD or 0x1234ABCD huehuehue.

Are you sure that this design will work? I think lock free queues are usually implemented with ringbuffers.
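
For reference, here is a rough sketch (not code from this thread; SpscRing, Capacity and the push/pop names are just illustrative) of the kind of single-producer/single-consumer ring buffer usually meant by that, using C++11 atomics:

#include <atomic>
#include <cstddef>

template<class T, std::size_t Capacity>
class SpscRing {
public:
    bool push( const T& item ) {   // called only by the producer thread
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;          // full
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    bool pop( T& out ) {           // called only by the consumer thread
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;          // empty
        out = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }
private:
    T buffer_[Capacity];
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

One slot is deliberately left unused so that head == tail always means "empty" rather than "full".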

Are you sure that this design will work? I think lock free queues are usually implemented with ringbuffers.

Mind pointing out why it won't work? It's actively deployed right now (the above code is pseudo-code I wrote up that demonstrates what I'm doing; I'll go over it to make sure I didn't miss something).

@fastcall32, storing each packet in a single buffer would definitely be more cache-friendly, but for now I'd like to focus on the concept itself, rather than how the packets are stored.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

I guess I was just thrown off because I didn't read into the code enough.

As written, this is a giant race condition waiting to happen... I'm not surprised it "appears" to work, but it's a matter of time before you get some really bad bugs.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

As written, this is a giant race condition waiting to happen... I'm not surprised it "appears" to work, but it's a matter of time before you get some really bad bugs.

Could you please expand on this? I've been reading over the code several times and don't see where any race occurs; each part of the swapping happens in discrete steps, so at no point do I see where overlapping access to the data can occur.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

There is shared state in this synchronization (LockState). If you want to access it from both threads, you have to synchronize those accesses. One way is to use hardware memory barriers before and after each access of LockState; this prevents the compiler and processor from reordering instructions and memory accesses, and synchronizes memory between the cores. The other way is to use an atomic read/write, which locks the memory bus for the duration of the access (this is what a CAS cycle does). But in its current state, the code only synchronizes the two threads by luck.
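
As a sketch of the atomic option (my example, not the original poster's code; NetworkThreadStep is a made-up name), the flag from the pseudo-code could be declared as a C++11 std::atomic and accessed with acquire/release ordering so the reordering described above is actually ruled out:

#include <atomic>

enum { UNLOCKED, LOCK_REQUESTED, LOCKED, UNLOCK_REQUESTED };

std::atomic<int> LockState(UNLOCKED);

void NetworkThreadStep(){   // network-thread side of the handshake
    if( LockState.load(std::memory_order_acquire) == LOCK_REQUESTED ){
        // ...swap InPackets / TempList exactly as in the pseudo-code above...
        // The release store guarantees the swap is visible before LOCKED is seen.
        LockState.store(LOCKED, std::memory_order_release);
    }
}

The game thread would mirror this: load LockState with memory_order_acquire and store its new state with memory_order_release.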

Firstly, you aren't using any atomic variables (or legitimate locks), so it will only work on very permissive architectures like x86, and even then it may not if you turn optimizations up high enough for the compiler to mess your code up. LockState needs to be atomic.

Secondly, you have the potential for false sharing with your four variables packed together like that. You need to ensure that each of them is on its own cache line. The usual way to do this is to put each one in a class with an otherwise unused field that acts as a spacer (see the sketch after this post). There are other ways too, but beware that, in general, the compiler is free to remove unused variables.

Thirdly, use a vector (or another array-like structure), not a linked list, for cache performance.

Fortunately, assuming the atomic issue is fixed, I don't see a data race.
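
To illustrate the first two points (my sketch, not King Mir's code): C++11's alignas is an alternative to the spacer-field trick, and 64 bytes here is an assumed, typical x86 cache line size.

#include <atomic>

// Wrap each shared variable so that no two of them can land on the same cache
// line; alignas pads sizeof(CacheLinePadded<T>) up to a multiple of 64.
template<class T>
struct alignas(64) CacheLinePadded {
    T value;
};

CacheLinePadded< std::atomic<int> > LockState = { {0} };   // 0 == UNLOCKED, and now atomic as well
// The three list pointers from the pseudo-code would get the same treatment:
// CacheLinePadded< linklist<Packet*>* > InPackets, OutPackets, TempList;

On newer compilers, std::hardware_destructive_interference_size (C++17) is the portable way to obtain the line size instead of hard-coding 64.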

I mark LockState as volatile, which ensures it is usable for memory locking in this example.

@King Mir, I was thinking that the swap might be susceptible to the pointers sitting on their own cache lines: when I do the swap and update the lock, the other thread will likely not have the swapped pointers sitting in its cache. Hmm, this is a potentially serious problem.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.
