slicer4ever

cache friendly lock free concept?


I have suspicions that the following concept is not very cache-friendly, but I'm no expert on the subject and would like some pointers.

 

Essentially, I have two threads: one for networking and one for the game simulation. I want to process the packets in the game-simulation thread rather than in the network thread, so that the simulation remains deterministic.

 

Here's the concept I came up with for exchanging incoming/outgoing packets between the two threads:

 
 
 
enum { UNLOCKED, LOCK_REQUESTED, LOCKED, UNLOCK_REQUESTED };

volatile int LockState = UNLOCKED;
linklist<Packet*> *InPackets  = new linklist<Packet*>;  // filled by the network thread
linklist<Packet*> *OutPackets = new linklist<Packet*>;  // filled by the game thread
linklist<Packet*> *TempList   = new linklist<Packet*>;  // handed between the threads

void NetworkThread(){
    RetrievePendingPackets(InPackets);
    if(LockState==LOCK_REQUESTED){
        // Hand the received packets over to the game thread.
        linklist<Packet*> *Temp = TempList;
        TempList = InPackets;
        InPackets = Temp;
        LockState = LOCKED;
    }
    if(LockState==UNLOCK_REQUESTED){
        // Send the outgoing packets the game thread handed back.
        for(link<Packet*> *Lnk = TempList->GetFirst(); Lnk!=null; Lnk = Lnk->Next){
            SendPacket(Lnk->value);
        }
        TempList->Clear();
        LockState = UNLOCKED;
    }
    return;
}

void GameThread(){
    if(LockState==LOCKED){
        // Process the incoming packets received from the network thread.
        for(link<Packet*> *Lnk = TempList->GetFirst(); Lnk!=null; Lnk = Lnk->Next){
            ProcessPacket(Lnk->value);
        }
        TempList->Clear();
        // Hand the queued outgoing packets back to the network thread.
        linklist<Packet*> *Temp = TempList;
        TempList = OutPackets;
        OutPackets = Temp;
        LockState = UNLOCK_REQUESTED;
    }
    if(LockState==UNLOCKED) LockState = LOCK_REQUESTED;
    UpdateGames();  // Will push packets into the OutPackets list.
}
 

My concern is that when one thread swaps the pointers on one core, the other core's cache still holds the old pointer values, so the other thread has to fetch the updated pointers from memory before it can see the swap.

 

So, thoughts on it are much appreciated.

Well, for one, you could move the data closer together by using a vector instead of a linked list. You could move the data even closer by using the vector to store the packets themselves, though managing it would be a bit tricky:
#include <cstddef>
#include <vector>
using std::vector;

struct Packet; // packet base type, defined elsewhere

typedef unsigned char byte;
vector<byte> packet_store;

template<class T>
void write_memory( byte* at, const T& val ) {
    // assumes T derives from Packet and T is POD
    *reinterpret_cast<T*>(at) = val;
}

template<class T>
void store_packet( vector<byte>& packet_store, const T& packet ) {
    size_t total_size = sizeof(size_t) + sizeof(T);

    size_t idx = packet_store.size();
    packet_store.resize(packet_store.size() + total_size); // assumes T derives from Packet and T is POD

    byte* at = &packet_store[0] + idx;
    write_memory(at,total_size);            // size prefix first...
    write_memory(at+sizeof(size_t),packet); // ...then the packet itself
}

template<class F>
void foreach_packet( vector<byte>& packet_store, F f ) {
    byte* at = &packet_store[0];
    byte* end = at + packet_store.size();
    for ( size_t total_size = 0; at < end; at += total_size ) {
        total_size = *reinterpret_cast<size_t*>(at); // read the size prefix
        Packet* p = reinterpret_cast<Packet*>(at+sizeof(size_t));

        f(p);
    }
}
Beware, sample code is not alignment-aware.
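
For illustration, usage might look like the sketch below. PingPacket is a made-up type for the sketch, not from this thread, and it has to stay trivially copyable for write_memory to be safe:

// Hypothetical types for this sketch only; a real Packet comes from the app.
struct Packet { int type; };
struct PingPacket : Packet { unsigned int timestamp; };

void example() {
    vector<byte> store;

    PingPacket ping;
    ping.type = 1;
    ping.timestamp = 42;
    store_packet(store, ping);  // appends a size-prefixed copy of ping

    foreach_packet(store, [](Packet* p) {
        // dispatch on the packet here, e.g. via the type field
    });
}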

...and would like some pointers.

Inb4 XKCD or 0x1234ABCD huehuehue.


Are you sure that this design will work? I think lock-free queues are usually implemented with ring buffers.
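
For reference, a minimal single-producer/single-consumer ring buffer might look like this (a C++11 sketch; the class name and power-of-two capacity are illustrative, not from this thread):

#include <atomic>
#include <cstddef>

struct Packet; // the thread's packet type, declaration only

// SPSC ring buffer sketch: the network thread calls push, the game thread
// calls pop, and no lock is needed because each index has a single writer.
template<size_t N> // N must be a power of two
class PacketRing {
    Packet* slots[N];
    std::atomic<size_t> head{0}; // advanced only by the consumer
    std::atomic<size_t> tail{0}; // advanced only by the producer
public:
    bool push(Packet* p) {
        size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N)
            return false;                             // buffer is full
        slots[t & (N - 1)] = p;
        tail.store(t + 1, std::memory_order_release); // publish the slot
        return true;
    }
    bool pop(Packet*& p) {
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;                             // buffer is empty
        p = slots[h & (N - 1)];
        head.store(h + 1, std::memory_order_release); // free the slot
        return true;
    }
};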


Are you sure that this design will work? I think lock-free queues are usually implemented with ring buffers.

Mind pointing out why it won't work? It's actively deployed right now. (The above code is pseudo-code I wrote up that demonstrates what I'm doing right now; I'll go over it to make sure I didn't miss something.)

 

@fastcall32, storing each packet in a single buffer would definitely be more cache-friendly, but for now I'd like to focus on this concept itself, rather than on how the packets are stored.


As written, this is a giant race condition waiting to happen... I'm not surprised it "appears" to work, but it's only a matter of time before you get some really bad bugs.


As written, this is a giant race condition waiting to happen... I'm not surprised it "appears" to work, but it's only a matter of time before you get some really bad bugs.

Could you please expand on this? I've been reading over the code several times and don't see where any race occurs; each step of the swapping happens in discrete steps, so at no point do I see where the data can overlap.


There is shared state in this synchronization (LockState). If you want to access it from both threads, you have to synchronize these accesses. One way is to use hardware memory barriers before and after each access of LockState; this blocks the instruction and memory-access reordering done by the compiler and processor and synchronizes memory. The other way is to use an atomic read/write, which locks the memory bus for the duration of the memory access (this is what a CAS cycle does). But in the current state of the code, the two threads are synchronized only by luck.
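
For example, with C++11 std::atomic the network thread's half of the handshake might look like this (a sketch only, reusing the state names and lists from the original post):

#include <atomic>

enum { UNLOCKED, LOCK_REQUESTED, LOCKED, UNLOCK_REQUESTED };
std::atomic<int> LockState(UNLOCKED);

// In NetworkThread: the acquire-load observes the game thread's request, and
// the release-store guarantees the pointer swap is visible before LOCKED is.
if (LockState.load(std::memory_order_acquire) == LOCK_REQUESTED) {
    linklist<Packet*>* Temp = TempList;  // linklist/Packet as in the OP's code
    TempList = InPackets;
    InPackets = Temp;
    LockState.store(LOCKED, std::memory_order_release);
}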

 

Firstly, you aren't using any atomic variables (or legitimate locks), so it will only work on very permissive architectures like x86, and even then it may not if you turn optimizations up high enough for the compiler to mess your code up. LockState needs to be atomic.

Secondly, you have the potential for false sharing with your four variables packed together like that. You need to ensure that each of them is on its own cache line. The usual way to do this is to put each in a class with an otherwise unused final field that acts as a spacer. There are other ways too, but beware that, in general, the compiler is free to remove unused variables. (See the sketch after this list.)

Thirdly, use a vector (or another array structure), not a linked list, for cache performance.

Fortunately, assuming the atomic issue is fixed, I don't see a data race.
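
A sketch of the padding idea, assuming 64-byte cache lines (alignas is C++11 and can stand in for the spacer-field trick; linklist and Packet are the OP's types):

#include <atomic>

// Each shared variable gets its own cache line, so a write to one doesn't
// invalidate the others in the remote core's cache.
struct alignas(64) PaddedState { std::atomic<int> state; };
struct alignas(64) PaddedList  { linklist<Packet*>* list; };

PaddedState lockState;                       // replaces the volatile int
PaddedList  inPackets, outPackets, tempList; // replace the three raw pointers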


I mark LockState as volatile, which ensures it is usable for memory locking in this example.

 

@King Mir, I was thinking the swap might be susceptible to how the pointers sit on their cache lines: when I do the swap and update the lock, the other thread will likely not have the swapped pointers sitting in its cache. Hmm, this is a potentially serious problem.
