I have trouble understanding how a cache is implemented. I understand all the low-level aspects of digital computation, yet this is totally outside my scope. Is there a way to implement a cache outside of what the CPU manages itself? For example, if there is a part of memory that is going to be read from often, how would I cache it? The question is not whether it would benefit me; the question is why it would not benefit me.

My more precise problem is the GPU cache. I have multiple SIMD (single instruction, multiple data) threads running, and they often read from the same memory. This causes huge stalls when the memory is accessed by several SIMD threads. So my understanding is that a cache is memory belonging to each individual SIMD thread, populated with the data that thread is likely to access. Is that true?

My problem is how to make sure that every SIMD thread reads from its own spot in memory. The only way I can distinguish between SIMD threads is the index of the pixel they are computing, which differs for every SIMD thread, whether running or about to run. My only idea for implementing a cache for those threads would be:
cache=new [numofthreads*cachesize]<=data likely to be read by all parallel threads (say 80 threads, which often means 80 identical copies of the data)
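To make the idea concrete, here is a minimal sketch of what I mean, written as plain C++ on the CPU rather than as real GPU code. The names (`buildPerThreadCache`, `readCached`, `threadIndex`) are made up for illustration, and I am assuming the "cache" is simply one private copy of the shared read-only data per thread, laid out back to back in one buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds one buffer of numThreads * cacheSize elements,
// where every thread gets its own identical copy of the shared data.
std::vector<float> buildPerThreadCache(const std::vector<float>& shared,
                                       std::size_t numThreads) {
    const std::size_t cacheSize = shared.size();
    std::vector<float> cache(numThreads * cacheSize);
    for (std::size_t t = 0; t < numThreads; ++t) {
        for (std::size_t i = 0; i < cacheSize; ++i) {
            cache[t * cacheSize + i] = shared[i];  // copy into thread t's slice
        }
    }
    return cache;
}

// Hypothetical helper: thread `threadIndex` reads element `i` only from its
// own slice, so no two threads ever touch the same address.
float readCached(const std::vector<float>& cache, std::size_t cacheSize,
                 std::size_t threadIndex, std::size_t i) {
    assert(i < cacheSize);
    assert(threadIndex * cacheSize + i < cache.size());
    return cache[threadIndex * cacheSize + i];
}
```

With 80 threads this stores the same data 80 times, which is exactly the duplication I am unsure about.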
Is this how it works?
Thanks a lot for any clarifications!