
C++ Caching thread harms render loop performance


Hi,

In my image-based renderer, I want to implement a cache that prefetches images from a neighbourhood for rapid retrieval.

I have a main render loop which has to run at 90 FPS (VR), and another caching thread which is supposed to run in the background. However, when I change my position, new images are loaded by the caching thread, and this often spikes CPU usage to the point where my render thread suffers. This happens even though my render loop is not synchronised with the caching thread: if an image is not in the cache when it is supposed to be rendered, the lookup just returns an empty one for the current frame. Nonetheless, I can still observe stuttering when this happens. When I move slowly, however, it is fine.

When I put a sleep(100) inside my caching thread, the stuttering does not occur, but of course the caching is also a lot slower.

Any ideas why this happens and how I can solve it? My main theory right now is that the caching thread performs a lot of memory accesses to load the images, and because all threads share the same memory bus and bandwidth, the other threads suffer too. But how can I ever get around this?

Edited by godmodder


Hi,

I think you are right about the root cause of the problem.

I have no experience with this topic, but I am thinking that the caching thread might have to do its work based on a priority list.

Images that were skipped in one loop would then get a higher priority in the next, and there would be some sort of maximum workload per iteration for the cache system.
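Something along these lines is what I have in mind, as a rough sketch (the names, the distance-based priority and the 4 ms budget are all made up here, not taken from your code):

#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical request: distance to the viewer doubles as the priority,
// so the images closest to the current position are loaded first.
struct CacheRequest {
    float distance;   // smaller = more urgent
    int   imageId;    // whatever identifies the image on disk
    bool operator>(const CacheRequest& o) const { return distance > o.distance; }
};

std::priority_queue<CacheRequest, std::vector<CacheRequest>, std::greater<CacheRequest>> gRequests;
std::mutex gRequestMutex;
std::atomic<bool> gRunning{true};

void LoadAndDecode(int imageId) { /* your existing jpeg load + decode */ }

void CacheWorker()
{
    using namespace std::chrono;
    const milliseconds budget{4};   // rough maximum workload per iteration, tune to taste

    while (gRunning) {
        const auto start = steady_clock::now();

        // Work through the most urgent requests until the budget is spent.
        while (steady_clock::now() - start < budget) {
            CacheRequest req;
            {
                std::lock_guard<std::mutex> lock(gRequestMutex);
                if (gRequests.empty()) break;
                req = gRequests.top();
                gRequests.pop();
            }
            LoadAndDecode(req.imageId);
        }

        // Give the render thread room; anything skipped stays in the queue
        // and is picked up (or re-prioritised) in the next iteration.
        std::this_thread::sleep_for(milliseconds(2));
    }
}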

Another option could be to have the main thread control the cache system, so it only runs while the GPU is busy and the CPU has less to do. I know it kind of kills the idea of concurrent processing, but it might be useful to, for instance, have the main thread trigger every processing loop in the cache system.

Another thought:

How many threads are accessing the hard disk? If you have many accesses active at the same time, the disk might spend a lot of time seeking back and forth between files.

Just some thoughts, hope it pushes your thinking in the right direction :-)

/Kim

 

50 minutes ago, ApochPiQ said:

How does your caching thread know when to stop doing work and idle?

It observes the current position and caches images around that point in concentric circles. I have set a manual threshold so that it caches only e.g. 5 circles. Every time a significant change in position occurs, the cached area moves with it. Some items will already be in cache, others will have to be loaded or deleted.
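Roughly, the prefetch pass works like this (a simplified sketch with square rings instead of true circles; IsCached and EnqueueLoad are placeholder names, not my actual code):

#include <algorithm>
#include <cstdlib>

bool IsCached(int gridX, int gridY);     // placeholder: is the image already in the cache?
void EnqueueLoad(int gridX, int gridY);  // placeholder: hand a load request to the caching thread

// Prefetch images in concentric rings around the current grid position.
void PrefetchAround(int centerX, int centerY, int maxRings /* e.g. 5 */)
{
    for (int ring = 0; ring <= maxRings; ++ring) {
        for (int dy = -ring; dy <= ring; ++dy) {
            for (int dx = -ring; dx <= ring; ++dx) {
                // Visit only the border of the current ring, so inner rings are not revisited.
                if (std::max(std::abs(dx), std::abs(dy)) != ring)
                    continue;
                if (!IsCached(centerX + dx, centerY + dy))
                    EnqueueLoad(centerX + dx, centerY + dy);
            }
        }
    }
}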

1 hour ago, kikr4000 said:

Images that were skipped in one loop would then get a higher priority in the next, and there would be some sort of maximum workload per iteration for the cache system.

I see what you mean. The sleep() I put in the caching thread crudely limits how much work the thread can do in a given amount of time, and it does indeed seem to help. I have also played with thread priorities a bit, but that does not seem to help. I've always thought the main thread would take priority and simply push the caching thread to the background, but apparently that's not how it works. This puzzles me... isn't this the whole point of priorities?
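For illustration, lowering the caching thread's priority looks roughly like this on Windows (a sketch, not my exact code; I have not yet tried THREAD_MODE_BACKGROUND_BEGIN, which supposedly also lowers the thread's I/O and memory priority rather than just its CPU scheduling priority):

#include <windows.h>

void RunCachingLoop();   // the existing caching loop (placeholder)

// Entry point of the caching thread.
void CacheThreadMain()
{
    // A plain priority change only affects CPU scheduling.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);

    // Background mode additionally lowers the thread's I/O and memory priority.
    // It can only be applied to the calling thread and must be paired with
    // THREAD_MODE_BACKGROUND_END.
    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);

    RunCachingLoop();

    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
}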

1 hour ago, kikr4000 said:

How many threads are accessing the hard disk?

Only the caching thread reads JPEG files from disk. Reading only takes a fraction of the time compared to decoding them, though.

 

I don't understand why the caching thread doesn't just yield when it is set to a lower priority. It is perfectly fine if it caches fewer images this way, but the rendering should never slow down because of it. Surely a mechanism for this exists, no?

EDIT: I'm starting to think that thread priority only applies to raw CPU time and that threads compete equally for memory accesses. This sounds counter-intuitive to me, however.

Edited by godmodder


Look into priority inversion - it sounds like you have stumbled across a classic case of it :-)

 

What happens if you do things a little more lazily, and only re-trigger caching when your position moves, say, 150 units? (Or some other number that makes sense in your coordinate space.)
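As a sketch (the 150-unit threshold and all names here are placeholders, adjust to your setup):

#include <cmath>

struct Vec3 { float x, y, z; };                 // placeholder position type

void RequestCacheUpdate(const Vec3& center);    // placeholder: signal the caching thread

float Distance(const Vec3& a, const Vec3& b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Called once per frame on the render thread.
void MaybeRetriggerCache(const Vec3& currentPos, Vec3& lastCachedPos)
{
    const float kRetriggerDistance = 150.0f;    // tune for your coordinate space

    if (Distance(currentPos, lastCachedPos) >= kRetriggerDistance) {
        lastCachedPos = currentPos;
        RequestCacheUpdate(currentPos);
    }
}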


Maybe you need a more advanced management/caching strategy in which your caching thread and your render thread have as little friction as possible. The first question is who "owns" the memory. I would say your render thread needs to own the storage that holds the images, so the caching thread is just a visitor that sets some kind of flag telling the render thread that an image isn't there yet, so it can skip that image and move on to the next one. Checking a single flag is atomic in the sense that a write to it can't be torn across bytes, so you might not need any interlocked operations here.
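As a rough sketch of what I mean with the flag (I am using std::atomic here just to be explicit about cross-thread visibility, even if a plain flag might work in practice; all names are placeholders):

#include <atomic>
#include <cstdint>
#include <vector>

// One slot per cached image; the render thread owns this storage.
struct CacheSlot {
    std::vector<std::uint8_t> pixels;        // decoded image data
    std::atomic<bool>         ready{false};  // set by the caching thread once pixels are valid
};

// Render thread: use the image if it is there, otherwise skip it this frame.
void DrawImage(CacheSlot& slot)
{
    if (!slot.ready.load(std::memory_order_acquire))
        return;   // not cached yet - draw nothing (or a placeholder) and move on
    // ... upload / draw slot.pixels ...
}

// Caching thread: fill the slot, then publish it.
void FillSlot(CacheSlot& slot, std::vector<std::uint8_t> decoded)
{
    slot.pixels = std::move(decoded);
    slot.ready.store(true, std::memory_order_release);
}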

The caching thread would then have two kinds of responsibilities: tagging memory as obsolete, and loading images from disk in the background, where I would go for some packaging/memory-mapping approach rather than loading single files.

A second good caching strategy is to record an access timestamp and remove data from the cache as needed, but as little as necessary, starting with the oldest access. The probability is very high that if you swing your head to the left, you will more or less move it back to the right. This would also limit the number of cache misses caused by heavy head movement; otherwise the cache could evict images that become needed again the very next frame because you moved away and then came back, forcing them to be loaded all over again.
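A minimal sketch of the timestamp idea, basically least-recently-used eviction (names are placeholders, and the locking between the two threads is left out for brevity):

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct CachedImage {
    std::vector<std::uint8_t> pixels;
    std::uint64_t             lastUsedFrame = 0;   // updated whenever the image is drawn
};

std::unordered_map<int, CachedImage> gCache;       // keyed by image id

// Render thread marks the image on every access.
void Touch(int imageId, std::uint64_t frameIndex)
{
    auto it = gCache.find(imageId);
    if (it != gCache.end())
        it->second.lastUsedFrame = frameIndex;
}

// Evict only when over budget, and always the least recently used entry first.
void EvictIfNeeded(std::size_t maxEntries)
{
    while (gCache.size() > maxEntries) {
        auto oldest = gCache.begin();
        for (auto it = gCache.begin(); it != gCache.end(); ++it)
            if (it->second.lastUsedFrame < oldest->second.lastUsedFrame)
                oldest = it;
        gCache.erase(oldest);
    }
}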

