
outRider

Member Since 04 Jul 2000
Offline Last Active May 17 2014 11:45 PM

Posts I've Made

In Topic: GPU Thrashing

30 May 2011 - 09:37 AM

I'm working with an app that is very greedy when it comes to video memory. When one instance of the app is running, performance is fine. I can run two instances with little performance loss - that is, each instance runs only slightly slower compared to a single instance. When a third instance is running, the system grinds to a halt. Other tests show the issue is related to graphics, and I'm trying to determine the cause. The symptoms point to thrashing, but can GPUs thrash? I understand there are caches for things like transformed vertices and locally fetched texels, but those deal with small elements, not entire textures. Also, I think a context switch happens when a different process uses the GPU for rendering, but surely that can't cause the problem I'm seeing. Have any of you experienced thrashing on the GPU?

Note: When all three instances are running I estimate VRAM usage is ~700 MB, and the computer has a 1 GB video card.



Yes, there is such a thing as GPU thrashing, not just of VRAM but also of the (GPU-accessible) system memory the GPU can read from. Your app needs a certain number of buffers every frame: some of those buffers need to be in VRAM, some need to be in system memory, and some can be in either. You don't have unlimited VRAM, and you also don't have unlimited GPU-accessible system memory, so buffers may get swapped out of either or both to regular system memory, and possibly to disk if you start running out of that too. Your estimate of 700 MB may be off, since some buffers may live in both VRAM and GPU-accessible system memory for whatever reason (usually to speed up context switching).

The other possibility is that you are not experiencing classical thrashing but excessive memory movement due to other constraints. If your app reads back render targets or other VRAM-resident buffers, for example, they may have to be moved somewhere the CPU can access them. The CPU usually only has a 256 MB window into VRAM, so if your buffer is outside that window it may have to be moved; even if it is inside the window, it may be moved to system memory because that's what the driver thinks will give the lowest latency. Who knows.

And yes, there are context switches on both the CPU and GPU when rendering from multiple contexts. On some GPUs that don't have HW context switching, the driver has to store and re-emit whatever state info is necessary when switching to a new context (meaning it has to both reprogram the GPU state and re-upload any VRAM-only buffers, for example). I doubt that this is your problem going from 2 to 3 contexts unless you are hitting some pathological case.

Anyway, most of this is just a guess based on the few details you provided and my understanding of such things.

In Topic: Compiling GLIBC on Linux. Oh golly gosh!

04 February 2011 - 10:48 AM

CentOS, RHEL, SLES and some other distros intended for 'enterprise' type machines have relatively old (aka 'stable') versions of most software compared to something like Ubuntu, so you have to be aware of that if you intend to distribute binaries.

In Topic: Changing variable once it is found

21 January 2011 - 09:08 PM

If you want to do this with an external program, you find the offset of the instruction or data you want to change and you patch it. In your case you know where in the executable that variable is first written to, so you open the executable, fseek to the offset, change the immediate part of the instruction to 0x12c, and you're done. If you want to know which part of the instruction corresponds to the immediate operand, you look it up in a manual. That's the short, simple version of this kind of reversing.

In Topic: Memory Mapped Devices

25 December 2010 - 06:32 PM

/dev/dsp is just an OSS interface and all you're doing is writing to a shared buffer that the driver then passes to the soundcard somehow. Some drivers will point the hardware at that buffer and the hardware will start reading from that same buffer, other drivers will copy that buffer elsewhere and then tell the hardware what to do with it. If a driver allows you to mmap /dev/dsp as opposed to using read/write then it will provide you with a circular buffer. This circular buffer may be the same buffer that the hardware is reading from or the driver may be reading from it and passing it to the hardware using some other mechanism. Either way, no one will know if/when you've written to this mmapped buffer, the only thing the driver will tell you is where it or the hardware is currently reading from (via an ioctl) and you'll have to make sure you've written valid data there ahead of time.

In Topic: Memory Mapped Devices

25 December 2010 - 05:07 AM

Your question is vague. Are you writing to MMIO and asking how the device is affected by those writes, or are you writing into some other buffer and asking how the device fetches that data? By the way, you generally can't initiate DMA from user programs.
