
  1. outRider

    GPU Thrashing

    Yes, there is such a thing as GPU thrashing, not just on VRAM resources but also on (GPU-accessible) system memory resources. Your app needs a certain number of buffers every frame; some of those buffers need to be in VRAM, some need to be in system memory, and some can be in either. You don't have unlimited VRAM, and you also don't have unlimited GPU-accessible system memory, so buffers may get swapped out of either or both of those to regular system memory, and possibly to disk if you start running out of that. Your estimate of 700MB may be off, since some buffers may be in both VRAM and GPU-accessible system memory for whatever reason (usually to speed up context switching).

    The other possibility is that you are not experiencing classical thrashing but excessive memory movement due to other constraints. If your app reads back render targets or other VRAM-resident buffers, for example, they may have to be moved to a place where the CPU can access them. The CPU usually only has a 256MB window into VRAM, so if your buffer is outside that window it may have to be moved; even if it's in the window it may be moved to system memory because that's what the driver thinks will give the lowest latency. Who knows.

    And yes, there are context switches both on the CPU and GPU when rendering from multiple contexts. On some GPUs that don't have HW context switching, the driver has to store and re-emit whatever state info is necessary when switching to a new context (this means it has to both program the GPU state and re-upload any VRAM-only buffers, for example). I doubt that this is your problem going from 2 to 3 contexts unless you are hitting some pathological case. Anyway, most of this is just a guess based on the few details you provided and my understanding of such things.
  2. CentOS, RHEL, SLES, and some other distros intended for 'enterprise'-type machines have relatively old (aka 'stable') versions of most software compared to something like Ubuntu, so you have to be aware of that if you intend to distribute binaries.
  3. If you want to do this with an external program, you find the offset of the instruction or data you want to change and you patch it. In your case you know where in the executable that variable is first written to, so you open the executable, fseek to the offset, change the immediate part of the instruction to 0x12c, and you're done. If you want to know which part of the instruction corresponds to the immediate operand, you look it up in a manual. That's the short, simple version of this kind of reversing.
  4. outRider

    Memory Mapped Devices

    /dev/dsp is just an OSS interface, and all you're doing is writing to a shared buffer that the driver then passes to the soundcard somehow. Some drivers will point the hardware at that buffer and the hardware will start reading from that same buffer; other drivers will copy that buffer elsewhere and then tell the hardware what to do with it. If a driver allows you to mmap /dev/dsp, as opposed to using read/write, then it will provide you with a circular buffer. This circular buffer may be the same buffer that the hardware is reading from, or the driver may be reading from it and passing it to the hardware using some other mechanism. Either way, no one will know if/when you've written to this mmapped buffer; the only thing the driver will tell you is where it or the hardware is currently reading from (via an ioctl), and you'll have to make sure you've written valid data there ahead of time.
  5. outRider

    Memory Mapped Devices

    Your question is vague. Are you writing to MMIO and asking how the device is affected by those writes or are you writing into some other buffer and asking how the device fetches that data? By the way, you really can't use DMA in general in user programs.
  6. Plain volatile is sufficient for x86 and x86-64, you don't need any actual sync instructions.
  7. Quote:Original post by taz0010 Quote:No, because it is only compiling one translation unit at a time, and the inheriting class could be in another translation unit. Oh right. Still, Whole Project Optimisation has been around for ages so I would have thought the compiler would be capable of seeing the "big picture". Do you know if GCC or Intel's compiler can do these optimisations? I was under the impression that compilers made efforts to eliminate virtual calls when possible.

     Yes, they can. I don't know if the two you mentioned do it, though.
  8. outRider

    volatile and consistency

    Quote:Original post by Prune The reason I asked was to have a better idea of the efficiency implications of volatile, in terms of whether caching would still be used for regular memory. If this weren't the case, I would try to avoid it more. Red Ant, I was specifically asking about the standard C++ semantics of volatile, which is about memory consistency. The MSVC extension mentioned adds implicit barriers, and I use them explicitly--but it still requires the volatile keyword, as outRider explained, as the barriers by themselves are not sufficient--at least for local variables, which according to MSDN will only be affected by the barrier if they are marked volatile. The same page also says that if the variable address is accessible non-locally, it does not need volatile. I don't know if other compilers would follow that convention... any comments on this outRider or others?

    I wouldn't rely on other compilers being able to deduce the global visibility of local variables. This is really simpler than you think it is. Consider this kind of code:

        int *p = ...;
        x = *p + 4;
        ...
        j = f(*p);
        ...
        *p = *p + 1;

    What do you think the compiler is going to do with *p? Load it from memory every time you use it in an expression, or load it once, keep the value in a register, and reuse the register? Now, if you declare *p volatile it will generate a load each time, because you're telling it that the value in memory can change without the compiler's knowledge; as far as the compiler can see, most of the loads could otherwise be optimized out. That's completely different from what a barrier does, but the two work hand in hand. When you're dealing with concurrency and shared memory, you not only have to make sure things are seen in the right order by others (barriers), but you also have to make sure that you see things properly and in a timely fashion--that you actually load things from memory and don't just hang on to stale data (volatile).

    What MSVC is saying is that it can save you the trouble of using volatile if it can deduce that a local variable can be updated without its knowledge (i.e. it's globally visible), and in those cases it will make sure the variable's uses don't cross a barrier. Now, I don't know if this also means the compiler will make sure to reload the value at every use like volatile, or if it's something in between volatile and no volatile.

    Quote:Original post by Prune One interesting thing I noticed is that at least for x86 and x86-64, while MSVC has separate _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier, GCC has for all three macros that in the end expand to the same thing, __asm__ __volatile__("" ::: "memory"), and Intel's compiler for both Windows and Linux uses __memory_barrier. Why are they the same for these compilers? I know that older Intel x86 chips did not do write reordering, but I thought x86-64 does.

    Probably for granularity's sake. On weak memory systems, barriers actually result in CPU instructions being emitted, because even if the compiler doesn't reorder loads and stores, the pipeline can. On strong memory systems, where barriers don't usually result in instructions, I guess you could give the compiler some latitude by being more precise about a barrier's function, but I wouldn't be surprised if _ReadBarrier/_WriteBarrier/_ReadWriteBarrier all did the same thing on MSVC for x86. Just a guess anyway; I don't know the inner workings of MSVC.
  9. outRider

    volatile and consistency

    Quote:Original post by Prune Quote:Original post by outRider If Acquire didn't specify that the parameter p was volatile and you called it in a loop nothing would stop a compiler from lifting the load outside the loop, and then you'd never see the value updated in the loop. Thanks, this is one of the answers I was looking for (I didn't think it was obvious, because one way to guess it would be to say that since the barrier prevents reordering of further reads to before this one, lifting it out of the loop could violate

    No. Lifting the read out of the loop doesn't violate anything. The read is already in front of the memory barrier; lifting it out of the loop still keeps it ahead of the barrier. If the variable is declared volatile, then the compiler knows not to assume anything about the value, and that reading the same address over and over in a loop is intentional and needs to be kept there. If it wasn't volatile, any compiler worth its salt would lift that read out of the loop.

    Quote:Original post by Prune Quote:It doesn't make the CPU "reach all the way to memory", it forces the compiler to be conservative with loads and stores of volatile variables, that's all; the loads/stores still hit cache like they normally would unless you do something to prevent it. But the most common example given for use of volatile is with hardware using DMA, such that memory-mapped values can be modified by hardware external to the CPU. I don't see how the cache can come into play there, since the memory may not be modified through the CPU; how would the machine know if an address is cacheable or not? If I use volatile int *p; that could just as well refer to an address that is changed by hardware external to the CPU; how would the compiler/CPU know whether it's just plain old RAM that it can cache, so it wouldn't have to load from the actual address?

    That's an orthogonal issue and has nothing to do with the compiler. Basically, one of three things happens. Either the device or bus controller is CPU-cache aware, meaning if the device reads/writes main memory it will see/invalidate the copy in the CPU's cache (IIRC PCI/PCIe works like this). Or the device/bus controller isn't cache aware (IIRC AGP) but the CPU can treat some blocks of memory as uncached, in which case the driver for the device should set this up: when you allocate a block of memory that both the CPU and device will interact with, the driver will make sure that range is treated as uncached (using the MTRR regs on x86). Or neither of the above is available, in which case somewhere in the driver for that device it will use explicit CPU instructions (wbinvd on x86) to flush caches, and will usually require you to use map/unmap, lock/unlock, or begin/end type functions to access this memory so it can flush as necessary (or there will be some API like readword()/writeword() and they'll tell you to use those). As you can see, none of it has anything to do with the volatile keyword.
  10. outRider

    volatile and consistency

    Volatile is not useless; however, it was never intended to provide ordering (with respect to the CPU pipeline and cache coherency) or atomicity. It is necessary not only in multi-threaded programming but in any situation where memory contents can change independently of the currently executing code, e.g. MMIO regs. All volatile does is tell the compiler that the contents of a particular variable can change outside its field of vision (by another thread, by a device, etc.), which means that it can't go and optimize away loads and stores and just keep things in registers, or reorder certain loads/stores. This is independent of memory barriers, memory coherency, or multi-processor visibility. Volatile will prevent the compiler from doing what it loves... hoisting loads, sinking stores, scheduling, etc. It doesn't make the CPU "reach all the way to memory"; it forces the compiler to be conservative with loads and stores of volatile variables, that's all. The loads/stores still hit cache like they normally would unless you do something to prevent it. If Acquire didn't specify that the parameter p was volatile and you called it in a loop, nothing would stop a compiler from lifting the load outside the loop, and then you'd never see the value updated in the loop.
  11. Yes, I realize that. His question was whether or not it was 'quick enough' and given the constraints of his scene there are shortcuts that can be taken. I'm not really concerned about why he wants to go the software route, unless he wants to elaborate on his choice and ask for opinions on it.
  12. You can definitely do it, since the road, which makes up something like 1/3-2/3 of the pixels on screen at any given time, is always planar and perpendicular to the view vector.
  13. outRider


    Why should main's stack frame have a spot for the return of f()? Is there anything in the language or the ABI you're running under that says the return value of functions has to be preserved on stack?
  14. outRider

    GLX and ResizeRequest

    I was asking because this sounds similar to a bug in all drivers using DRI2 (which includes i915) that's already been fixed. I can't recall which component it was fixed in however, it may have been the X server itself. Try updating your driver first.