outRider

Members
  • Content count: 2149
  • Joined
  • Last visited

Community Reputation: 852 Good

About outRider
  • Rank: Contributor
  1. [quote name='GraphicsDude' timestamp='1306460335' post='4816264'] I'm working with an app that is very greedy when it comes to video memory. When one instance of the app is running the performance is fine. I can run two instances of the app and there is little performance loss - that is, each instance runs only slightly slower when compared to only one instance. When a third instance is running the system grinds to a halt. Other tests show the issue is related to graphics and I'm trying to determine the cause. The symptoms point to thrashing but can GPUs have thrashing? I understand there are caches for things like transformed vertices and local texels fetched but those deal with small elements, not entire textures. Also, I think there is a context switch that happens when a different process uses the GPU for rendering, but surely this can't cause the problem I'm seeing. Have any of you experienced any thrashing on the GPU? Note: When all three instances are running I estimate VRAM usage is ~700 MB and the computer has a 1GB video card. [/quote]
     Yes, there is such a thing as GPU thrashing, and not just of VRAM but also of the (GPU-accessible) system memory the GPU can read from. Your app needs a certain number of buffers every frame; some of those buffers need to be in VRAM, some need to be in system memory, and some can be in either. You don't have unlimited VRAM, and you also don't have unlimited GPU-accessible system memory, so buffers may get swapped out of either or both of those to regular system memory, and possibly to disk if you start running out of that. Your estimate of 700MB may be off, since some buffers may be in both VRAM and GPU-accessible system memory for whatever reason (usually to speed up context switching).
     The other possibility is that you are not experiencing classical thrashing but excessive memory movement due to other constraints. If your app reads back render targets or other VRAM-resident buffers, for example, they may have to be moved to a place where the CPU can access them. The CPU usually only has a 256MB window into VRAM, so if your buffer is outside that window it may have to be moved; even if it's in the window it may be moved to system memory because that's what the driver thinks will give the lowest latency. Who knows.
     And yes, there are context switches on both the CPU and GPU when rendering from multiple contexts. On some GPUs that don't have hardware context switching, the driver has to store and re-emit whatever state is necessary when switching to a new context (it has to both reprogram the GPU state and re-upload any VRAM-only buffers, for example). I doubt that this is your problem going from 2 to 3 contexts unless you are hitting some pathological case.
     Anyway, most of this is just a guess based on the few details you provided and my understanding of such things.
  2. CentOS, RHEL, SLES, and some other distros intended for 'enterprise'-type machines have relatively old (a.k.a. 'stable') versions of most software compared to something like Ubuntu, so you have to be aware of that if you intend to distribute binaries.
  3. If you want to do this with an external program you find the offset of the instruction or data you want to change and you patch it. In your case you know where in the executable that variable is first written to, so you open the executable, fseek to the offset, change the immediate part of the instruction to 0x12c, and you're done. If you want to know which part of the instruction corresponds to the immediate operand, you look it up in a manual. That's the short, simple version of this kind of reversing.
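     As a rough illustration of that kind of patch (not the poster's actual tool): the file name and offset below are made-up placeholders, the 0x12c value comes from the post, and a little-endian target with a 4-byte immediate is assumed.
     [code]
     // Minimal patcher sketch: overwrite the 32-bit immediate of an instruction
     // at a known file offset with 0x12c. The offset is a hypothetical example;
     // you'd find the real one with a disassembler or hex editor.
     #include <stdio.h>
     #include <stdint.h>

     int main(void)
     {
         const long imm_offset = 0x1a2b;          // hypothetical offset of the immediate
         const uint32_t new_imm = 0x12c;

         FILE *f = fopen("target.exe", "r+b");    // open for in-place modification
         if (!f) return 1;

         fseek(f, imm_offset, SEEK_SET);
         fwrite(&new_imm, sizeof(new_imm), 1, f); // writes 2c 01 00 00 on a little-endian host
         fclose(f);
         return 0;
     }
     [/code]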
  4. /dev/dsp is just an OSS interface, and all you're doing is writing to a shared buffer that the driver then passes to the soundcard somehow. Some drivers will point the hardware at that buffer and the hardware will start reading from that same buffer; other drivers will copy that buffer elsewhere and then tell the hardware what to do with it. If a driver allows you to mmap /dev/dsp, as opposed to using read/write, then it will provide you with a circular buffer. This circular buffer may be the same buffer that the hardware is reading from, or the driver may be reading from it and passing it to the hardware using some other mechanism. Either way, no one will know if or when you've written to this mmapped buffer; the only thing the driver will tell you is where it or the hardware is currently reading from (via an ioctl), and you'll have to make sure you've written valid data there ahead of time.
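     A rough sketch of that mmap flow, assuming the OSS v3 ioctls (SNDCTL_DSP_GETOSPACE, SNDCTL_DSP_SETTRIGGER, SNDCTL_DSP_GETOPTR); error handling is omitted and the exact format/trigger dance varies by driver.
     [code]
     #include <fcntl.h>
     #include <string.h>
     #include <unistd.h>
     #include <sys/ioctl.h>
     #include <sys/mman.h>
     #include <sys/soundcard.h>

     int main(void)
     {
         int fd = open("/dev/dsp", O_RDWR);

         int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
         ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);
         ioctl(fd, SNDCTL_DSP_CHANNELS, &channels);
         ioctl(fd, SNDCTL_DSP_SPEED, &rate);

         // Ask how big the driver's buffer is, then map it.
         audio_buf_info info;
         ioctl(fd, SNDCTL_DSP_GETOSPACE, &info);
         size_t len = (size_t)info.fragstotal * info.fragsize;
         char *buf = (char *)mmap(0, len, PROT_WRITE, MAP_SHARED, fd, 0);

         // Nobody is notified when we write into buf, so prime it with silence
         // first and only then enable output.
         memset(buf, 0, len);
         int trig = 0;
         ioctl(fd, SNDCTL_DSP_SETTRIGGER, &trig);
         trig = PCM_ENABLE_OUTPUT;
         ioctl(fd, SNDCTL_DSP_SETTRIGGER, &trig);

         for (;;) {
             // The only feedback is GETOPTR: ci.ptr is where the driver/hardware
             // is currently reading, so valid samples must already be there.
             count_info ci;
             ioctl(fd, SNDCTL_DSP_GETOPTR, &ci);
             /* ... write new samples into the region safely ahead of ci.ptr ... */
             usleep(1000);
         }
     }
     [/code]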
  5. Your question is vague. Are you writing to MMIO and asking how the device is affected by those writes or are you writing into some other buffer and asking how the device fetches that data? By the way, you really can't use DMA in general in user programs.
  6. Plain volatile is sufficient for x86 and x86-64; you don't need any actual sync instructions.
  7. Quote:Original post by taz0010
     Quote:No, because it is only compiling one translation unit at a time, and the inheriting class could be in another translation unit.
     Oh right. Still, Whole Project Optimisation has been around for ages so I would have thought the compiler would be capable of seeing the "big picture". Do you know if GCC or Intel's compiler can do these optimisations? I was under the impression that compilers made efforts to eliminate virtual calls when possible.
     Yes they can. I don't know if the two you mentioned do it though.
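     For what it's worth, here is a minimal sketch of the sort of call a compiler can devirtualize when it can see the concrete type (the names are illustrative, and the C++11 'final' keyword is just one way to make the proof easy):
     [code]
     struct Base {
         virtual int cost() const { return 1; }
         virtual ~Base() {}
     };

     struct Derived final : Base {        // 'final': no further overrides can exist
         int cost() const override { return 2; }
     };

     int use(const Derived &d)
     {
         // The static type is Derived and Derived is final, so the compiler can
         // replace the indirect vtable call with a direct call to Derived::cost
         // (and often inline it outright). Whole-program/LTO builds can do the
         // same across translation units when they can see every override.
         return d.cost();
     }
     [/code]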
  8. Quote:Original post by Prune
     The reason I asked was to have a better idea of the efficiency implications of volatile, in terms of whether caching would still be used for regular memory. If this weren't the case, I would be trying to more avoid it. Red Ant, I was specifically asking about the standard C++ semantics of volatile which is about memory consistency. The MSVC extension mentioned adds implicit barriers, and I use them explicitly--but still requires the volatile keyword as outRider explained as the barriers by themselves are not sufficient--at least for local variables, which according to MSDN will only be affected by the barrier if they are marked volatile: http://msdn.microsoft.com/en-us/library/f20w0x5e%28v=VS.80%29.aspx The same page also says that if the variable address is accessible non-locally, it does not need volatile. I don't know if other compilers would follow that convention... any comments on this outRider or others?
     I wouldn't rely on other compilers being able to deduce the global visibility of local variables. This is really simpler than you think it is. Consider this kind of code:
     int *p = ...;
     x = *p + 4;
     ...
     j = f(*p);
     ...
     *p = *p + 1;
     What do you think the compiler is going to do with *p? Load it from memory every time you use it in an expression, or load it once, keep the value in a register, and reuse the register? As far as the compiler can see, most of the loads can be optimized out. If you declare p volatile it will generate a load each time, because you're telling it that the value in memory can change without the compiler's knowledge.
     That's completely different from what a barrier does, but the two work hand in hand. When you're dealing with concurrency and shared memory you not only have to make sure things are seen in the right order by others (barriers), but you also have to make sure that you see things properly and in a timely fashion and that you actually load things from memory instead of hanging on to stale data (volatile).
     What MSVC is saying is that it can save you the trouble of using volatile if it can deduce that a local variable can be updated without its knowledge (i.e. it's globally visible), and in those cases it will make sure the variable's uses don't cross a barrier. Now, I don't know if this also means the compiler will reload the value at every use like volatile would, or if it's something in between volatile and no volatile.
     Quote:Original post by Prune
     One interesting thing I noticed is that at least for x86 and x86-64, while MSVC has separate _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier, GCC has all three as macros that in the end expand to the same thing, __asm__ __volatile__("" ::: "memory"), and Intel's compiler for both Windows and Linux uses __memory_barrier. Why are they the same for these compilers? I know that older Intel x86 chips did not do write reordering, but I thought x86-64 does.
     Probably for granularity's sake. On weak memory systems barriers actually result in CPU instructions being emitted, because even if the compiler doesn't reorder loads and stores the pipeline can. On strong memory systems, where barriers don't usually result in instructions, I guess you could give the compiler some latitude by being more precise about a barrier's function, but I wouldn't be surprised if _ReadBarrier/_WriteBarrier/_ReadWriteBarrier all did the same thing on MSVC for x86. Just a guess anyway; I don't know the inner workings of MSVC.
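     To make the fragment above concrete, here is a self-contained version (f and the pointer target are placeholders; what matters is how the loads of *p may be treated):
     [code]
     int f(int v);  // hypothetical helper; its body doesn't matter here

     // Without volatile: the compiler is free to load *p once, keep the value
     // in a register, and reuse that register for the later uses.
     int plain(int *p)
     {
         int x = *p + 4;
         int j = f(*p);
         *p = *p + 1;
         return x + j;
     }

     // With volatile: each *p below is a separate load from memory (and the
     // assignment a real store), because the value may change outside the
     // compiler's view.
     int watched(volatile int *p)
     {
         int x = *p + 4;
         int j = f(*p);
         *p = *p + 1;
         return x + j;
     }
     [/code]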
  9. Quote:Original post by Prune
     Quote:Original post by outRider
     If Acquire didn't specify that the parameter p was volatile and you called it in a loop nothing would stop a compiler from lifting the load outside the loop, and then you'd never see the value updated in the loop.
     Thanks, this is one of the answers I was looking for (I didn't think it was obvious because one way to guess it would be to say that since the barrier prevents reordering of further reads to before this one, lifting it out of the loop could violate that... no?)
     No. Lifting the read out of the loop doesn't violate anything. The read is already in front of the memory barrier, and lifting it out of the loop still keeps it ahead of the barrier. If the variable is declared volatile then the compiler knows not to assume anything about the value, and that reading the same address over and over in a loop is intentional and needs to be kept there. If it wasn't volatile, any compiler worth its salt would lift that guy out of the loop.
     Quote:Original post by Prune
     Quote:It doesn't make the CPU "reach all the way to memory", it forces the compiler to be conservative with loads and stores of volatile variables, that's all; the loads/stores still hit cache like they normally would unless you do something to prevent it.
     But the most common example given for use of volatile is with hardware using DMA such that memory mapped values can be modified by hardware external to the CPU. I don't see how the cache can come into play there since the memory may not be modified through the CPU; how would the machine know if an address is cacheable or not? If I use volatile int *p; that could just as well refer to an address that is changed by hardware external to the CPU; how would the compiler/CPU know whether it's just plain old RAM that it can cache so it wouldn't have to load from the actual address?
     That's an orthogonal issue and has nothing to do with the compiler. Basically, one of three things is true. Either the device or bus controller is CPU-cache aware, meaning that if the device reads or writes main memory it will see or invalidate the copy in the CPU's cache (IIRC PCI/PCIE works like this). Or the device/bus controller isn't cache aware (IIRC AGP), but the CPU can treat some blocks of memory as uncached, in which case the driver for the device sets this up: when you allocate a block of memory that both the CPU and device will interact with, the driver makes sure that range is treated as uncached (using the MTRR/IAR regs on x86). Or neither of the above is available, in which case somewhere in the driver for that device it will use explicit CPU instructions (wbinvd on x86) to flush caches, and it will usually require you to use map/unmap/lock/unlock/begin/end type functions to access this memory so it can flush as necessary (or there will be some API like readword()/writeword() and they'll tell you to use those). As you can see, none of it has anything to do with the volatile keyword.
  10. Volatile is not useless; however, it was never intended to provide ordering (with respect to the CPU pipeline and cache coherency) or atomicity. It is necessary not only in multi-threaded programming but in any situation where memory contents can change independently of the currently executing code, e.g. MMIO regs. All volatile does is tell the compiler that the contents of a particular variable can change outside its field of vision (by another thread, by a device, etc), which means that it can't go and optimize away loads and stores and just keep things in registers, or reorder certain loads/stores. This is independent of memory barriers, memory coherency, or multi-processor visibility. Volatile will prevent the compiler from doing what it loves... hoisting loads, sinking stores, scheduling, etc. It doesn't make the CPU "reach all the way to memory"; it forces the compiler to be conservative with loads and stores of volatile variables, that's all; the loads/stores still hit cache like they normally would unless you do something to prevent it. If Acquire didn't specify that the parameter p was volatile and you called it in a loop, nothing would stop a compiler from lifting the load outside the loop, and then you'd never see the value updated in the loop.
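     A minimal sketch of that Acquire-in-a-loop situation, using the GCC-style compiler barrier quoted elsewhere in the thread (the flag and the exact Acquire signature are illustrative):
     [code]
     // Compiler-only barrier (GCC syntax); it emits no instruction, it just
     // stops the compiler from moving memory accesses across it.
     #define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

     // p is volatile, so the load must actually happen on every call; the
     // barrier then keeps later reads from being hoisted above it.
     static inline int Acquire(volatile int *p)
     {
         int v = *p;
         COMPILER_BARRIER();
         return v;
     }

     volatile int ready;  // set to non-zero by another thread or by hardware

     void wait_for_ready(void)
     {
         // Because ready is volatile, the compiler can't lift the load out of
         // the loop and spin forever on a stale register copy.
         while (Acquire(&ready) == 0) {
             /* spin */
         }
     }
     [/code]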
  11. Yes, I realize that. His question was whether or not it was 'quick enough' and given the constraints of his scene there are shortcuts that can be taken. I'm not really concerned about why he wants to go the software route, unless he wants to elaborate on his choice and ask for opinions on it.
  12. You can definitely do it, since the road, which makes up something like 1/3-2/3 of the pixels on screen at any given time, is always planar and perpendicular to the view vector.
  13. Why should main's stack frame have a spot for the return of f()? Is there anything in the language or the ABI you're running under that says the return value of functions has to be preserved on stack?
  14. I was asking because this sounds similar to a bug in all drivers using DRI2 (which includes i915) that's already been fixed. I can't recall which component it was fixed in, however; it may have been the X server itself. Try updating your driver first.