

About ajmiles

  1. If you'd like to get rid of the iffy-looking calculation for "MipMapCount", just call Tex->GetDesc() and the MipLevels field will be populated with the actual mip count (rather than the 0 you filled it in with to create the texture). Glad it's working!
  2. You seem to be updating subresource 'i' here: D3D11DeviceContext->UpdateSubresource(Tex.Get(), i, 0, initData[i].pSysMem, initData[i].SysMemPitch, size); whereas you should be updating subresource index D3D11CalcSubresource(mip, slice, numMips). The 6 mips you want to update are not subresources 0, 1, 2, 3, 4 and 5, but rather (0 * numMips), (1 * numMips), (2 * numMips)... etc.
  3. I would try running this on WARP, just to be sure. The stretch marks at the edge of the copy I have no explanation for, and so it may well be a bug in a particular hardware vendor's driver / implementation.
  4. 0.1ms sounds about right for copying 1MB over a bus that's roughly 16GB/s, so I'd be inclined to believe that number. It should scale approximately linearly. You have to bear in mind that the CPU timer isn't just timing how long it takes the CPU to do useful work, but how long it takes the GPU to catch up and do all its outstanding work. By calling Map you've required the GPU to catch up and execute all the work in its queue, do the copy and signal to the CPU that it's done. The more work the GPU has to run prior to the call to "CopyResource", the longer the CPU has to sit there and wait for it to complete. For that reason, I wouldn't expect the CPU timer to ever record a very low value in the region of 0.1ms no matter how small the copy is.
  5. Interesting, it might be that we haven't pushed anything out yet with that change in. It still exists in the Creators Update SDK, and on whatever release of Windows 10 'maxest' is running it still seems to work. I'll follow up with you offline about why we decided the API wasn't useful. It feels like it still has value in scenarios where you want a consistent time from run to run and want to analyse whether an algorithmic change improves performance or not. Even if it doesn't give you real numbers for any user in the real world, consistency across runs still seems useful during development / optimisation.

     I don't have a definitive answer as to why this might be, but I do have one theory. You can think of (almost) every API call you make as a packet of data that gets fed to the GPU to execute at a later date. Behind the scenes these packets (Draw, Dispatch, Copy, etc.) are broken up into segments and sent to the GPU as a batch rather than one by one. The Begin/End query packets are no different. It may be that the timestamp query you've inserted after the "Map" is the first command after a batch of commands is sent to the GPU, and therefore it isn't immediately sent to the GPU after the CopyResource/Map events have executed. My theory, then, is that you're actually timing a lot of idle time between the CopyResource and the next chunk of GPU work that causes the buffer to get flushed and the GPU to start executing useful work again.

     You don't have any control over when D3D11 breaks a segment and flushes the commands to the GPU (you can force a flush using ID3D11DeviceContext::Flush, but you can't prevent one). I wouldn't expect 'Map' to do anything on the GPU, but moving the timestamp query before the Map may be sufficient to get it executed in the segment before the break. Try that perhaps?

     I've never seen D3D11_COUNTER used before, but Jesse (SoldierOfLight) may know whether it ever saw any use.
  6. Even if you time only the work you're interested in (and not the whole frame), it's still going to take a variable amount of time depending on how high the GPU's clock speed happens to be at that point in time. If the GPU can see it's only doing 2ms of work every 16ms, then it may underclock itself by a factor of 3-4x such that the 2ms of work ends up taking 6-8ms instead. What's happening is something like this:
     1) At 1500MHz, your work takes 0.4ms and ~16.2ms is spent idle at the end of the frame.
     2) The GPU realises it could run a bit slower and still be done in plenty of time, so it underclocks itself just a little to save power.
     3) At 1200MHz, your work takes 0.5ms and ~16.1ms is spent idle at the end of the frame.
     4) Still plenty of time spare, so it underclocks itself even further.
     5) At 900MHz, your work takes 0.6ms and ~16.0ms is spent idle at the end of the frame.
     6) *Still* plenty of time spare, so it dramatically underclocks itself.
     7) At 500MHz, your work takes 3x longer than it did originally, now costing 1.2ms. There's still 15.4ms of idle time at the end of the frame, so this is still OK.
     8) At this point the GPU may not have any lower power states to clock down to, so the work never takes any more than 1.2ms.

     In D3D12 we (Microsoft) added an API called ID3D12Device::SetStablePowerState, in part to address this problem. This API fixes the GPU's clock speed to something it can always run at without having to throttle back due to thermal or power limitations. So if your GPU has a "Base Clock" of 1500MHz but can periodically "Boost" to 1650MHz, we'll fix the clock speed to 1500MHz. Note that this API does not work on end users' machines, as it requires the Debug bits to be installed, so it can't be used in retail titles. Note also that performance will likely be worse than on an end user's machine, because we've artificially limited the clock speed below the peak to ensure it stays stable and consistent.

     With this in place, profiling becomes easier because the clock speed is known to be stable across runs and won't clock up and down as in your situation. Although I don't think SetStablePowerState was ever added to D3D11, it should be simple enough to create a dummy D3D12 application, create a device, call SetStablePowerState and then put the application into an infinite Sleep in the background. I've never tried this, but that should be sufficient to keep the GPU's frequency fixed to some value for the lifetime that the dummy D3D12 application/device is running.
  7. This behaviour sounds exactly like what I'd expect if the GPU was throttling back its frequency because you aren't giving it enough work to warrant being clocked at peak frequency. By turning off VSync you're giving the GPU as much work to do as it can manage. With VSync enabled you're restricting it to 60 frames' worth of work per second, which it can easily deliver at reduced clock speeds.
  8. The example does use textures of the same resolution, but indeed there is no reason that they need to have the same Width, Height, Format or Mip Count. So long as they are an array of 2D textures, that's fine. Depending on how many textures you want bound at once, be aware that you may be excluding Resource Binding Tier 1 hardware: https://msdn.microsoft.com/en-gb/library/windows/desktop/dn899127(v=vs.85).aspx Note that in order to get truly non-uniform resource indexing you need to tell HLSL + the compiler that the index is non-uniform using the "NonUniformResourceIndex" intrinsic. Failing to do this will likely result in the index from the first thread of the wave deciding which texture to sample from. https://msdn.microsoft.com/en-us/library/windows/desktop/dn899207(v=vs.85).aspx
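A minimal HLSL sketch of the intrinsic in use. The names, register assignments and array size are all hypothetical; the important part is wrapping the divergent index:

```hlsl
Texture2D    gTextures[64] : register(t0);
SamplerState gSampler      : register(s0);

float4 SampleByMaterial(uint materialIndex, float2 uv)
{
    // Without NonUniformResourceIndex, the compiler may assume the index is
    // uniform across the wave and sample the first thread's texture only.
    return gTextures[NonUniformResourceIndex(materialIndex)].Sample(gSampler, uv);
}
```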
  9. The only small quirk left is that you map it for READ_WRITE rather than just READ, but that shouldn't be a problem. You can remove CPU_ACCESS_WRITE from the temp texture creation as well if you never intend to write to it. Do you know that you have actually rendered something to the source texture?
  10. There are a few obvious errors in the code you've written which are worth fixing:
     - The source (read) pointer should be incremented by mapped.RowPitch, not Desc.Width. Desc.Width is the number of pixels the texture is wide, so it's not only measured in the wrong units (pixels instead of bytes), but the pitch is likely something other than "Desc.Width * 4".
     - The destination pointer (dest) should be incremented by Desc.Width * 4 (bytes), since 'dest' is an unsigned char*.
     - Your for loop attempts to copy the data row by row, but should be iterating "for (int i = 0; i < Desc.Height; ..." rather than Desc.Width.
     - The amount memcpy'ed per row should be Desc.Width * 4, not Desc.Height * 4.

     I'm not sure what the relevance of '1200' is, but you're printing out just the first 1200 colour channels (RGBA 300 times). So if the first 300 pixels are transparent black, you'll get 0 printed out every time. Try iterating over every pixel just to be sure.
  11. Are you using object-space normal maps rather than tangent-space normal maps? The ability to do "this has no normal map, so I'll replace it with a 1x1 texture" works for tangent-space normal maps, but not object-space ones. I would generally try and avoid branching on presence / non-presence of textures and ensure I'd bound a cut-down shader without normal map support for objects that don't want to provide the texture.
  12. You should probably be making use of D3D11_APPEND_ALIGNED_ELEMENT instead of calculating each attribute offset manually. If you ever want to go back and compress one of the attributes (you really shouldn't be using 32 bit signed indices!) you'll have to recalculate the offsets for every attribute that appears after the one you're compressing.
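For illustration, a layout using D3D11_APPEND_ALIGNED_ELEMENT, so each attribute's offset is derived automatically from the elements before it. The semantic names and formats here are hypothetical, not taken from the poster's code; note a later decision to shrink an attribute's format needs no offset edits anywhere else:

```cpp
#include <d3d11.h>

static const D3D11_INPUT_ELEMENT_DESC layout[] =
{
    { "POSITION",    0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,       0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    // 16-bit unsigned bone indices rather than 32-bit signed ones.
    { "BONEINDICES", 0, DXGI_FORMAT_R16G16B16A16_UINT,  0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "WEIGHTS",     0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
```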
  13. Why do the indices and weights overlap? Indices are 16 bytes starting at offset 24 bytes, but weights start just 12 bytes later, at offset 36. You have an overlap between BoneIndices.w and Weights.x.
  14. Your additive blend state looks fine (SrcBlend = SrcAlpha, DestBlend = 1.0). Alpha-blended objects can only blend onto something if that "something" has already been rendered. So yes, the objects onto which you want to blend must already be rendered. This becomes difficult when you want to start blending alpha-blended objects onto other alpha-blended objects. As best you can, you will want to render alpha-blended objects from back to front so that each one draws on top of the ones behind it. Order-independent transparency is still an active area of research in computer graphics and is difficult to solve.
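For reference, that state expressed as a D3D11 per-render-target blend descriptor. The two fields named above come from the post; the remaining values are a sketch of typical defaults:

```cpp
#include <d3d11.h>

// SrcBlend = SrcAlpha, DestBlend = One (i.e. 1.0), as described above.
static const D3D11_RENDER_TARGET_BLEND_DESC additiveBlend =
{
    TRUE,                         // BlendEnable
    D3D11_BLEND_SRC_ALPHA,        // SrcBlend
    D3D11_BLEND_ONE,              // DestBlend
    D3D11_BLEND_OP_ADD,           // BlendOp
    D3D11_BLEND_ONE,              // SrcBlendAlpha (assumed)
    D3D11_BLEND_ONE,              // DestBlendAlpha (assumed)
    D3D11_BLEND_OP_ADD,           // BlendOpAlpha
    D3D11_COLOR_WRITE_ENABLE_ALL, // RenderTargetWriteMask
};
```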