Community Reputation

622 Good

About maxest

  • Rank
    Advanced Member
  1. I would like to run some computation using compute shaders. A lot of computation. Since GPUs have a separate memory engine, I thought I could make use of it, just like with CUDA streams, and overlap computation with the GPU -> CPU data download. So I would do something like this:

       Dispatch 1 (first half of data)
       CopyResource 1
       Dispatch 2 (second half of data)
       CopyResource 2

     Now the question is: will CopyResource 1 and Dispatch 2 overlap in time? I heard from someone that Discard causes a flush (it waits until all previous commands have completed and only then gets called), but I can't find that in MSDN. Can anyone confirm?
  2. I had no idea where to start this thread so here it is. I have barely 1 message in my inbox and yet when I want to compose a new message I get "Your inbox is full. You must delete some messages before you can send any more". A bug?
  3. I thought it should have been 0.1 ms, since after refactoring the whole "system" I'm working on so that I only need to download 1 MB instead of 8 MB, the total processing time went down by around 1.5 ms. Thank you again so much, ajmiles.
  4. I actually did try placing the End query right after CopyResource and before Map, and that reported (as far as I remember, can't check now) something around 0.1 ms. Now I'm not really sure how I should measure the time it takes to download data from GPU to CPU. My CPU timer, when used to enclose CopyResource and Map, reported that downloading 11.5 GB took 1 second, which agrees with a CUDA-based test application for measuring PCI-E throughput that I used. When lowered to 8 MB the download took 1.5 ms, and when lowered to 1 MB the download took 1 ms. I'm not sure if PCI-E download time should scale linearly with data size, but my tests show that it doesn't. At least that's what my CPU timer says. But the 0.1 ms reported by the GPU timer when measuring CopyResource would indicate linear scaling. Now I'm not sure if I should trust the CPU timer reporting 1 ms (CopyResource + Map) or the GPU timer reporting 0.1 ms (just CopyResource).
  5. @ajmiles: Thank you so much for this detailed explanation. I hadn't thought about the GPU clock changing its speed. This makes more sense than performing some redundant work :). I have checked what you proposed. I took a simple DX12 sample, called SetStablePowerState with true (I needed to turn on Developer Mode on my Windows 10; I wasn't aware of its existence) and then called a permanent Sleep. Then I ran my application. Now, regardless of whether I use VSync or not, or call Sleep in my app or not, I get a consistent 0.46 ms. It's more than the 0.4 ms I got without VSync and without SetStablePowerState, but at least it's stable. So as I understand it, the GPU is working at a lower clock speed than it could (without Boost), but this speed is fixed.

     I have one more case whose results I don't entirely understand. I have code of this form:

       -- Begin CPU Profiler (with QueryPerformanceCounter etc.)
       -- Begin GPU Profiler
       CopyResource (download from GPU to CPU)
       Map
       -- End GPU Profiler
       do something with mapped data
       Unmap
       -- End CPU Profiler

     The GPU profiler reports 5 ms whereas the CPU profiler reports 2-3 ms. If anything, shouldn't the CPU timer report a longer time than the GPU timer? I download around 1 MB of data. When I measure only CopyResource and Map with the CPU timer I get around 1 ms.

     I would just like to ask one more related thing. In my search for reliable counters I stumbled upon this (https://msdn.microsoft.com/en-us/library/windows/desktop/ff476364(v=vs.85).aspx) but could find no simple example of usage. Is it working at all?
  6. I tested both. No difference. I thought about something along those lines but quickly came to the conclusion that it should not happen. I thought everything should take as much time as in the no-VSync case because it's in Present that the waiting happens; why would any redundant work happen during my actual computation time? I just checked how much time Present takes with VSync and indeed it's something around 15 ms, with some variance of course. So it's still a mystery to me why the computation code I profile would take more time in VSync mode. I wonder if that would also be the case under D3D12. EDIT: Encompassing the whole Render function with one disjoint query ( http://reedbeta.com/blog/gpu-profiling-101/ ) actually works when VSync is off. I made a wrong observation. It behaves exactly the same as Begin/End of the disjoint query right before and after the block we're profiling.
  7. I implemented DX queries following this blog post: https://mynameismjp.wordpress.com/2011/10/13/profiling-in-dx11-with-queries/ Queries work perfectly fine... as long as I don't use VSync or any other form of Sleep. Why would that happen? I record a query right before my Compute/Dispatch code, record another right after, and then read the results (spinning on GetData while it returns S_FALSE). When I don't use VSync my code takes a consistent 0.39-0.4 ms. After turning VSync on it starts at something like 0.46 ms, after a second bumps up to 0.61 ms, and a few seconds later I get something like 1.2 ms. I also used this source: http://reedbeta.com/blog/gpu-profiling-101/ The difference here is that the author uses one disjoint query for the whole Render() function instead of one per particular measurement. When I implemented it this way the timings were inconsistent (0.46, 0.61, 1.2 as above) regardless of VSync.
  8. Yeah, I'm perfectly aware of that workaround and I do it this way. But because I can't pass a shared memory array to a function, I can't make the function more general. Instead I need to copy it into the few files I use it in.
  9. Yeah, I do have D3dcompiler_47.dll indeed. I did try that; I forgot to mention it in my previous post. The same problem persists.
  10. I stumbled upon those threads as well and that's not it. Also, I'm not really sure how to update my d3dcompiler. I'm using Windows 10 so I presume it gets updated automatically. However, I use Visual Studio 2013, so I cannot really be sure that the most up-to-date dll is used. I found out that the problem appears even in this code:

         static const int ElementsCount = 512;
         groupshared uint tempData[2 * ElementsCount];

         void MyFunc(inout uint3 gtID: SV_GroupThreadID, inout uint inputData[2 * ElementsCount])
         {
         }

         [numthreads(ElementsCount, 1, 1)]
         void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
         {
             MyFunc(gtID, tempData);
         }

      Note that I don't even write anything to tempData in MyFunc. I also found out that the problem goes away if I remove the "inout" modifier, but then the array probably just gets copied, as the code doesn't work as expected.
  11. I have code like this:

         groupshared uint tempData[ElementsCount];

         [numthreads(ElementsCount/2, 1, 1)]
         void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
         {
             tempData[gtID.x] = 0;
         }

      And it works fine. Now I change it to this:

         void MyFunc(inout uint3 gtID: SV_GroupThreadID, inout uint inputData[ElementsCount])
         {
             inputData[gtID.x] = 0;
         }

         groupshared uint tempData[ElementsCount];

         [numthreads(ElementsCount/2, 1, 1)]
         void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
         {
             MyFunc(gtID, tempData);
         }

      and I get "error X3695: race condition writing to shared memory detected, consider making this write conditional.". Is there any way to get around this?
  12. As a person who doesn't like 3rd party engines and at the same time has worked with one (Unity) professionally for two years, I think I can contribute something to this thread.

      As I mentioned, I don't like 3rd party engines. I like to write my own code. After I had worked on some private projects, including writing my own engine in C++ (and eventually releasing a Steam game on it) and other from-the-ground-up stuff, I joined a company where they used Unity. I think it was the first time I was using "somebody else's engine" and I really liked it. Unity is well structured, has a couple of simple building blocks (game objects and components), and you can build your way up towards a game any way you want. They did a great job of separating the engine (what Unity is) from the gameplay (which you write yourself). Because of that, various people around the world build their tools and share them (AssetStore), which proves Unity's modularity. It took me about a week of work with Unity to really find my way around it. I felt at home. Of course, Unity has its weak spots, sometimes ones you would not expect to find in a piece of software like this, but still, when I need to use a 3rd party engine, I go for Unity.

      As for Unreal... I had a really hard time figuring out how to work with Unreal 3. So much so that I gave up. Then I tried Unreal 4. I nearly ended up ditching it altogether, like I had ditched Unreal 3, because it turned out that one simple task I wanted to accomplish was almost impossible to do. I wanted to have an animated skeletal mesh and set one of the bones to my own local-to-world matrix. I tried to do it in C++. I googled for it. It turned out that many people ask about it and nobody gives a clear, simple answer. Because there is none! It turned out that I could "relatively easily" do what I wanted, but only via a BluePrint.

      From then on my adventure through Unreal 4 was a bit easier, but I still stumbled upon a lot of hurdles. With Unreal it seemed like I constantly had to google how to do this or that. There are a lot of complex building blocks in there that you can use, but getting to know them takes time. Also, Unreal 4 tries to pretend it has a gameobject/component model similar to Unity's. That is only partially true. For instance, in Unity you can have a hierarchy of game objects and each game object has its own set of components. That's it. In Unreal, on the other hand, components can have hierarchies, which was very misleading for me. Another thing happened when I created a simple template project. I have the list of objects in the scene and I more or less see/know what they are. Then I run the project and suddenly I have twice as many game objects in the scene, out of thin air. Objects like "GameMode" or others intrinsically related to Unreal's beloved Gameplay Framework.

      Yeah, the Gameplay Framework itself is something I didn't like. The docs say that the Gameplay Framework is just a set of useful classes built on top of the engine that can help you make your game (you know, the old PlayerController or Pawn from Unreal 3). Well, if the Gameplay Framework really was such a nice framework built on top of the engine and not part of it, I don't think there would be references to its classes all over the engine's options menus. There are even some C++ structs related to PlayerController in the UnrealEngine.h file.

      To sum up: Unity is well structured to me, and the fact that the source code is not published does not bother me much, as the engine itself is elegantly separated from the gameplay, which you write yourself. It gives you a couple of building blocks you can arrange the way you want. Unreal, on the other hand, seems very cluttered, with a lot of various objects/classes interrelated with one another. I can imagine you can do more optimized stuff in Unreal 4 than in Unity, but you will end up with a code maintenance nightmare.
  13. Nothing. It's something like this (download):

         uint64 bef = TickCount();

         // Copy the render target into a CPU-readable staging texture.
         deviceContext->CopyResource(stagingCopy.texture, gbufferDiffuseRT.texture);

         // Map blocks until the copy has finished, then read the data back.
         D3D11_MAPPED_SUBRESOURCE mappedSubresource;
         deviceContext->Map(stagingCopy.texture, 0, D3D11_MAP_READ, 0, &mappedSubresource);
         memcpy(mydata, mappedSubresource.pData, sizeof(mydata));
         deviceContext->Unmap(stagingCopy.texture, 0);

         uint64 aft = TickCount();
         cout << aft - bef << endl;

      As for my home GeForce 660 GTX, I've just checked in the HWINFO app that it's plugged into PCI-E 2.0, hence the slower speed than on my work computer. Nevertheless, I presume the 8 GB/s and 3 GB/s should be higher. And identical.
  14. I'm now testing my work computer, which is brand new with a GeForce 1080 GTX. See the detailed spec in this picture: https://postimg.org/image/hwhuntpn5/ Now my tests show upload (CPU->GPU) at 8 GB/s and download (GPU->CPU) at 3 GB/s. PCI-E is bidirectional and all sources I've found claim the transfer rate in both directions should be identical, which is not true in my case.
  15. I'm just looking at potential uses. I'm aware that GPU->CPU traffic should be avoided as much as possible, but for some tests I needed to do this, and to make those tests reliable I wanted to utilize the full transfer potential. On a side note, uploading data (CPU -> GPU) takes 3-5 ms (around twice as fast as the other way around).
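
Post 7 above describes recording timestamp queries around the Dispatch and reading them back together with a disjoint query. The conversion from raw ticks to milliseconds can be sketched as below. This is only a sketch: `DisjointData` is a hypothetical local stand-in that mirrors the two fields of `D3D11_QUERY_DATA_TIMESTAMP_DISJOINT` (`Frequency` and `Disjoint`) so that it compiles without `d3d11.h`, and `gpuIntervalMs` is a name made up for illustration.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for D3D11_QUERY_DATA_TIMESTAMP_DISJOINT, declared
// locally (assumption) so this sketch does not depend on d3d11.h.
struct DisjointData {
    std::uint64_t frequency; // GPU timestamp ticks per second
    bool disjoint;           // true if the counter was unreliable in this interval
};

// Converts a begin/end pair of GPU timestamps to milliseconds.
// Returns std::nullopt when the measurement should be thrown away.
std::optional<double> gpuIntervalMs(std::uint64_t beginTicks,
                                    std::uint64_t endTicks,
                                    const DisjointData& dj) {
    if (dj.disjoint || dj.frequency == 0 || endTicks < beginTicks) {
        return std::nullopt;
    }
    const double ticks = static_cast<double>(endTicks - beginTicks);
    return ticks / static_cast<double>(dj.frequency) * 1000.0;
}
```

With a frequency of 1,000,000 ticks per second, a 400-tick interval comes out as 0.4 ms. Samples taken while the disjoint flag is set are dropped rather than converted, since the tick frequency cannot be trusted for them.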
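
Post 4 above reports CPU-timed downloads of 1 MB in 1 ms and 8 MB in 1.5 ms, and wonders why the time does not scale linearly with size. One possible reading, and it is only an assumption, is a fixed per-transfer cost plus a term proportional to the data size. The sketch below fits that two-parameter model through the two reported points; `LinearModel`, `fitTwoPoints` and `predictMs` are hypothetical names, and two samples cannot confirm the model.

```cpp
// Assumed model: timeMs = fixedMs + msPerMB * sizeMB.
struct LinearModel {
    double fixedMs; // cost paid once per transfer, independent of size
    double msPerMB; // incremental cost of each additional megabyte
};

// Fits the model exactly through two (size, time) measurements.
LinearModel fitTwoPoints(double mb0, double ms0, double mb1, double ms1) {
    const double slope = (ms1 - ms0) / (mb1 - mb0);
    return LinearModel{ms0 - slope * mb0, slope};
}

// Predicts the transfer time for a given size under the fitted model.
double predictMs(const LinearModel& m, double sizeMB) {
    return m.fixedMs + m.msPerMB * sizeMB;
}
```

Calling `fitTwoPoints(1.0, 1.0, 8.0, 1.5)` gives roughly 0.93 ms of fixed cost and 0.071 ms per MB. Extrapolating to 11,500 MB predicts about 0.82 s, the same order of magnitude as the 1 second reported for 11.5 GB, so under this assumption the small transfers would be dominated by fixed overhead rather than by PCI-E bandwidth.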