

Member Since 29 Jul 2001

#5295297 Overloading new

Posted by Promit on 06 June 2016 - 11:07 AM

Your vector<ptrdesc> is calling back into operator new on push_back. You'll need to supply it with a custom allocator, or use another container that does not itself depend on operator new.
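A minimal sketch of the custom-allocator route, assuming the tracking list just needs storage that never goes through the global operator new. The allocator below gets its memory from `std::malloc`/`std::free` instead. The `ptrdesc` fields, `g_allocations`, and `RecordAllocation` are hypothetical stand-ins for whatever the original code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <vector>

// Allocator that takes its storage from std::malloc/std::free instead of the
// global operator new/delete, so a container using it never re-enters a
// replaced operator new.
template <typename T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept = default;
    template <typename U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) throw std::bad_alloc();
        // malloc(0) may legally return nullptr, so always request at least 1 byte.
        void* p = std::malloc(n == 0 ? 1 : n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// Hypothetical stand-in for the poster's ptrdesc record.
struct ptrdesc {
    void* ptr;
    std::size_t size;
};

// Hypothetical tracking list: its own growth now goes through malloc.
std::vector<ptrdesc, MallocAllocator<ptrdesc>> g_allocations;

// Hypothetical helper a replaced operator new would call after getting memory
// from malloc. push_back may reallocate, but never via operator new.
void RecordAllocation(void* p, std::size_t size) {
    assert(p != nullptr);
    g_allocations.push_back({p, size});
}
```

This sketch ignores thread safety and static initialization order, both of which matter for a real allocation tracker hooked into operator new.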

#5295289 College? Life? Urggg!?

Posted by Promit on 06 June 2016 - 10:12 AM

As the others have said, computer science degree. More importantly, not a game development degree as offered by some schools. If you can find a real computer science degree with a concentration or special program or minor in game development, that would be ideal. I know they're available, but there aren't that many options. A regular computer science degree would do just fine, though.


In terms of actual details of what classes to take and all, worry about that once you're enrolled. It will depend on the specifics of the school. Generally it'll be best to focus on systems-level programming (computer architecture, operating systems, etc.) rather than high-level stuff (functional programming, big data processing, highly theoretical work, etc.).

#5294833 How does Runge-Kutta 4 work in games

Posted by Promit on 03 June 2016 - 12:47 PM

I need to replace this with Runge-Kutta somehow. 

No, you don't.


I don't think that most physics engines use RK4 at all; most use semi-implicit Euler for its balance of speed and stability. It's OK for simple stuff like mass/spring systems, but once you incorporate collision detection and response with RK4 there is not much benefit except in certain cases. For each substep you still need to test for collisions and respond, so the performance is about 4x worse than 1st-order integration methods. The integration accuracy is better than just performing four 1st-order steps, but there are many headaches involved with using RK4 in a full physics simulation for games.

Yep. RK4, in short, is garbage. I always found it bizarre that Gaffer on Games recommends it by citing its accuracy, which isn't a particularly important integration property for a game. Stability (good energy conservation, i.e. a symplectic integrator) is a vastly more useful property, and RK4 doesn't have it. In most cases, you can simply use semi-implicit Euler and be on your way. Hell, it's faster to run semi-implicit Euler at a higher frequency than to run RK4.
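A small sketch of the energy behavior, assuming a unit mass on a unit-stiffness spring (x'' = -x) and a deliberately coarse step; the names and parameters are illustrative only. It compares semi-implicit Euler (one force evaluation per step) against classic RK4 (four per step).

```cpp
#include <cassert>

// Unit mass on a unit-stiffness spring: x'' = -x (values chosen for illustration).
struct State {
    double x;  // position
    double v;  // velocity
};

// Kinetic + spring potential energy; the exact solution keeps this constant.
double Energy(State s) { return 0.5 * (s.v * s.v + s.x * s.x); }

// Semi-implicit (symplectic) Euler: update velocity first, then move the
// position using the *new* velocity. One force evaluation per step.
State SemiImplicitEuler(State s, double dt) {
    assert(dt > 0.0);
    s.v += -s.x * dt;
    s.x += s.v * dt;
    return s;
}

// Classic fourth-order Runge-Kutta: four derivative evaluations per step.
State Rk4(State s, double dt) {
    assert(dt > 0.0);
    // Derivative of (x, v) is (v, -x).
    auto deriv = [](State st) { return State{st.v, -st.x}; };
    const State k1 = deriv(s);
    const State k2 = deriv({s.x + 0.5 * dt * k1.x, s.v + 0.5 * dt * k1.v});
    const State k3 = deriv({s.x + 0.5 * dt * k2.x, s.v + 0.5 * dt * k2.v});
    const State k4 = deriv({s.x + dt * k3.x, s.v + dt * k3.v});
    return {s.x + dt / 6.0 * (k1.x + 2.0 * k2.x + 2.0 * k3.x + k4.x),
            s.v + dt / 6.0 * (k1.v + 2.0 * k2.v + 2.0 * k3.v + k4.v)};
}
```

Starting from x = 1, v = 0 (energy 0.5) and running 10,000 steps with dt = 0.5, semi-implicit Euler's energy oscillates but stays between 0.4 and about 0.67, while RK4's energy drains every step, ending near 0.06. RK4 is still more accurate per step; its error just accumulates as steady energy loss instead of staying bounded.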

#5293669 Do you usually prefix your classes with the letter 'C' or something e...

Posted by Promit on 26 May 2016 - 02:55 PM

I've simplified to the essentials over the years. I use an I prefix for interfaces, because C# still rolls that way and I like it. (Non-pure abstract base classes are usually suffixed with Base.) A single underscore labels _private or _protected class variables*. And g_ for globals, because those should look gross. That's pretty much the extent of it.


* The underscore thing also works great in languages that don't HAVE private scoping, or when I don't actually private scope them but they're considered implementation details.
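A short C++ sketch of those conventions; the class and member names are made up for illustration. One C++-specific caveat: an underscore followed by a lowercase letter is fine for member names, but `_Uppercase` and double-underscore identifiers are reserved for the implementation.

```cpp
#include <cassert>

// g_ prefix: a global, deliberately ugly so it stands out.
int g_submitCount = 0;

// I prefix: pure interface.
class IRenderer {
public:
    virtual ~IRenderer() = default;
    virtual void Submit(int drawCalls) = 0;
    virtual int DrawCalls() const = 0;
};

// Base suffix: still abstract (Submit is pure), but carries shared implementation.
class RendererBase : public IRenderer {
public:
    int DrawCalls() const override { return _drawCalls; }

protected:
    int _drawCalls = 0;  // leading underscore: protected member
};

class CountingRenderer : public RendererBase {
public:
    void Submit(int drawCalls) override {
        assert(drawCalls >= 0);
        _drawCalls += drawCalls;
        ++_batches;
        ++g_submitCount;
    }
    int Batches() const { return _batches; }

private:
    int _batches = 0;  // leading underscore: private member
};
```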

#5293616 .Net DX12

Posted by Promit on 26 May 2016 - 10:29 AM

That's the second time someone's asked this week. I haven't worked with SharpDX so I can't speak to that experience apart from where he used our code -_- But I felt that duplicating his work was not necessarily productive. If that's really not the case, and people really want a hand written SlimDX-based wrapper, then I'll see if I can pull something together.

#5293613 How to choose an OpenGL library?

Posted by Promit on 26 May 2016 - 10:25 AM

You didn't really explain what you expect the library to do for you, especially if you want to learn about all this "from scratch". If that's your goal, I would truly do it from zero without any libraries. That said, I quite like SDL 2.x for handling windowing/input functionality across systems.

#5293608 what good are cores?

Posted by Promit on 26 May 2016 - 09:51 AM




Memory bandwidth is the bottleneck these days.

Bring on the triple channel! I was very upset when I learned that DDR3 implementations weren't supporting triple channel! I think it was only one or two Intel boards that would. Of course you could always build a system using server hardware.
I was far more disappointed when I read several articles about how "we don't need triple channel memory". Well, yeah, no shit we can't make good use of triple channel if it isn't available to develop on, numb-nuts!


Quad channel on DDR4 shows next to no improvement, never mind triple channel.


Why does it show no improvement?


Let's talk about that, actually.

Can the OS not facilitate operations on multiple memory channels in parallel?
Does the software showing no improvement not make use of multiple channels?

The OS cannot see the multiple channels, in fact. More on this in a moment.

It does seem to me though, that if you create a program that creates blocks of data on each channel it is a trivial act to utilize all four channels and achieve that maximum throughput.

How do you create blocks of data on each channel? I'll wait.


You have to remember, first and foremost, that any given program does not interact with the actual memory architecture of the system. Not ever. Let's work from the bottom up - a single stick of memory. No, wait, that's not the bottom. You have individual chips with internal rows and columns, themselves arranged into banks on the DIMM. Access times to memory are variable depending on access patterns within the stick!


But let's ignore the internals of a stick of RAM and naively call each of them a "channel". How do the channels appear to the OS kernel? Turns out they don't. The system memory controller assembles them into a flat address space ("physical" addressing) and gives that to the kernel to work with. Now a computer is not total chaos, and there is rhyme and reason to the mapping between physical address space and actual physical chips. Here's an example. There are no guarantees that this is consistent across any category of machines, of course. Also note that the mapping may not be zero based and please read the comments in that article regarding Ivy Bridge's handling of channel assignment.


Oh but wait, we're not actually interacting with any of that in development. All of our allocations happen in virtual address space. That mapping IS basically chaos. There's no ability to predict or control how that mapping will be set up. It's not even constant for any given address during the program's execution. You have no ability to gain any visibility into this mapping without a kernel mode driver or a side channel attack. 


Just a reminder that most programmers don't allocate virtual memory blocks either. We generally use malloc, which is yet another layer removed.


The answer to "how do you create blocks of data on each channel" is, of course, that you don't. Even the OS doesn't, and in fact it's likely to choose an allocation scheme that actively discourages multi-channel memory access. Why? Because it has a gigantic virtual<->physical memory table to manage, and keeping that table simple means faster memory allocations and less kernel overhead in allocation. It's been a while since I dug into the internals of modern-day kernel allocators, but if you can store mappings for entire ranges of pages it saves a lot of memory versus having disparate entries for each and every memory page. Large block allocations are also likely to be freed as blocks, making free list management easier. Long story short, the natural implementation of an allocator leans towards creating contiguous blocks of memory.


How do you deal with that as a CPU/memory controller designer? Based on the link above, you simply alternate by cache line. Or, you scramble the physical address map to individual DRAM banks and chips. Remember that Ivy Bridge channel assignment bit? Yep, that's what happened.
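A toy model of the "alternate by cache line" scheme described above, assuming 64-byte lines and a plain modulo on the line index. Real controllers (including the Ivy Bridge case mentioned above) use different, often hashed, bit selections, and the function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed cache line size for this toy mapping.
constexpr std::uint64_t kCacheLineBytes = 64;

// Hypothetical mapping: consecutive cache lines of the *physical* address
// space rotate through the channels.
unsigned ChannelForPhysicalAddress(std::uint64_t physAddr, unsigned numChannels) {
    assert(numChannels > 0);
    return static_cast<unsigned>((physAddr / kCacheLineBytes) % numChannels);
}

// How many bytes of a contiguous physical range land on each channel.
std::vector<std::uint64_t> BytesPerChannel(std::uint64_t physStart, std::uint64_t bytes,
                                           unsigned numChannels) {
    assert(numChannels > 0);
    std::vector<std::uint64_t> perChannel(numChannels, std::uint64_t{0});
    const std::uint64_t end = physStart + bytes;
    for (std::uint64_t addr = physStart; addr < end;) {
        // Walk the range one cache line (or partial line) at a time.
        const std::uint64_t lineEnd = (addr / kCacheLineBytes + 1) * kCacheLineBytes;
        const std::uint64_t chunkEnd = lineEnd < end ? lineEnd : end;
        perChannel[ChannelForPhysicalAddress(addr, numChannels)] += chunkEnd - addr;
        addr = chunkEnd;
    }
    return perChannel;
}
```

Under this toy mapping, a single contiguous 4 KiB page starting at physical address 0 splits evenly, 1024 bytes per channel across four channels, without the program doing anything. And the program never sees which physical addresses it got in the first place.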


Frankly, the benefits of multi-channel memory probably show up almost exclusively in heavily multitasking situations that are heavy on memory bandwidth. I bet browsers love it :D

#5293391 what good are cores?

Posted by Promit on 25 May 2016 - 10:08 AM


Memory bandwidth is the bottleneck these days.


Bring on the triple channel! I was very upset when I learned that DDR3 implementations weren't supporting triple channel! I think it was only one or two Intel boards that would. Of course you could always build a system using server hardware.


I was far more disappointed when I read several articles about how "we don't need triple channel memory". Well, yeah, no shit we can't make good use of triple channel if it isn't available to develop on, numb-nuts!


Triple channel is nonsense. It never showed up as beneficial to memory bandwidth outside synthetic benchmarks and very specialized uses. In any case, on the CPU side my personal feeling is that memory bandwidth isn't nearly as big a problem as latency, when it comes to games. It's chaotic accesses and cache misses that kill us. The GPU, on the other hand, can never have too much bandwidth. We're seeing some great new tech on that front with HBM(2) and GDDR5X.

Isn't 33ms still more responsive than 66ms? ;)

You also need to be aware that D3D/GL like to buffer an entire frame's worth of rendering commands, and only actually send them to the GPU at the end of the frame, which means the GPU is always 1 or more frames behind the CPU's timeline.

Of course, VR was where that really screwed us, much more so than input latency. That's why we wound up with this: https://developer.oculus.com/documentation/mobilesdk/latest/concepts/mobile-timewarp-overview/

#5292723 how much PC do you need to build a given game?

Posted by Promit on 20 May 2016 - 09:25 PM

Recommended specs are what happens at the end of the dev cycle, after optimization work. During dev, a game requires much more power because it hasn't been optimized yet, and you may have any number of quick and dirty hacks to get things done. There are also productivity concerns: our game doesn't use a hex-core i7 effectively at all, but the build sure as hell does.

#5290910 I aspire to be an app developer

Posted by Promit on 09 May 2016 - 07:01 PM

Moved to For Beginners.

#5288614 GPL wtf?

Posted by Promit on 25 April 2016 - 10:44 AM

It would be helpful if you supplied the original article, rather than your interpretation of it.

#5286316 Best laptop for game development?

Posted by Promit on 11 April 2016 - 10:21 AM

Both the Dell Inspiron 15 7000 series and Dell XPS 15 are excellent laptops. The Lenovo Y700 seems to be a great choice as well. In any case, I would opt for a dedicated GPU model if you can. I would not touch MSI again.


Of the laptops you listed just now... the T540 has a dedicated GPU so I would probably put that at the top of the list.

#5284557 When would you want to use Forward+ or Deferred Rendering?

Posted by Promit on 31 March 2016 - 07:47 PM

Crudely speaking, the cost of rendering in forward is N objects * M lights. This means that heavily lit, geometrically complex environments get very expensive. Deferred was developed because the cost of rendering for that approach is N objects + M lights. Lights in deferred are very cheap, even thousands of them if they're small. I've used deferred pipelines in the past to run dense particle systems with lighting from every particle and stuff like that. The downside is massive bandwidth requirements, alpha blending problems, anti-aliasing problems, and material limitations.


Forward+ and its variations were developed to get the benefits of cheap lighting in deferred, but without all of the other problems deferred has. While bandwidth use is still pretty high, it tends to cooperate much better with more varied materials, alpha blending, and AA. It also leverages compute tasks for better utilization of the GPU overall. In general, I would tend to encourage Forward+/tiled forward as the default rendering pipeline of choice on modern desktop/laptop hardware, unless you have a specific reason not to.
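The core idea behind tiled/Forward+ lighting is binning lights into screen tiles, so that each pixel only loops over the lights near it. Below is a CPU-side sketch of just that 2D binning step, assuming lights have already been projected to screen-space circles. In an actual Forward+ renderer this runs per tile in a compute shader and usually also uses each tile's min/max depth, which is omitted here; the names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Point light already projected to screen space: center and radius in pixels.
struct ScreenLight {
    float x, y, radius;
};

// Bins lights into square screen tiles. Returns per-tile light index lists in
// row-major order; a light joins a tile if its circle overlaps the tile rectangle.
std::vector<std::vector<std::size_t>> CullLightsToTiles(
        const std::vector<ScreenLight>& lights, int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<std::size_t>> tiles(static_cast<std::size_t>(tilesX) * tilesY);

    for (std::size_t i = 0; i < lights.size(); ++i) {
        const ScreenLight& l = lights[i];
        for (int ty = 0; ty < tilesY; ++ty) {
            for (int tx = 0; tx < tilesX; ++tx) {
                const float minX = static_cast<float>(tx * tileSize);
                const float minY = static_cast<float>(ty * tileSize);
                const float maxX = minX + static_cast<float>(tileSize);
                const float maxY = minY + static_cast<float>(tileSize);
                // Distance from the light center to the closest point of the tile.
                const float dx = l.x - std::clamp(l.x, minX, maxX);
                const float dy = l.y - std::clamp(l.y, minY, maxY);
                if (dx * dx + dy * dy <= l.radius * l.radius) {
                    tiles[static_cast<std::size_t>(ty * tilesX + tx)].push_back(i);
                }
            }
        }
    }
    return tiles;
}
```

The shading pass then only walks its own tile's list instead of every light in the scene, which is how the tiled approach keeps lighting cost closer to deferred while still shading in a forward pass.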

#5283934 What happens if gpu reads and cpu write simultaneously the same data ?

Posted by Promit on 28 March 2016 - 03:50 PM

I found the documentation on it: https://github.com/GPUOpen-LibrariesAndSDKs/LiquidVR/raw/master/doc/LiquidVR.pdf

It's actually being called Late Data Latch and the whitepaper has a brief explanation:


The Late Data Latch helps applications deal with this problem by continuously storing frequently updated data, such as, real-time head position and orientation matrices, in a fixed-sized constant buffer, organized as a ring buffer. Each new snapshot of data is stored in its own consecutive data slot. The data ring buffer has to be large enough to ensure the buffer will not be overrun and latched data instance will not be overwritten during the time it could be referenced by the GPU. For example, if data is updated every 2ms, a game rendering at 100fps should have more than 50 data slots in the data ring buffer. It is advised to have at least twice the minimum number of slots to avoid data corruption. Just before the data is to be consumed by the GPU, the index to the most up-to-date snapshot of data is latched. The shader could then index into the constant buffer containing the data to find the most recent matrices for rendering.
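A CPU-side model of the ring buffer the excerpt describes, as a sketch only: it is not the LiquidVR API, and `PoseSnapshot`, `LatchRing`, and their members are hypothetical. A single producer writes each snapshot into the next consecutive slot, and "latching" just reads the index of the newest complete snapshot, which a shader would then use to index into the constant buffer.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical payload: one head-pose matrix plus a sequence number.
struct PoseSnapshot {
    float matrix[16];
    std::uint64_t sequence;
};

// Single-producer ring of snapshots. Slots must be large enough that a latched
// slot is not overwritten while the GPU may still read it (see the excerpt).
template <std::size_t Slots>
class LatchRing {
    static_assert(Slots > 0, "ring needs at least one slot");

public:
    void Write(const PoseSnapshot& snapshot) {
        const std::uint64_t n = _written.load(std::memory_order_relaxed);
        _slots[n % Slots] = snapshot;
        // Publish the new count only after the slot is fully written.
        _written.store(n + 1, std::memory_order_release);
    }

    // Index of the most recent complete snapshot.
    std::size_t Latch() const {
        const std::uint64_t n = _written.load(std::memory_order_acquire);
        assert(n > 0 && "latch requested before any snapshot was written");
        return static_cast<std::size_t>((n - 1) % Slots);
    }

    const PoseSnapshot& Slot(std::size_t index) const { return _slots[index]; }

private:
    std::array<PoseSnapshot, Slots> _slots{};
    std::atomic<std::uint64_t> _written{0};
};
```

Following the excerpt's example (updates every 2ms, more than 50 slots, doubled for safety), something like `LatchRing<128>` would fit; 128 is just a convenient power of two above that, not a figure from the whitepaper.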

#5283741 OpenGL Check if VBO Upload is Complete

Posted by Promit on 27 March 2016 - 12:15 PM

apitest shows how to issue a fence and wait for it. The first time, it checks whether the fence has been signaled. The second time, it tries again but flushes the queue, since the driver may not have processed the copy yet (meaning the GPU hasn't even started the copy, or whatever you're waiting for). If we don't flush, we'll be waiting forever, a.k.a. deadlock.
Of course, if you just want to check whether the copy has finished, and do something else if it hasn't, you only need to do the 'wait' like the first time (i.e. without flushing), but with a timeout of zero, so that you don't actually wait and you get the boolean-like response the OP wants. We do this in Ogre to check the status of async transfers.

So can you use a fence to test for the actual status of a BufferSubData call uploading to the server? And does that work consistently across platforms without issuing a draw call against that buffer? After all, the driver must do a full copy of the buffer to somewhere in-core at the point of call, but what the Rube Goldberg machine does after that is anyone's guess.



Calling the glDraw* family of functions won't stall because it's also asynchronous. I can't think of a scenario where the API will stall because an upload isn't complete yet.

It'll stall if it gets caught in a situation where it can't continue dispatching frames without finishing a pending operation. I don't remember seeing this happen with buffer uploads, but I've seen it many, many times with shader recompiles. Whether this is because shader recompiles are a long goddamn process, or they have to happen in the main thread, or it just runs up against the limit of buffered frames, or some combination thereof, I'm not sure. It seems conceivable that a large upload followed by lots of dispatched frames could trigger the same effect.