
Member Since 14 Feb 2007
Online Last Active Today, 05:04 AM

#5224141 Investing into a Game Project

Posted by Hodgman on Yesterday, 03:39 AM

$30k is about 30 man-weeks of wages for a cheap developer, or 10 man-weeks for a more experienced one.

That could be enough to make a very small game, as long as you keep it very simple and run the project well...

Typical small mobile games are probably more like 80 man weeks of work, or "indie games" could be hundreds. Typical PC/console games would probably be at least 2000 on the cheap side.


If you yourself were a developer, you could probably use $30k to pay basic living expenses for quite a while - maybe a year in an expensive western city, much longer in cheaper areas. If you're highly experienced, then this is much better value than hiring someone else for just 2 months.


You could co-invest in a larger project. e.g. if a small indie team has already spent $150k but needs a bit more, then you'd bring them up to $180k and be able to buy 1/6th of the company.

#5223688 Is C++ RTTI good or bad?

Posted by Hodgman on 16 April 2015 - 09:01 AM

Most tutorials say that you should always prefer RTTI casting syntax to C-style

"RTTI casting" isn't a thing... Well, I guess dynamic_cast could be called an "RTTI cast", but I don't know of any tutorials that tell you to default to dynamic_cast...

static_cast/const_cast/reinterpret_cast are arguably better than C-style casts, but that's a different question altogether.

Or is it just a minor performance issue?

If you have an algorithm that relies on RTTI/dynamic_cast, then it can be a major performance bottleneck. Some compilers implement RTTI type comparisons as a tree-of-strings vs tree-of-strings comparison, which is horrendously slow.

Even needing to use these features in the first place is a "code smell" though - a hint that your algorithm/code is ill thought out. There are usually better solutions that don't involve using RTTI.

When RTTI is truly required, most libraries will implement their own simpler/faster version of it, rather than use the horrible standard one.
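As a rough idea of what such a hand-rolled replacement can look like, here is a minimal sketch that assumes a closed set of types known at compile time. The names (`TypeId`, `Shape`, `shape_cast`) are made up for illustration and are not from any particular library.

```cpp
#include <cassert>

// Hypothetical closed set of types; each class gets one integer tag.
enum class TypeId { Circle, Square };

struct Shape {
    explicit Shape(TypeId id) : type(id) {}
    virtual ~Shape() = default;
    const TypeId type; // a type check is a single integer compare
};

struct Circle : Shape {
    static constexpr TypeId kType = TypeId::Circle;
    explicit Circle(float r) : Shape(kType), radius(r) {}
    float radius;
};

struct Square : Shape {
    static constexpr TypeId kType = TypeId::Square;
    explicit Square(float s) : Shape(kType), side(s) {}
    float side;
};

// Returns nullptr on a mismatch, like dynamic_cast on pointers, but only
// recognises exact-type matches (no walking of inheritance chains).
template <class T>
T* shape_cast(Shape* s) {
    assert(s != nullptr);
    return s->type == T::kType ? static_cast<T*>(s) : nullptr;
}
```

The trade-off is that every new type has to be registered in the enum by hand, which is usually acceptable when the set of types is small and owned by one codebase.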

#5223576 Array realloc size

Posted by Hodgman on 15 April 2015 - 07:38 PM

When you're making games, you can generally work out your correct capacities ahead of time, allocate that amount once, and then never reallocate.
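A minimal sketch of that idea, assuming a fixed budget decided at load time. `ParticlePool` and its members are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, life; };

// Fixed-budget pool: storage is allocated once in the constructor and
// spawn() refuses to grow past the budget instead of reallocating.
class ParticlePool {
public:
    explicit ParticlePool(std::size_t maxParticles) : max_(maxParticles) {
        assert(maxParticles > 0);
        particles_.reserve(max_); // the one and only allocation
    }

    // Returns false when the budget is exhausted.
    bool spawn(const Particle& p) {
        if (particles_.size() >= max_) return false;
        particles_.push_back(p); // size < capacity, so no reallocation
        return true;
    }

    std::size_t size() const { return particles_.size(); }
    const Particle* data() const { return particles_.data(); }

private:
    std::size_t max_;
    std::vector<Particle> particles_;
};
```

Hitting the cap then becomes a visible, testable event (a failed spawn) rather than a hidden reallocation in the middle of a frame.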

#5223434 Why high level languages are slow

Posted by Hodgman on 15 April 2015 - 08:50 AM

First, the elephant in the room: C++ is a high-level language. (Shock! Horror! Let's stop using it!) As is C. As is Pascal. As is COBOL. And the list goes on...

If someone implies that C++ isn't high level, but C# is... then chances are they're using a different definition than you are... so instead of shouting "WRONG!" at them, it's better to assume good faith and adopt their terminology for the purpose of understanding their point.

In school we first learnt that you had a hierarchy consisting of: machine language, assembly languages, high level languages, and domain specific languages.

Later we learnt to categorize them, taking classes in systems programming in C, or application programming in Java. You learn that Java is more high leveler than C...

It's pretty obvious he's comparing C#/Java/Python-esque languages to C/FORTRAN/Pascal-esque languages.

He may have part of a point here simply because C# gives you less control than C++ - but that's kind of the point. You're letting the runtime determine more things for you because you think you can afford the potential cost as a tradeoff for development time and lack of bugs.

Just saying that productivity trumps performance is changing the subject, is it not? In a discussion about how language design affects performance, the impacts of your design choices in other areas are interesting, but not the topic at hand.

It's possible to come up with choices that let you have your productivity-cake and eat your manual-cache-performance too, like C#'s StructLayoutAttribute, but that then enters the argument about having to fight the language and use non-idiomatic means to regain said cake eating.

Can you write faster code in C++? Yes. If you have a few extra years.
You can also write faster code in assembler, but you don't see people screaming about how inefficient C++ is and telling everyone to use that, now do you?

It's not that simple.

C++ is a less productive (harder) language, so an inexperienced C# team might take longer to produce a product if forced to use C++, and their product may also be slower and buggier, than what they would've produced in C#.

Meanwhile, a team of trained C++ veterans might produce a product in C++ much faster than they would in C#, and also win on performance.


Just because C++ has lots of performance tools at your disposal, it doesn't mean that people use them correctly. Most people don't. But they're still useful tools for craftsmen to have at their disposal.


As for assembler, when doing systems/embedded programming, usually you have a very good idea of what the resulting assembly will look like. You basically ARE writing the program in assembly, using C/C++ as your assembler. If the compiled code doesn't line up with your expectations, then you tweak your code so that the compiler does produce what you expected of it. It's low-level thinking in a high-level language.


Moreover, the compiler is usually better at asm than us, so it takes our intentions and cleverly optimizes them better than we ever could, using its super-human knowledge of instruction scheduling, dependency chain tracking, register juggling abilities, etc...

Compilers are getting smarter all the time - and a higher level language is generally better at expressing programmer intent which allows the compiler to make optimizations it might otherwise have been unable to do.

My bolded statement above does not apply to the behavior of C#/Java/C++ code when it comes to smart data layout. C/C++ sucks at this too!

None of these languages have the power to automatically fix structural problems... which is kind of the point. There is no magic "fix my cache misses" algorithm.

It comes down to the language having tools to better allow you to describe the structure of the data.

Things like ispc's soa keyword, or the scope-stack allocation pattern, or Jai's "joint allocations", are, unfortunately, new language developments, but are ideas that have been developed manually forever (in languages that allow you to do such things manually, such as C)...
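To make "describing the structure of the data" concrete, here is a rough sketch of the manual version in C++. It is not ispc or Jai syntax, and the type names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: the fields of one particle sit next to each other,
// so a loop over `mass` alone still drags x/y/z through the cache.
struct ParticleAoS { float x, y, z, mass; };

// Structure-of-arrays: each field is stored contiguously on its own.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
    explicit ParticlesSoA(std::size_t n) : x(n), y(n), z(n), mass(n) {}
    std::size_t size() const { return x.size(); }
};

// Touches only the `mass` array; the position arrays are never read.
inline float total_mass(const ParticlesSoA& p) {
    float sum = 0.0f;
    for (float m : p.mass) sum += m;
    return sum;
}
```

The cost is that the layout choice leaks into every piece of code that touches the data, which is exactly the part a language-level feature could hide.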


As for memory management, the dichotomy of a C-style manual free-for-all vs a C#-style catch-all GC is false. There's so, so many ways to manage resources, but not that many with language support.

It's also a two-way street. I've shipped lots of games that use GC's, and had to deal with all the show-stopping bugs caused by the GC that delayed shipping dates.

e.g. the GC is consuming too much CPU time per frame, blowing the performance budget. Ok, do a quick fix by capping how long it can run per frame, easy! The memory usage is now growing steadily until we run out and crash... Fix it properly by fighting the language, doing a pass over the ENTIRE code-base, and micro-optimizing everywhere to eliminate temporary objects.


A lot of games I've worked on have also successfully used stack-based allocations entirely (no malloc/new/gc)... There's endless options here, 99% of which have no language support. The GC pattern isn't the one true solution to rule them all. The point is that high(er) level languages have already made more choices for you, deciding which tools you're allowed to use, and which ones of them are running at full voltage.
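One common reading of "stack-based allocation" is a linear allocator over a buffer reserved up front. The sketch below shows that reading under simple assumptions (single-threaded, no destructors run on rewind); `StackAllocator` is a hypothetical name, not code from any shipped engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal linear ("stack") allocator over a caller-provided buffer.
// Allocating bumps an offset; freeing rewinds to a previously saved mark.
class StackAllocator {
public:
    StackAllocator(void* buffer, std::size_t bytes)
        : base_(static_cast<std::uint8_t*>(buffer)), size_(bytes), top_(0) {}

    // `align` must be a power of two. Returns nullptr when out of space.
    void* alloc(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base_) + top_;
        const std::uintptr_t aligned =
            (start + (align - 1)) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - start);
        const std::size_t remaining = size_ - top_;
        if (padding > remaining || bytes > remaining - padding) return nullptr;
        top_ += padding + bytes;
        return reinterpret_cast<void*>(aligned);
    }

    // Remember the current top of the stack.
    std::size_t mark() const { return top_; }

    // Release everything allocated after the given mark in one step.
    void rewind(std::size_t m) {
        assert(m <= top_);
        top_ = m;
    }

private:
    std::uint8_t* base_;
    std::size_t size_;
    std::size_t top_;
};
```

A typical use is to take a mark at the start of a frame or level, allocate freely, and rewind at the end, so there is never a per-object free.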

The simpler a language is, the easier it is to take a different evolutionary path through the tree of data management ideas. New languages will hopefully be able to give us the productivity gains of C#/Java, but the efficiency that we currently can only achieve by doing manual data juggling in C... This advancement will be similar to the advancement from asm-programming to C-programming, but for structure this time, not logic.

Also he calls out the .Net Native stuff and points out that while it'll help instructions it won't help with layout of memory and memory latency is a horrible problem which isn't getting better.
In fact, it's getting worse.

To use made up numbers - CPU performance doubles every 2 years, but memory performance doubles every 10. This means that after 10 years, your CPU is 32x faster, but your RAM is only 2x faster. If you normalize that for CPU power, what's happened is that your RAM has gotten 16x (~94%) slower!

Every year, instructions-per-memory-fetch just goes up and up. C#/Java were born out of the era of 90's programming when this issue wasn't as much of a big deal. This decade it's almost the defining problem of software performance... and none of our languages are built to deal with it.

#5223269 The best place to find research papers

Posted by Hodgman on 14 April 2015 - 05:55 PM

Look at the bibliography slides on interesting conference presentations - e.g. if it's contributed to a GDC or SIGGRAPH talk that you like, that probably counts as being "highly rated".

#5223131 Phong reflection on a water surface.

Posted by Hodgman on 14 April 2015 - 06:25 AM

Phong shading produces perfectly round highlights. You probably want to use Blinn-Phong shading, which correctly stretches highlights on rough surfaces at glancing angles (producing the characteristic streaky reflections seen on water or wet roads).
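For reference, the two specular terms differ only in which pair of vectors gets compared. The sketch below is plain CPU-side C++ rather than shader code, with a hypothetical `Vec3` type; all input directions are assumed to be unit length.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return v * (1.0f / len);
}

// n: surface normal, l: direction to the light, v: direction to the viewer.

// Phong: compare the mirror reflection of l about n with the view vector.
inline float phong_specular(Vec3 n, Vec3 l, Vec3 v, float shininess) {
    const Vec3 r = n * (2.0f * dot(n, l)) - l;
    return std::pow(std::fmax(dot(r, v), 0.0f), shininess);
}

// Blinn-Phong: compare the half vector between l and v with the normal.
inline float blinn_phong_specular(Vec3 n, Vec3 l, Vec3 v, float shininess) {
    const Vec3 h = normalize(l + v);
    return std::pow(std::fmax(dot(n, h), 0.0f), shininess);
}
```

Note that the same `shininess` value does not give the same highlight size in both models, so the exponent usually needs retuning when switching.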

#5223107 Game development - Software Engineering or CS?

Posted by Hodgman on 14 April 2015 - 02:49 AM

The wayyyy too simplified summary - CS has more theory, SE has more practice. Both are useful.

Whichever one you choose, you should learn the other one in your own time.
e.g. if you have friends in CS learning something you're not - say, ordinary differential equations - then ask them to give you a high level explanation of what they are. They might appreciate the opportunity to solidify their own study, and you get a heads up that converts an unknown-unknown in your knowledge into a known-unknown, which you can pursue later.

The most important thing is that when you finish your degree that you also have a personal portfolio to show off, made up of your extra-curricular / hobby projects (not your coursework).

i.e. whether you get to take a GL class or not, the appearance of a GL project in your portfolio is entirely down to how you spend your free time. The class might just give you a small head-start (maybe - assuming the course isn't horribly outdated).

Personally, I highly recommend modding (total conversions, etc) as a personal project, as it allows you to learn and demonstrate many aspects of game dev in a short number of part-time years.

Personally, I did an SE major, but took as many CS electives as I could.
I also got hired off of my modding work before I'd even graduated :D

#5222841 Direct3D 12 documentation is now public

Posted by Hodgman on 12 April 2015 - 07:39 PM

Regarding the upload heap and mapping buffers, it looks like it's possible to never unmap the pointer. How does D3D12 manage the CPU cache then?
I thought that unmapping (and glMemoryBarrier(...) with glBufferStorage) was flushing the CPU cache so that the buffer in main memory has the correct content.

Typically mapped memory is uncached, and so writes will bypass the cache completely. For these cases write combining is used to batch memory accesses.

You still need to issue a CPU instruction that will flush the write combining buffer.
It's very unlikely to see this bug occur, as this buffer will almost certainly flush itself out before next frame anyway... but on my splash screens on the (new) console APIs, I had graphics corruption that was fixed by putting a CPU write fence instruction right after the code that streamed textures into mapped (uncached, write-combined) memory. In that simple situation, I'd managed to create a scenario where the GPU was reading the texture data with such low latency, and the CPU was being so lazy in evicting pixels from the write-combine buffer, that the splash screens were missing blocks of colour for a few frames.

If there's no D3D12 function for this, maybe you're just supposed to be aware of the HW memory model and flush it yourself like I did?? If so, the easiest way to do it on x86(-64) is to use any instruction with the LOCK prefix, which on Microsoft's compiler pretty much means to use any of the Interlocked* family of functions, or the MemoryBarrier macro.
Alternatively, you could use the now standard C++ atomic_thread_fence(memory_order_release) function. That sounds much more sane, actually...


Actually, on x86, atomic_thread_fence(memory_order_release) doesn't generate any instructions (it's only a compile-time directive)... so you'd actually need to use atomic_thread_fence(memory_order_seq_cst)... even though that's a stronger fence in theory than what is needed.
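Put together, the standard-C++ version of that workaround might look like the sketch below. The `mapped` pointer is a stand-in for whatever the graphics API hands back from its map call, and `upload_and_fence` is a hypothetical helper; whether the fence is actually required on a given platform or API is an assumption to verify, not something this snippet proves.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies data into memory that is assumed to be mapped, uncached,
// write-combined and GPU-visible, then issues a full fence before the
// caller goes on to tell the GPU that the data is ready.
inline void upload_and_fence(void* mapped, const void* src, std::size_t bytes) {
    assert(mapped != nullptr && src != nullptr);
    std::memcpy(mapped, src, bytes);
    // seq_cst rather than release: on x86 a release fence is only a
    // compiler barrier, while a seq_cst fence emits a real instruction
    // (typically MFENCE or a LOCK-prefixed operation).
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```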

#5222688 Efficient 24/32-bit sRGB to linear float image conversion on CPU

Posted by Hodgman on 11 April 2015 - 08:29 PM

It involves a gamma curve, but isn't a gamma curve - pow2.2 is a good approximation, but for accuracy it's important to use the real formula with the linear tail.

I'd implement your look-up-table version, and a plain ALU version and profile them in a real usage situation. The LUT version's performance will depend heavily on how much pressure is on the cache.

For the ALU version you can do both sides of the discontinuity and then select the correct side branchlessly.
a = srgb*(1/12.92);
b = pow((srgb+0.055)*(1/1.055),2.4);
rgb = srgb <= 0.04045 ? a : b;

^ That final ternary statement can be implemented with conditional moves/shuffles, masking and adding (ANDing and ORing), etc...

...but the pow is costly, so maybe you do want to use a real branch if any elements in the vec4/vec8 need it.

n.b. to SSEize the pow, you can use exp/log instead:
b = exp(log((srgb+0.055)*(1/1.055))*2.4);
...and get an exp/log implementation from a library like http://gruntthepeon.free.fr/ssemath/
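For a scalar reference to profile and test a SIMD or LUT version against, the snippets above can be written out as follows. This is only a sketch: the function names are made up, and input is assumed to be already normalised to [0, 1].

```cpp
#include <cassert>
#include <cmath>

// Evaluate both pieces of the sRGB decode curve, then select one.
inline float srgb_to_linear(float srgb) {
    assert(srgb >= 0.0f && srgb <= 1.0f);
    const float a = srgb * (1.0f / 12.92f);                            // linear tail
    const float b = std::pow((srgb + 0.055f) * (1.0f / 1.055f), 2.4f); // power segment
    return srgb <= 0.04045f ? a : b; // a select, not necessarily a branch
}

// Same curve with pow rewritten as exp(log(x) * 2.4), the form that maps
// onto SIMD exp/log routines. Safe here because the base is always > 0.
inline float srgb_to_linear_explog(float srgb) {
    assert(srgb >= 0.0f && srgb <= 1.0f);
    const float a = srgb * (1.0f / 12.92f);
    const float b = std::exp(std::log((srgb + 0.055f) * (1.0f / 1.055f)) * 2.4f);
    return srgb <= 0.04045f ? a : b;
}
```

For 8-bit input, calling `srgb_to_linear(i / 255.0f)` for i in 0..255 is also a simple way to fill the 256-entry look-up table.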



To write this kind of SIMD code, I've recently been using the ISPC language, which lets you write the algorithm once and then compile it to SSE2/AVX/AVX2/etc... Gathers/scatters will be emulated on the older instruction sets though, of course.

#5222611 Is this C code using pointers valid?

Posted by Hodgman on 11 April 2015 - 10:15 AM

			current = (serverInfo*)realloc((*pServers), sizeof(serverInfo) * (serverCount+1));
			if (current)
				pServers = &current;
				return -1;

This part is broken.... I think (the logic is confusing).
You reallocate the input data, so it seems that you want to pass that new allocation back out to the caller. If so, it should be:
(*pServers) = (serverInfo*)realloc((*pServers), sizeof(serverInfo) * (serverCount+1));
Anywhere you're taking the address of a local variable should be cause for great inspection of the logic.
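One caveat with assigning the result of realloc straight back: if the call fails it returns NULL, and the only pointer to the old block is overwritten. A sketch of a variant that keeps the old array on failure is below; `ServerInfo` and `grow_servers` are hypothetical stand-ins for the code in the thread, written with the C++ headers but otherwise the same as the C version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the thread's serverInfo struct.
struct ServerInfo { int id; };

// Grows *pServers to hold serverCount + 1 entries.
// Returns 0 on success, or -1 if the allocation failed, in which case
// *pServers is untouched and still owned by the caller.
inline int grow_servers(ServerInfo** pServers, std::size_t serverCount) {
    assert(pServers != nullptr);
    void* grown = std::realloc(*pServers, sizeof(ServerInfo) * (serverCount + 1));
    if (grown == nullptr)
        return -1;
    *pServers = static_cast<ServerInfo*>(grown); // write through the out-pointer
    return 0;
}
```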

#5222535 Direct3D 12 documentation is now public

Posted by Hodgman on 10 April 2015 - 06:58 PM

  • Read depth buffer back as SRV working (only on warp, crash on hardware)

I think you need to copy depth buffer before reading it. Depth buffer is stored in a compressed format that can't be sampled afaik, drivers usually decompress it.
Hopefully that doesn't require a copy... At the driver level it involves copying some data from the HiZ buffer and some registers into some parts of the depth buffer, based on a compression/clear mask. Having to copy the whole depth buffer would be wasteful.
Same goes for colour textures too actually - they need to be decompressed after being used as an RTV but before being used as an SRV.
AFAIK, this is what the resource transitions/barriers are for.

#5222337 A practical real time radiosity for dynamic scene

Posted by Hodgman on 09 April 2015 - 06:45 PM

Yeah, a traditional radiosity implementation does not work very well in realtime. If you're interested, this presentation explains how Geomerics managed to implement it in realtime for their Enlighten middleware:
A Real-Time Radiosity Architecture for Video Games - Geomerics

If your scene is mostly outdoors, the light propagation volumes technique I mentioned above might work well.

You could also look at deferred irradiance volumes. These rely on some precomputation as well - a gbuffer cube map per probe, but you could re-render these whenever the scene changes, or perform round-robin updates where you only re-render one cubemap each frame.

The RSM technique dynamically places thousands of small lights to simulate bounced light. If you've got an efficient deferred renderer, you could try that too.

#5222324 A practical real time radiosity for dynamic scene

Posted by Hodgman on 09 April 2015 - 04:55 PM

I've recently been searching for methods to implement real time radiosity, but the only method I found was global illumination, which didn't seem to be that practical or even implementable (using OpenGL). (As the topic says, I need radiosity for a whole dynamic scene.)

Global Illumination isn't a method / algorithm -- it's a category for any algorithm that includes both direct and indirect lighting.
Which specific GI algorithms have you looked at?

On the other hand Radiosity IS a GI algorithm, where you divide your surface up into "patches" (e.g. the texels of a lightmap), calculate which patches are visible from each other patch, apply direct lighting to the patches, and then use the visibility data to propagate/bounce indirect lighting from patch to patch.
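The propagate/bounce step can be sketched as below, assuming the patch-to-patch visibility has already been boiled down to a dense weight matrix. Here `formFactor[i][j]` is taken to be the weight with which light leaving patch j contributes to patch i; the names and the single-channel `float` lighting are simplifications for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Starts from direct lighting per patch and adds `bounces` rounds of
// indirect light, each round gathering the previous round's light.
inline std::vector<float> propagate_bounces(
    const std::vector<float>& direct,
    const std::vector<float>& reflectance,
    const std::vector<std::vector<float>>& formFactor,
    int bounces) {
    const std::size_t n = direct.size();
    assert(reflectance.size() == n && formFactor.size() == n);

    std::vector<float> total = direct;      // accumulated lighting per patch
    std::vector<float> lastBounce = direct; // light emitted in the previous round
    for (int b = 0; b < bounces; ++b) {
        std::vector<float> next(n, 0.0f);
        for (std::size_t i = 0; i < n; ++i) {
            assert(formFactor[i].size() == n);
            for (std::size_t j = 0; j < n; ++j)
                next[i] += formFactor[i][j] * lastBounce[j]; // gather from patch j
            next[i] *= reflectance[i];                       // patch i absorbs some
            total[i] += next[i];
        }
        lastBounce.swap(next);
    }
    return total;
}
```

The dense n-by-n matrix is what makes the precomputation expensive and ties it to static geometry, since any moving object invalidates the weights.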

Realtime versions of this algorithm usually precompute the patch visibility data, meaning it's only useful for static scenes... So you're probably not after Radiosity, but a different kind of GI algorithm.

There's a lot of dynamic GI techniques - instant radiosity, virtual point lights, reflective shadow maps, imperfect shadow maps, cascaded light propagation volumes, dynamic light probes, etc...
Maybe tell us some more about your scenes, lights, current direct lighting techniques, and target hardware, and we can suggest something that fits.

#5222172 Content pipeline

Posted by Hodgman on 08 April 2015 - 10:53 PM

You're comparing a language with a library.


XNA is a library of C# code, which has implemented lots of features, such as these content pipelines.


C# by itself has no concept of a content pipeline, just as C++ doesn't either.

If you want to implement this concept, look at the source code for MonoGame (an open-source XNA replacement) to see how they work, and then write that code again in C++.

#5222152 Storing bits in a container

Posted by Hodgman on 08 April 2015 - 07:26 PM

I have been advised several times not to use std::vector<bool> and to use std::bitset instead. But std::bitset is a fixed-size container and Huffman encodings can vary in length. So, I'm using vector<bool>.

std::vector<bool> is specialized by the standard library - it doesn't act the same way as any other std::vector<T>, it acts like a variable sized bitset, using compact bit storage! So you're doing the right thing™ here.
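For example, appending variable-length codes to a growing bit stream could look like this. `append_code` is a hypothetical helper, and writing the most significant bit first is just one possible convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends the low `length` bits of `code` to the stream,
// most significant of those bits first.
inline void append_code(std::vector<bool>& bits, std::uint32_t code, unsigned length) {
    assert(length <= 32);
    for (unsigned i = length; i-- > 0;)
        bits.push_back(((code >> i) & 1u) != 0);
}
```

Because std::vector<bool> packs its elements, `bits[i]` returns a proxy object rather than a `bool&`, which matters if the stream is passed to generic code expecting real references.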