
Member Since 07 Mar 2009
Offline Last Active Today, 03:41 PM

#5243345 [personal] one of my games passed greenlight, first time ever :)

Posted by Krypt0n on 29 July 2015 - 03:25 AM



grats dude!

#5241848 Rendering Shader into Texture

Posted by Krypt0n on 21 July 2015 - 08:32 PM

are you running that one always?
	HR(m_pDevice->ColorFill(m_pTextureSurface, NULL, D3DCOLOR_ARGB(0xFF, 0, 0, 0)));
	HR(m_pDevice->StretchRect(m_pOffsceenSurface, NULL, m_pTextureSurface, NULL, D3DTEXF_LINEAR));
also, try 16-bit/half formats and try point filtering instead of linear (some hardware doesn't support linear). and enable the D3D debug runtime, it will show warnings/errors.

and also try PIX or Intel's tool ( https://software.intel.com/en-us/gpa ), step through the scene and check where reality diverges from your expectations ;)

#5241845 Lightmapping

Posted by Krypt0n on 21 July 2015 - 08:23 PM

to help out with a list
- last of us: http://miciwan.com/SIGGRAPH2013/Lighting%20Technology%20of%20The%20Last%20Of%20Us.pdf
- the witness http://the-witness.net/news/2010/03/graphics-tech-texture-parameterization/
- UE4 (not a game, but shows off nicely )
- Square Enix engine: http://www.jp.square-enix.com/info/library/pdf/An%20Implementation%20of%20Adaptive%20Tile%20Subdivision%20on%20the%20GPU%20(slide).pdf
- idTech 4 (and probably all other mega texture implementations)
- Geomeric's Enlighten (e.g. used in the frost bite engine): http://www.geomerics.com/case-studies/

#5241521 DXT3 nowadays

Posted by Krypt0n on 20 July 2015 - 08:44 AM

I've worked with artists who decided that on a texture-by-texture basis.

if you look at just an individual pixel or block DXT5 might look superior, but two alpha blocks of DXT3 look consistent, while two alpha blocks of DXT5 vary in range and offsets for the palette, and thus vary in quantization/quality.

I can recall a particular case where we had 'health bars' created by an artist (I think that was DX7) and I used alpha-ref to animate between 0 and 100% of it. some of those had fancy shapes (like around the enemy's silhouette). it looked like JPEG blocks, and it turned out to be because of DXT5. I thought we'd need to drop to RGBA8, but one artist said he had used DXT5 by accident and it should be DXT3, and really, it looked correct. (the health bar had some always-transparent areas and also some always-visible areas, and those were bleeding into the actual health bar with DXT5 because of the adapting range whenever the shape was anything but a line. it actually was also bleeding in when it was a line, but consistently, and therefore did not look like an artifact).
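To make the "consistent vs. adapting range" point concrete, here is a rough sketch. It assumes a heavily simplified DXT5 encoder that just uses the block's min/max alpha as endpoints and only the 8-level mode; real encoders are smarter. The function names are made up for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// DXT3 stores alpha as plain 4 bit per pixel: the same 16 levels in every block.
std::uint8_t decodeAfterDXT3(std::uint8_t alpha) {
    int level = (alpha * 15 + 127) / 255;          // nearest of 16 levels
    return static_cast<std::uint8_t>(level * 17);  // expand back to 8 bit
}

// Simplified DXT5: 8 levels interpolated between two per-block endpoints.
// Assumption: lo/hi are simply the min/max alpha found in the block.
std::uint8_t decodeAfterDXT5(std::uint8_t alpha, std::uint8_t lo, std::uint8_t hi) {
    if (hi <= lo) return lo;
    int range = hi - lo;
    int index = ((alpha - lo) * 7 + range / 2) / range;  // nearest palette entry
    index = std::max(0, std::min(7, index));
    return static_cast<std::uint8_t>(lo + (range * index + 3) / 7);
}
```

With these assumptions an input alpha of 128 comes back as 136 from DXT3 in every block, but as 146 in a DXT5 block spanning 0..255 and as 132 in a block spanning 100..156. An alpha-ref test sweeping through that range then flips at different thresholds in neighbouring blocks.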

#5241516 A noisy bidirectional path tracing result

Posted by Krypt0n on 20 July 2015 - 08:24 AM

slightly different result with images generated from path tracing.

are you using the same tone mapping settings? looks like the noise in the bidirectional image biased the result a bit.
you should filter the noise out of the image before you calculate the average (or histogram) for tone mapping.
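One possible way to do that filtering, as a sketch only: clamp each pixel's luminance to a cap before taking the log-average that many tone mappers use. The cap `maxLum` and the epsilon are assumptions you'd have to tune, and other filters (median, percentile cut-off) would work too.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Log-average luminance with a simple firefly clamp.
// maxLum is a hypothetical, scene-dependent cap on single-pixel luminance.
double logAverageLuminance(const std::vector<double>& lum, double maxLum) {
    if (lum.empty()) return 0.0;
    const double eps = 1e-4;  // avoids log(0) for black pixels
    double sum = 0.0;
    for (double l : lum)
        sum += std::log(eps + std::min(l, maxLum));
    return std::exp(sum / static_cast<double>(lum.size()));
}
```

For the luminances {1, 1, 1, 1000} a cap of 1 gives an average of about 1, while an effectively disabled cap gives more than 5, so a single noisy pixel visibly shifts the exposure.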

very noisy result. the convergence rate is relatively low.

it's a rather path tracing friendly scene.
bi-directional path tracing tries to overcome the issue when finding the light source is hard. try to cover the light source somehow, so that the scene is lit by secondary bounces rather than primary ones. In that case most path tracing paths will be aborted with no contribution, while those that survive will contribute a lot to compensate. This should result in a lot of noise.

BDPT won't suffer from those issues that much.
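A toy model of that effect, not a path tracer: assume each path reaches the light with probability p and, if it does, gets weighted by 1/p so the expected value stays 1. The variance of a single sample is then (1 - p) / p, which explodes as the light gets harder to find. All names are made up for this sketch.

```cpp
#include <random>

// Variance of the estimator X = (path hits light) ? 1/p : 0, whose mean is 1.
double estimatorVariance(double p) {
    return (1.0 - p) / p;
}

// Monte Carlo estimate of the mean, with a fixed seed so runs are repeatable.
double estimateMean(double p, int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution hitsLight(p);
    double sum = 0.0;
    for (int i = 0; i < samples; ++i)
        if (hitsLight(rng)) sum += 1.0 / p;  // rare survivors contribute a lot
    return sum / samples;
}
```

With p = 0.5 the per-sample variance is 1, with p = 0.01 it is about 99, so you'd need roughly 99x the samples for the same noise level.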

The path tracing is not only faster, but also better from these result. I knew that's not true from a lot of materials, however I can't prove it.

path tracing is faster, that's valid (for the same amount of samples, less time), because it's less complex, obviously. But it starts to run into problems and drops massively in quality, that's when BDPT presumably should stay slower but also keep the quality.
of course, there is always a tiny chance that you have a bug somewhere :)

#5241514 1bit occlusion shadow maps for...

Posted by Krypt0n on 20 July 2015 - 08:10 AM

now I got it, sorry for my misunderstanding. my brain thought it was about culling.

I've actually done that (but for other reasons) and it was fine performance-wise (but not faster) on a Radeon 7970 and GeForce GTX580, but the results had bit/pixel errors. I had to use atomics to get correct results and then the performance dropped A LOT. when I got my ES of the GTX680, I tried the test again. it was several times faster (I think it was 5x faster) than the GTX580, but still not faster than plain old shadow rendering.

Tho, one improvement I've found was to simply set near/far of the shadowmap to the bounds of the view frustum and project the vertices of the shadow casters that are out of frustum to the near plane. Seems like that improved the zbuffer compression ratio and thus the speed (and also the quality/precision, of course).

nothing changes in shading code.
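A minimal CPU-side sketch of the near-plane projection described above, assuming light-space coordinates where z grows away from the light and nearZ/farZ are already fitted to the view frustum. In practice this would sit in the shadow pass vertex shader; the struct and function names are hypothetical.

```cpp
#include <algorithm>

struct LightSpaceVertex { float x, y, z; };

// Casters in front of the fitted near plane get flattened onto it,
// so they still cast shadows but don't need any depth range of their own.
LightSpaceVertex projectToNearPlane(LightSpaceVertex v, float nearZ) {
    v.z = std::max(v.z, nearZ);
    return v;
}

// Depth written to the shadow map, 0 at nearZ and 1 at farZ.
float normalizedDepth(float z, float nearZ, float farZ) {
    return (z - nearZ) / (farZ - nearZ);
}
```

Since the depth range now covers only what the camera can see, the stored values use the available precision better, and the lookup in the shading code stays the same.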

#5241144 Getting rid of platform-specific ifdefs

Posted by Krypt0n on 17 July 2015 - 08:37 PM

I extract platform specific code into separate .cpp files specialized per platform, but only those specific parts are extracted, not the whole functions. those stay common, so you don't run into the complicated development problem that frob mentioned.

double Sys_FloatTime(void)
{
    static int64_t starttime = 0;
    if (! starttime) {
        starttime = PLATFORM_clock();
    }

    return (PLATFORM_clock() - starttime) * PLATFORM_clock_scale_to_s();
}
usually I'd also move the type for time into a platform specific header and use a typedef instead of int64_t or int.
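As a sketch of how the pieces could be split up, here is one hypothetical "platform" built on std::chrono. `PLATFORM_time_t` is a made-up name for the typedef, and a real port would put the first two parts into a per-platform header and .cpp.

```cpp
#include <chrono>
#include <cstdint>

// --- would live in a platform specific header (hypothetical name) ---
typedef std::int64_t PLATFORM_time_t;

// --- would live in a platform specific .cpp; this one assumes std::chrono ---
PLATFORM_time_t PLATFORM_clock() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

double PLATFORM_clock_scale_to_s() {
    return 1e-9;  // ticks are nanoseconds in this sketch
}

// --- common code, identical on every platform ---
double Sys_FloatTime() {
    static PLATFORM_time_t starttime = 0;
    if (!starttime) {
        starttime = PLATFORM_clock();
    }
    return (PLATFORM_clock() - starttime) * PLATFORM_clock_scale_to_s();
}
```

Only the two PLATFORM_ functions and the typedef change when porting; Sys_FloatTime never sees an #ifdef.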

I'd guess 99% of the platform specific code is not time critical, thus doesn't need to be inlined or look pretty. Some, like e.g. SIMD code might actually need an
#ifdef __ARM__NEON__
#include "VecNEON.hpp"
#elif __INTEL_SSE__
#include "VecSSE.hpp"
#endif
inside a common "Vec.hpp", so that's just one very localized place.

And when you compare this to the big chunks of platform specific intrinsics that some programmers write, it's clean and versatile IMHO.

#5241140 Md5 Password Hasher would you use this.

Posted by Krypt0n on 17 July 2015 - 07:50 PM

I may just put this all on hold for now until the game is worth a grain of salt. I am sending just a plain old string as password. No challenge for a hacker.

It's really good that you try to have at least 'some' security, and while it's hard to make 'good' security, don't let the others discourage you from it so that you end up with 'no' security (for now). Using MD5 is already 1000x better than sending around plain strings. The problem is not only your game: people tend to have tons of accounts and far fewer passwords. if you send plain text, whoever gets their hands on your list, or just on some network packets, can see the pwd to a dozen accounts of your player. Maybe your game is not worth hacking now, but maybe the steam account that uses the same pwd is.

(and YES, people use the same nick+pwd and we can blame them for stupidity, yet they all will blame you if their pwd gets stolen)

Thus, please don't transfer or store plain passwords!

if you're totally against investing time in this until it's worth it, then take something that is not a secret at all, e.g. let the player enter their exact birthday + pick one fav color + count of fingers they have + .... + salt as login challenge. This still ensures that
- nobody randomly takes over other accounts (as would happen if there was no pwd)
- your player won't use their default steam/gmail/paypal/... pwd
- if someone brute-force hacks accounts or hacks your server, nothing but your game will be affected (which you don't worry about for now)

#5239589 How Does Unreal Engine 4's Rendering Engine Stand Out

Posted by Krypt0n on 10 July 2015 - 02:03 PM

no offence, I'll just dump my opinion here ;)

I agree that mostly every graphics technique used in modern AAA game engines has been discovered by researchers and is present in published academic papers. However, when implementing something like physically based shading, the equations and algorithms can be got for papers but there are a lot of choices to be made on how to actually implement it.

those papers are not very academic, there are no hard-to-grasp formulas. It's rather the other way around, those frequently include the source of the whole shading functions. I've seen a lot of people who used those implementations and got the wanted results and then asked for help to understand what they did, even artists who translated the HLSL code into kismet (or whatever the name of the UE3 visual shader graph is).
(I'm not judging those people and it's not an attempt to imply anything but the papers put you in a place with barely any challenge).

This is where the engineering comes in. Its not just about implementing one feature efficiently but also integrating numerous features efficiently.

engineering is not just about making an implementation work in the environment, that's like the lowest mandatory requirement. Engineering is usually about finding a solution to a problem. "how do I implement this best" is another question and not really a problem.

When you look at what a rendering engine does, there are a lot of things that can go wrong. It is responsible for handling rigid and skeletal meshes, materials, skinning for skeletal animations, deferred and forward lighting, shadow mapping, ambient occlusion, frustum culling, occlusion culling, screen space particles, multi-threading, instancing, transparency, translucency and many different post processing effects such as blur.

your listing is indeed the vanilla "let's make an engine" list. I don't know why anyone would implement such an engine if you can get exactly all that from unreal/unity/cryengine. it's like "we make our own cake, because we don't want the mass production bread from the supermarket"..."we got the pre-created baking mix from the supermarket and need to add water at the correct amount and temperature" .. that's not really baking.

e.g. there are a dozen ways to do PBR. there are quite some trade-offs you can choose from. e.g. you don't have to sacrifice anisotropic reflections. you don't have to pre-integrate based on the BRDF, you can also pre-integrate based on lights and keep a fully working BRDF.

I've implemented my PBR car rendering before Pixar made their paper. almost none of my solutions match what Pixar uses (and you'd see our solutions shine in different moments). but I've seen quite some engines nowadays and down to spelling, parameter name and order etc etc it's the same plain thing.

This is really not the cutting edge of engineering. There are sometimes exceptions, of course. like the tile based shading thingy that Frostbite 2 had on SPUs on PS3, or the software rendering for "Dreams" by Media Molecule, or the ambient calculation of 'The Last of Us', or realtime photon mapping in Natural Selection 2. and it proves that you can really make something different. No matter whether it's better or worse, but it's different.

and your question

what features has it that are very difficult to implement?

is the perfect proof for the current situation. it's not just you, a lot of engineers say "I've seen that in this engine, I will also add it to mine"
if you want to write your own engine, you should have a good reason. "Am I capable of cloning UE4" is a rather sad reason. Ask yourself something like "what cool thing would I want to see in an engine that has never been done before?". Some people, e.g. Notch with Minecraft, made something that was rarely done previously and centered the game idea around it. does it have anything in common with UE4? clearly not. But that's what makes it unique and successful.


These elements can be implemented one by one but implementing them optimally takes a huge amount of time. Time does seem to be the issue here and this is only the graphics engine. For that reason, I will probably use UE4 and modify it if necessary.

time saving is a good reason to use an existing engine if you want to have what it offers anyway.
but if it's true that you like tech, like you've stated in your entry post, then invent something. There are many things UE4 doesn't have, that no engine really has. you might have a lot of fun ;)

#5239292 How Does Unreal Engine 4's Rendering Engine Stand Out

Posted by Krypt0n on 09 July 2015 - 12:16 PM

rendering-wise all modern engines are the same, because all the tech available is usually in papers way before the actual tech is available to you. In the end, the only real difference is the price. (for me, as a graphics engineer, that's quite a sad situation. Implementing papers does not require much problem solving skill, it's rather code-monkey work :( anyway...)

if you want to make a specialized engine, you can still start with the generic engine and modify it to the needs you have. It's common knowledge in engineering that 10% of the code is the critical part, 90% just has to be there. So it's more time effective for you to take the Unity/Unreal/Cry engine and just modify those 10% with 100% of your time, rather than wasting 90% of your time on your own custom engine that will anyway need (at least some of)
-console system
-event system
-resource handling

UnrealEngine follows those engineering principles too by integrating middleware rather than re-inventing the wheel. only the important key elements are custom.

#5238430 How much performance improvement does SSE provide?

Posted by Krypt0n on 05 July 2015 - 08:11 AM

what makes you think AMD would not support SSE? I use CodeXL for most of my profiling and never encountered a missing instruction. besides that, profiling is mostly instruction independent. it counts where the instruction pointer is.

#5238343 How much performance improvement does SSE provide?

Posted by Krypt0n on 04 July 2015 - 07:56 AM

that's called a "profiler". a free and good one is "AMD CodeXL".


#5238247 How much performance improvement does SSE provide?

Posted by Krypt0n on 03 July 2015 - 02:19 PM

some numbers from https://software.intel.com/en-us/articles/easy-simd-through-wrappers
x86 integer 379.389s  1.0x
SSE4        108.108s  3.5x
SSE4 x2      75.659s  4.8x
AVX2         51.490s  7.4x
AVX2 x2      36.014s 10.5x
SSE has more registers (while utilizing the usual registers on top), has special instructions (e.g. Min/Max, SAD, DotProduct) etc.
when you optimize really well, you can get way beyond a 4x speed up. On the other hand, in your first try you will likely get a slowdown, but don't get demotivated by it, on the next try you'll probably make it 2x faster already ;)
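As a small taste of the special instructions, here is a sketch of a per-element minimum over two float arrays using `_mm_min_ps`. The function name and the macro `SKETCH_HAVE_SSE` are made up for this example, and it falls back to scalar code when SSE isn't detected at compile time.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// out[i] = min(a[i], b[i]) for i in [0, n)
void minArrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    // 4 floats per iteration, unaligned loads so any pointer works
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_min_ps(va, vb));
    }
#endif
    // scalar tail (or the whole array without SSE)
    for (; i < n; ++i)
        out[i] = a[i] < b[i] ? a[i] : b[i];
}
```

Don't expect a loop this simple to beat the compiler's auto-vectorizer by much; it's only meant to show the shape of intrinsic code.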

#5237439 If-else coding style

Posted by Krypt0n on 29 June 2015 - 02:23 AM

that's something that should be resolved with the reviewer instead of some guys in a forum.
otherwise you will never know for sure what the reason was and might repeat the error.

#5237118 SIMD and Compilation

Posted by Krypt0n on 27 June 2015 - 11:22 AM

Intrinsics are the only option when you want to extract 100% performance from the machine. Keep them in mind when you really really need to optimize a loop!
Assembly is pretty much unnecessary -- just use intrinsics.

let's say 90% without assembly ;)
while compilers are indeed very good at it, probably better than someone who's been into SIMD for a year, they still have 2 problems.
1. compilers don't know the use case, thus they optimize in a generic way. for example, they don't know how critical one function is vs another. the more critical one should maybe be optimized more aggressively. if you optimize all functions to the maximum, you might cause the opposite, e.g. always being instruction cache limited due to unrolling.
2. compilers have to assume the worst and guarantee correctness. you are most likely not aware of some weird case the compiler assumes that causes it to drop a potential optimization.

but the simple solution to it, without _writing_ assembly, is simply to learn to read assembly. Read what the compiler has generated, sometimes a simple change of the order of parameters can save temporaries etc.

also, if you can get your hands on the intel compiler, you can make it spit out annotations to your code that will tell you what the compiler assumes and what optimizations it had to drop and if you're sure what you're doing, you can follow the hints to empower the compiler to optimize. Sometimes it's just as simple as adding pragmas or keywords (e.g. __restrict).
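A typical case of "assuming the worst" is pointer aliasing. In the sketch below (function name made up), the compiler would otherwise have to consider that dst and src overlap; `__restrict`, a compiler extension available in GCC, Clang and MSVC, is the programmer's promise that they don't.

```cpp
#include <cstddef>

// dst[i] += k * src[i]
// __restrict promises that dst and src never overlap, so loads from src
// can be hoisted and the loop vectorized without a runtime overlap check.
void scaleAdd(float* __restrict dst, const float* __restrict src,
              float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

If the promise is broken (e.g. calling it with overlapping ranges), the behavior is undefined, so only add it where you're sure.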

recently there was another SIMD optimization paper from Intel: https://software.intel.com/en-us/articles/easy-simd-through-wrappers