Posts posted by Necrolis

  1. In a slightly more generalized answer: if you can attach PIX, it will help track down the specific DX API calls used to perform an operation or render (in frame-capture mode it gives you a breakdown of the DX calls in a frame, along with the state and rendered output before and after each call). It will also help in debugging any issues you might have (though unfortunately MS saw fit to break PIX with a certain Win7 patch...). The various GPU vendors also provide similar free tools (Nsight for NVIDIA, GPU PerfStudio for AMD and GPA for Intel).


    The game might possibly call D3DPERF_SetOptions() to disable PIX, but it's very easy to NOP out/remove.

  2. Did they drop their dynamic octree GI thingy? Or was it not dynamic to begin with?

    They are releasing the Elemental demo with UE4.1 (for free, see here); however, I still haven't been able to figure out whether it contains the SVOGI implementation, or whether it's been converted to Lightmass or the LPV tech they settled on for dynamic GI (the Fable: Legends blog had a post on this in the past week).


    They use some hard-coded code style I've not seen before to build the UI 'forms', which looks like nested method calls followed by multiple nested arrays, but each section starts off with a '+'. From memory I can't remember the exact syntax, but it looks quite odd to me. Perhaps it's something new in the latest version of C++?


    It's actually a metric ton of operator overloading done across multiple classes from what I can tell; it's really weird to look at, but makes sense in a visual way, as it more closely matches the nesting hierarchy of the elements.


    A question I'd like to have answered is whether or not UE4 is set up to make it relatively easy to implement your own lighting models, or if you would have to go ripping deep into the source code to get that to work. Also, is it all node based, or can you use HLSL/GLSL at all?


    The shaders are very well structured and nicely commented, so it should be a (relatively) simple task to add new models (with the way they segregate a lot of it, it should be easier than starting from scratch, as you get to reuse a lot of the common infrastructure). The same goes for the C++ code, and for customizing the BRDF (which already has quite a lot of options built in).


    How does your pRamp/GenGammaRamp stuff get used? The D3DGAMMARAMP is based around WORD values, not doubles, plus you seem to be generating an array of 255 values instead of 256.

    It gets mapped back to a WORD before being used to fill the D3DGAMMARAMP (the reason for the size mapping is that there is also a palette-based software renderer, but it ignores the gamma ramp -.-). As for the off-by-one error, that's probably my fault somewhere along the line, so thanks for catching that.


    As for how it gets mapped: it's literally cast to a WORD, as the range mapping is done inside GenGammaTable (I folded in the range value, fMaxGamma, because I'm only concerned with the D3D gamma; originally this was a parameter):

    double GammaTable[256];
    D3DGAMMARAMP ramp;
    for(int i = 0; i < 256; i++)
    {
        WORD Gamma = (WORD)GammaTable[i];
        ramp.red[i] = ramp.green[i] = ramp.blue[i] = Gamma;
    }


    If you wanted to do this in a shader, you'd take the array of 256 'gamma' values, and store them in a 256px * 1px texture (you could use D3DFMT_R32F and store them as floats in the 0-1 range). You'd then use a post-processing shader like this:

    float3 coords = myColor.rgb;//treat the colour as a texture coordinate
    coords = coords * (255.0/256.0) + (0.5/256.0);
    //^^^ this is necessary to map 0.0 to the center of the first texel, and 1.0 to the center of the last texel
    //sample the texture 3 times to convert each channel to the value in the gamma ramp:
    myColor.r = tex2D( theTexture, float2(coords.r, 0) ).r;
    myColor.g = tex2D( theTexture, float2(coords.g, 0) ).r;
    myColor.b = tex2D( theTexture, float2(coords.b, 0) ).r;


    Ah, so it is just a straight "scaled index". Originally I had tried using "Out.Color = pow(In.Color, gGamma/2.2)", but I had no clue how to add in the contrast, and this also washed out the colors very quickly as opposed to the original ramp.


    I'm already using 1D LUTs to emulate 8-bit palettized color, so technically I should be able to remap the palette LUT to account for the gamma, if I understand this correctly; though I think it's probably best to first have it working with the double LUT. Your note about the texel centering reminds me that I didn't do this for my palette LUTs, so that fixes something else as well.

  5. More seriously, it's funny to see all the Unreal Engine code using #pragma once.


    It's actually good; it means faster processing than include guards (which they also use here and there, but they make the mistake of using a double underscore prefix, which, to be pedantic, you shouldn't ever do, since such names are reserved for the implementation).


    Spelunking around the UE4 source is quite interesting to say the least, especially some of the nifty "don't do this cause the debug layer explodes" comments.

  6. I've been trying to figure out a way to map a gamma ramp generation function I have (obtained through RE) to an HLSL/GLSL function/operator, in an effort to emulate the "look and feel" of an older game I fiddle with in my spare time. 


    However, I'm failing to get anywhere because I'm not sure how the gamma ramp set by IDirect3DDevice9::SetGammaRamp gets used when outputting a pixel. What I'm looking for is: if I have the RGB tuple "x", what operations are performed on x's channels using the ramp to yield the final pixel rendered to the back buffer?


    The ramp generation looks like so if it helps in any way:

    void GenGammaRamp(long dwGamma, double fContrast, double* pRamp)
    {
        double fGamma = (double)dwGamma;
        double fFractGamma = 0.01 * fGamma;
        double fGammaPercent = 100.0 / fGamma;
        double fRampEnd = 255.0;
        double fMaxGamma = 65535.0;
        double fGammaFactor = 1.0 / fRampEnd;
        for(double fRamp = 0.0; fRamp < fRampEnd; fRamp += 1.0)
        {
            double fGammaStep = fGammaFactor * fRamp * fMaxGamma * fFractGamma;
            fGammaStep = fGammaStep > fMaxGamma ? fMaxGamma : fGammaStep;
            double fFinalGamma = (pow(fGammaFactor * fRamp, fGammaPercent) * fMaxGamma * (100.0 - fContrast) + fGammaStep * fContrast) * 0.01;
            pRamp[(int)fRamp] = fFinalGamma;
        }
    }

    (the values get converted back to 8/16/32-bit integers just before they are sent off to the driver).

  7. I really like the fact that they decided to open source this on GitHub (though it seems the repo isn't public yet, even though their site claims it is...); I love spelunking through AAA engines. The tutorials look pretty great as well.


    What I can't understand is if/where you are able to download the UE4 UDK without paying the fee (as the registration says you can continue to use it even with a cancelled sub, you just won't get updates), i.e. if I just want to bugger around and don't plan on releasing anything, am I still due for a once-off $20 payment?



    EDIT: I think I get the GitHub thing now; it seems you need to register through the UE portal, then link your existing GitHub account to it. For some reason I thought the page was showing people how to sign up to GitHub...

  8. Without seeing what you are doing, the best advice is to just point you to some "best practice" material; here are the slides to a great talk from NVIDIA at Steam Dev Days on speeding up your OpenGL code (the video is on YouTube if you want the audio guide). In particular, pay attention to the buffer management portion, and probably the draw indirect stuff.

  9. Your reflection vector looks odd to me (but then again, the one I used is inverted from all the documentation I came across, which correctly has r = 2(n dot l)n - l); this is the code I used in one of my ray tracers for diffuse + specular, maybe it gives you some hints:

    if(!World->GetGeometry()->Occlude(Ray(i + l * 0.001f, l), light_dist))
    {
    	float nl = n.dot(l);
    	if(nl > 0)
    		c += lc * vColor * pMaterial->GetDiffuse() * nl;
    	Vector r = l - (n * 2 * nl);
    	float rv = ray.GetDirection().dot(r);
    	if(rv > 0)
    		c += lc * vColor * pMaterial->GetSpecular() * std::pow(rv, pMaterial->GetSpecularPower());
    }

  10. From the material layering video in the Inside UE4 series, it seems that they use a masking texture and blend the materials in one pass, using each of the four channels of the mask as a blend weight (though I get the impression that some of it may be done "offline"/at startup and baked once into a final composite material).


    You might also find MJP's slides on Ready At Dawn's material layering/compositing system of interest as well.

  11. That code basically compacts to:

    ((x + 0xF) >> 4) << 4

    ...which is strange -- doing a shift down followed by a shift up immediately afterwards.

    If I'm not mistaken, that's the same as:

    ((x + 0xF) & ~0xFU)

    i.e. clear the lower 4 bits, while rounding upwards to the nearest multiple of 16.

    Given this, and not much context, I would guess that the author is using a 28.4 fixed point format, and this is an implementation of the ceil function?

    Shifts preserve the sign bit if I'm not mistaken (though right-shifting a negative signed value is implementation-defined in C and C++).


    EDIT: hmm, on second thought, I didn't read your code correctly; but then again, shifts would be "safe" standards-wise, as opposed to the bit masking on an integer.

  12. If you can use LuaJIT, you might find it easier to write C wrappers for all the class functions and invoke them through the FFI.


    If you want to avoid all the C wrappers, the FFI can bind directly to the class members (see this); it's a bit brittle, but if you are targeting only a single compiler + platform, it's trivial* to generate a script to encode the symbols and bind everything in the FFI. Not only is that faster (dev-wise) and less cluttered, but you get LuaJIT with all its awesome features :) (if you factor your code properly, you can even just parse a C or C++ header directly in Lua, meaning you only need to update a single point).


    I went the C++ template and macro magic route (not as heavily as Hodgman) and found it horrible to work with in the end (and by "work with" I just mean adding new functions, enums, classes/objects etc.). I've been meaning to try the above using the FFI (I use LuaJIT already); unfortunately I haven't had the time yet...



    *the cdef generation part; decoration of symbols might be a little more tricky, esp. with MSVC

  13. I've made a few ray tracers in the past, but they were normally focused on small aspects, taking shortcuts for many things; however, recently I needed to make one for a uni project and decided to go all out. So after setting up my math libs and getting all the groundwork into place (which most importantly includes a 4DOF FPS camera), it came time to trace a few spheres to test the ray casting and get ready to implement a BVH/BSP scheme.


    The first thing I tried was this snippet of code (taken from this wonderful set of articles):

    Vector dst = vCenter - r.GetOrigin();
    float b = dst.dot(r.GetDirection());
    float c = b * b - dst.dot(dst) + fRadiusSquared;
    if(c < 0)
    	return INTERSECT_NONE;
    float d = std::sqrt(c), r1 = b - d, r2 = b + d;
    if(r2 < 0)
    	return INTERSECT_NONE;
    if(r1 < 0)
    	fDistance = r2;
    else
    	fDistance = r1;

    it works, but far off objects come out very "fuzzy" and jagged, clearing up as you get closer:



    After trying quite a few variants (such as this), which all suffered from the same problem, I suspected a problem in my math code; however, I then used the following code (adapted from here, based on the unoptimized quadratic formula, which massively slows down rendering):

    	//Compute A, B and C coefficients
    	Vector o = r.GetOrigin() - vCenter;
    	float c = o.length_squared_fast() - fRadiusSquared;
    	if(c < 0)
    		return INTERSECT_INSIDE;
    	float a = r.GetDirection().dot(r.GetDirection());
    	float b = 2 * r.GetDirection().dot(o);
    	//Find discriminant
    	float disc = b * b - 4 * a * c;
    	// if discriminant is negative there are no real roots, so return
    	// false as ray misses sphere
    	if (disc < 0)
    		return INTERSECT_NONE;
    	// compute q as described above
    	float disc_sqrt = sqrtf(disc);
    	float q = (b < 0) ? (-b - disc_sqrt) / 2.0f : (-b + disc_sqrt) / 2.0f;
    	// compute t0 and t1
    	float t0 = q / a;
    	float t1 = c / q;
    	// make sure t0 is smaller than t1
    	if (t0 > t1)
    		std::swap(t0, t1);
    	// if t1 is less than zero, the object is in the ray's negative direction
    	// and consequently the ray misses the sphere
    	if (t1 < 0)
    		return INTERSECT_NONE;
    	// if t0 is less than zero, the intersection point is at t1
    	if (t0 < 0)
    		fDistance = t1;
    	// else the intersection point is at t0
    	else
    		fDistance = t0;

    it comes out correctly.


    and for the life of me I can't figure out why the optimized variants seem to degrade so badly...

    Anybody got any hints/advice as to what's going wrong here? Is this just a side-effect of approximating it geometrically?

  14. Use the preprocessor to generate the variants at compile time, keeping things together while removing unneeded runtime overhead (when you generate the shaders depends on your pipeline, but in effect you can generate every variant before distribution and index them via a hash or fingerprint).

    #ifdef USE_SHADOWS
    //my code with shadows...
    #else
    //my code without shadows
    #endif

    You can read up more on the preprocessor here.
