

Krypt0n


#5166164 optimizations by self-modifying code

Posted by Krypt0n on 11 July 2014 - 03:55 AM

when you write assembly, sooner or later every programmer gets the idea to modify code at runtime. you can start quite simply by changing some constants, e.g. instead of

mov bx,Variable
.
.
.
add ax,bx
you rather write

lea di, m1 ;load the address of the opcode at marker "m1"
add di, 3 ;this is an offset to the actual constant in the add instruction, skipping the opcode
mov bx, Variable
mov [di], bx
.
.
.
m1:
add ax, 0 ;0 will be replaced by above code
I've used this in DOS times for rasterization quite a lot, e.g. if you calculate the borders of a triangle with y=m*x+b, m and b are constants, yet they use otherwise precious register space (and you had just ax, bx, cx, dx, di, si besides the stack etc.). and as those don't change, you can just as well patch those kinds of constants directly into the binary.

the next step that comes to mind: if you have some inner loop that you'd need to rewrite 100 times for various different cases (and some guys do exactly that, e.g. http://www.amazon.com/Tricks-Programming-Gurus-Advanced-Graphics-Rasterization/dp/0672318350/ ), you can just add some jumps and modify the destination offset. static jumps are executed in a different part of the cpu than all the math etc. and are essentially free, as there is no misprediction. that way you can switch textures, blending etc. on and off in the rasterizer with just a few lines of code.
like said above, there are a few guys who write a runtime compiler for that, but that's the crazy banana version if you really, really want the best performance; it's rather for complex shader cases where you would otherwise end up with a crazy amount of jumps. for simple cases (<=Direct3D 6) modifying some constants was good enough to get the needed performance. it also made no sense to copy around code chunks, as the copy would cost you time and would barely run faster than a modified jump (aka jump table) into that code area.

today it's a bit dangerous; caches and pipelines assume that code is static. even with plain data you can run into hazards in multithreaded applications, and that's even more dangerous for code segments. though it's not impossible: pretty much every OS allows you to unlock code segments for writing/modifying, and if you know the cpu architecture, you can enforce the syncs that are needed.
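to give an idea what that 'unlocking' looks like in practice, here is a minimal sketch for a POSIX system (Windows has VirtualProtect for the same job); the 16-bit patch mirrors the asm example above, and the offset to the immediate is whatever your actual instruction encoding needs:

#include <sys/mman.h>   // mprotect
#include <unistd.h>     // sysconf
#include <cstdint>
#include <cstddef>
#include <cstring>

// make the page(s) containing the target bytes writable (and still executable),
// patch a 16-bit immediate, then flush the instruction cache so the cpu
// doesn't keep executing stale bytes
bool patch_immediate16(void* instruction, std::size_t offsetToImmediate, std::uint16_t value)
{
    const std::size_t page = (std::size_t)sysconf(_SC_PAGESIZE);
    std::uint8_t* target = (std::uint8_t*)instruction + offsetToImmediate;
    std::uintptr_t begin = (std::uintptr_t)target & ~(std::uintptr_t)(page - 1);
    std::uintptr_t end   = ((std::uintptr_t)(target + sizeof(value)) + page - 1) & ~(std::uintptr_t)(page - 1);

    if (mprotect((void*)begin, end - begin, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        return false;                                  // OS refused, e.g. strict W^X policy

    std::memcpy(target, &value, sizeof(value));        // the actual 'self modification'

#if defined(__GNUC__)
    __builtin___clear_cache((char*)target, (char*)(target + sizeof(value)));
#endif
    // optionally mprotect back to PROT_READ | PROT_EXEC here to stay W^X friendly
    return true;
}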

the craziest thing I've done with SMC was for my raytracer, where I've basically 'dumped' the BSP tree as assembly code. instead of a tiny loop that branches unpredictably to either side of the BSP tree, the 'unrolled' code was processed in mostly the same way from ray to ray (every ray starts at the same place, most get split by the same node as the previous ray, and most walk the same leaf branch as the previous one).
sadly it only worked out for a small BSP; before I even ran out of L1 instruction cache, I somehow ran out of the space the jump predictor can cover, and then performance dropped dramatically, below the version with the tiny loop. the next, more 'crazy' step would be to evaluate every frame the most likely walking path through the BSP and dump a code tree that aligns with what the static branch prediction would guess.. but I didn't do that, as my way of SMC was to dump a c++ file and invoke the cl.exe of visual studio to generate a binary lib that I parsed and copied into my binary, which is ok at load time, but not if you have 16ms.
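to illustrate the 'dump the BSP as code' idea, a very rough sketch of such a generator, writing one nested branch per node into a .cpp file that you'd then feed to cl.exe; Node/Plane and intersectLeafN are made-up names here, and a correct BSP ray walk has to visit both children when the ray spans the plane, which is glossed over:

#include <fstream>
#include <string>

struct Plane { float nx, ny, nz, d; };
struct Node  { bool leaf; int id; Plane split; Node* front; Node* back; };

// one nested if/else per BSP node; leaves turn into straight-line calls.
static void emitNode(std::ofstream& os, const Node* n, int depth)
{
    std::string pad(depth * 2, ' ');
    if (n->leaf)
    {
        os << pad << "hit |= intersectLeaf" << n->id << "(ray);\n";
        return;
    }
    os << pad << "if (ray.ox * " << n->split.nx << "f + ray.oy * " << n->split.ny
       << "f + ray.oz * " << n->split.nz << "f > " << n->split.d << "f) {\n";
    emitNode(os, n->front, depth + 1);
    os << pad << "} else {\n";
    emitNode(os, n->back, depth + 1);
    os << pad << "}\n";
}

void dumpTreeAsCpp(const Node* root, const char* path)
{
    std::ofstream os(path);
    os << std::showpoint;   // floats print with a decimal point, so "1.00000f" stays a valid literal
    os << "bool traceUnrolled(Ray ray) {\n  bool hit = false;\n";
    emitNode(os, root, 1);
    os << "  return hit;\n}\n";
}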


#5151784 400% Raytracing Speed-Up by Re-Projection (Image Warping)

Posted by Krypt0n on 06 May 2014 - 04:23 AM

Intro: I have been working a while on this technology and since real-time raytracing is getting faster, e.g. with the Brigade Raytracer, I believe this can be an important contribution to this area, as it might bring raytracing one step closer to being usable for video games.

I had a feeling, from watching their demos, that they're doing this already (the video quality is very bad, so it's hard to see, but the silhouette ghosting artifacts made me think that).
 

Algorithm: The technology exploits temporal coherence between two consecutive rendered images to speed up ray-casting. The idea is to store the x-, y- and z-coordinates for each pixel in the scene in a coordinate buffer and re-project them into the following frame using the differential view matrix. The resulting image will look as below.

you don't need to store x, y, z; it's enough to have the depth, from that and the screen pixel coordinates you can re-project it. that's done e.g. in Crysis 2 (called temporal AA), it recently got famous with Killzone: Shadow Fall ( http://www.killzone.com/en_GB/blog/news/2014-03-06_regarding-killzone-shadow-fall-and-1080p.html ), and old school cpu-tracer demos have done it as well. this easily becomes memory bound, which is why reducing the fragment size to a minimum is key. practically it should be barely noticeable compared to the tracing time.
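a rough sketch of that depth-only re-projection: rebuild the clip-space position from pixel coordinate + stored depth, unproject with the inverse of last frame's view-projection and project with the current one. the tiny Vec4/Mat4 helpers and the depth/matrix conventions are assumptions here and have to match whatever your tracer uses:

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };            // column-major, v' = M * v

static Vec4 mul(const Mat4& a, const Vec4& v)
{
    return {
        a.m[0][0]*v.x + a.m[1][0]*v.y + a.m[2][0]*v.z + a.m[3][0]*v.w,
        a.m[0][1]*v.x + a.m[1][1]*v.y + a.m[2][1]*v.z + a.m[3][1]*v.w,
        a.m[0][2]*v.x + a.m[1][2]*v.y + a.m[2][2]*v.z + a.m[3][2]*v.w,
        a.m[0][3]*v.x + a.m[1][3]*v.y + a.m[2][3]*v.z + a.m[3][3]*v.w };
}

// px,py: pixel of the previous frame, depth: the stored depth for that pixel (here assumed in [0,1]).
// prevInvViewProj: inverse of last frame's view*proj, curViewProj: this frame's view*proj.
// returns false if the re-projected point falls outside the current screen.
bool reproject(float px, float py, float depth, int w, int h,
               const Mat4& prevInvViewProj, const Mat4& curViewProj,
               float* outX, float* outY)
{
    // pixel -> ndc of the previous frame
    Vec4 ndc = { (px + 0.5f) / w * 2.0f - 1.0f,
                 1.0f - (py + 0.5f) / h * 2.0f,
                 depth * 2.0f - 1.0f,
                 1.0f };

    Vec4 world = mul(prevInvViewProj, ndc);                    // unproject
    world.x /= world.w; world.y /= world.w; world.z /= world.w; world.w = 1.0f;

    Vec4 clip = mul(curViewProj, world);                       // project into the current frame
    if (clip.w <= 0.0f)
        return false;
    *outX = (clip.x / clip.w * 0.5f + 0.5f) * w;
    *outY = (1.0f - (clip.y / clip.w * 0.5f + 0.5f)) * h;
    return *outX >= 0.0f && *outX < (float)w && *outY >= 0.0f && *outY < (float)h;
}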

 

The method then gathers empty 2x2 pixel blocks on the screen and stores them in an index buffer for raycasting the holes, as raycasting single pixels is too inefficient. Small holes remaining after the hole-filling pass are closed by a simple image filter.

in path tracing, the trick is to not only re-project the final pixels (those are anti-aliased etc. and give you the wrong result anyway); you have to save the originally traced samples (with 10spp that's 10x the size!) and re-project those, then you get pretty much perfect coverage.
updates are then done in an interleaved pattern, e.g. replacing 1 out of 10 samples of the re-projection source buffer per frame.
 

Results: Most of the pixels can be re-used this way, as only a fraction of the original needs to be raycast. The speed up is significant, up to 5x the original speed, depending on the scene. The implementation is applied to voxel octree raycasting using OpenCL, but it can just as well be used for conventional triangle based raycasting.

5x is also what I've seen for primary rays; if you have enough depth during path tracing, it can get closer to a linear speedup (depending on the spp and the update rate).
 

Limitations: The method also comes with limitations of course. The speed up depends on the motion in the scene obviously, and the method is only suitable for primary rays and pixel properties that remain constant over multiple frames, such as static ambient lighting. Further, during fast motions, the silhouettes of geometry close to the camera tend to lose precision, and geometry in the background will not move as smoothly as if the scene were fully raytraced each frame. There, future work might include creating suitable image filters to avoid these effects.

this also works for secondary rays, but gets a bit more complex: you have to save not only the position but also the 2nd bounce, and recalculate the shading using the brdf. I think vray is doing that, calling it 'light cache'.

the trick with motion is that you won't notice artifacts that much, so you can keep the update rate stable; areas of fast motion will obviously look worse, but you won't notice it that much. the only real problem you can get is similar to stochastic rendering, where a lot of samples fall into the same pixel and you have to use some smart reconstruction filter to figure out which samples really contribute to that pixel and which ones are just 'shining' through and should be rejected. that might not be very noticeable in your video, but if you trace e.g. trees with detailed leaves, you'll have to re-project sky samples and close leaf samples, and there is no easy way to decide which sky pixels to reject and which ones to keep. the best I've seen so far was a guy who used primitive ids and did some minimal triangulation of samples with the same id, but the results were far from good.
in Crysis 2, during motion, you'll notice that tree tops look like they get thicker; I guess they simply use a nearest filter to combine sub-samples.


the silhouette issue arises when you work on the final buffer, that's what you can see in the rasterizer versions like Crysis 2 or Killzone. re-projecting the spp-buffer (using the proper subpixel positions) ends up with no ghosting on silhouettes (besides the previously mentioned reconstruction issue).


#5150285 what's the current best compression (library) for Images?

Posted by Krypt0n on 29 April 2014 - 03:53 AM

jpg should be a good start; there are ways to improve it, like stripping the header if you have many textures of the same type etc. you could start with http://en.wikipedia.org/wiki/Libjpeg

it supports arithmetic coding (which is rarely used due to patents), but that should get your textures fairly small.

 

you can use google's https://developers.google.com/speed/webp, it's like an improved jpg format (I think Rage also uses some improved jpg variant for their megatextures).

 

if you give some more details, it's quite possible to come up with even better solutions, e.g.

-does the price of the lib matter? (there are some paid solutions which promise marvelous results)

-are you using photorealistic textures, painted ones, or maybe colorful manga style textures?

-what's the final format you'll use them in? (that implies loss as well, and it might be an advantage to handle that in the file format instead of having two quantizations)

-or does quality not matter at all?




#5148961 Forward rendering light management

Posted by Krypt0n on 23 April 2014 - 08:15 AM

the way we solve it nowadays is called forward+ shading, or tiled lighting, or...

you can find a great collection of the main papers at http://aras-p.info/blog/2012/03/27/tiled-forward-shading-links/
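in case a sketch helps before diving into the papers: the core of tiled/forward+ lighting is just binning light indices into screen tiles so the shading loop only walks the lights of its own tile. a much simplified cpu-side version (real implementations do this in a compute shader and also cull against the tile's min/max depth), with made-up types:

#include <vector>
#include <algorithm>
#include <cmath>

struct Light { float x, y, radius; };   // already projected to screen space, for brevity

// bin light indices into 16x16 pixel tiles; the shader later loops only over
// the list of the tile the fragment falls into, instead of over all lights
std::vector<std::vector<int>> binLights(const std::vector<Light>& lights, int width, int height)
{
    const int tile = 16;
    const int tilesX = (width + tile - 1) / tile;
    const int tilesY = (height + tile - 1) / tile;
    std::vector<std::vector<int>> bins(tilesX * tilesY);

    for (int i = 0; i < (int)lights.size(); ++i)
    {
        const Light& l = lights[i];
        int x0 = std::max(0, (int)std::floor((l.x - l.radius) / tile));
        int x1 = std::min(tilesX - 1, (int)std::floor((l.x + l.radius) / tile));
        int y0 = std::max(0, (int)std::floor((l.y - l.radius) / tile));
        int y1 = std::min(tilesY - 1, (int)std::floor((l.y + l.radius) / tile));
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                bins[ty * tilesX + tx].push_back(i);
    }
    return bins;
}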




#5145211 Preparing for Mantle

Posted by Krypt0n on 07 April 2014 - 07:02 PM

Create a command list, create some resources, execute a command list with a set of resources as inputs, done.

so, how do you keep a game (engine) flexible, yet know _all_ the states etc. up front so you don't have to create them at runtime? (assume pipeline creation can take as much time as shader linking in opengl, which is ~1s in bad cases.)
there is no driver anymore that does that for you in a background thread, in an as-fast-as-possible way.
assume you have about 1024 shader combinations; add stencil, rasterizer, blend and rendertarget permutations that might be part of the gpu setup and are therefore baked into one static state you have to create.
assume that state creation is not cross platform but per driver+gpu, so you cannot really do it offline before you ship the game.
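to make that concrete, runtime selection pretty much has to look like a big table of pre-built pipeline objects keyed by the permutation bits, filled at load time; everything here (PipelineKey, createPipeline) is hypothetical, it just shows the shape of the problem:

#include <cstdint>
#include <unordered_map>

// hypothetical permutation key: shader combination + the fixed-function state that
// gets baked into one monolithic pipeline object on Mantle/D3D12-style APIs
struct PipelineKey {
    uint32_t shaderCombo;        // e.g. 0..1023
    uint8_t  blendMode;
    uint8_t  depthStencilMode;
    uint8_t  rasterMode;
    uint8_t  renderTargetFormat;
};

static uint64_t hashKey(const PipelineKey& k) {
    return (uint64_t)k.shaderCombo |
           (uint64_t)k.blendMode << 32 | (uint64_t)k.depthStencilMode << 40 |
           (uint64_t)k.rasterMode << 48 | (uint64_t)k.renderTargetFormat << 56;
}

struct Pipeline;                                             // opaque, driver-side object
Pipeline* createPipeline(const PipelineKey&) { return nullptr; }  // stand-in; the real call may take ~1s in bad cases

struct PipelineCache {
    std::unordered_map<uint64_t, Pipeline*> map;

    // called at load time for every permutation the content can possibly request
    void prewarm(const PipelineKey& k) { map.emplace(hashKey(k), createPipeline(k)); }

    // at draw time: a miss here means a potentially multi-ms hitch mid-frame
    Pipeline* get(const PipelineKey& k) {
        auto it = map.find(hashKey(k));
        return it != map.end() ? it->second : nullptr;       // or create on the spot and stutter
    }
};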
 

The rest of the changes are conceptual changes to simplify the resources model (no more different kinds of buffers, simpler texture semantics, etc.).

there still are. check out the links in the 2nd post. it's split into 2 stages:
1. you allocate a bunch of memory
2. you prepare it for a specific use case, e.g. as render target or vertexbuffer.

now assume you want to use a texture as render target and then use it as source in the 2nd drawcall (e.g. some temporary texture you use in post processing). you need to state that to the API.
assume further you take advantage of the new multithreaded command generation, so you can't keep track of the state of an object inside the object; you rather need to track states per commandbuffer/thread.
assume further you don't want to do redundant state conversions, as those might be quite expensive (changing buffer layouts to make them best suited for texture sampling, for rendering, for vertex access), so you actually need to somehow merge the states of resources used in consecutive command buffers.
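a sketch of what that per-thread bookkeeping tends to look like: each command buffer only records what state it expects a resource in and what it left it in, and a single-threaded pass at submit time inserts the transitions between consecutive buffers. all names are made up:

#include <unordered_map>
#include <vector>

enum class Usage { Undefined, RenderTarget, ShaderRead, VertexBuffer, CopyDest };

struct Resource;   // opaque

// what one thread recorded, without knowing what the other threads/buffers did
struct CmdBufferStates {
    std::unordered_map<Resource*, Usage> firstUse;   // state the buffer expects on entry
    std::unordered_map<Resource*, Usage> lastUse;    // state the buffer leaves behind
};

struct Transition { Resource* res; Usage from; Usage to; };

// run once, single threaded, at submit time: walk the buffers in execution order and
// emit the layout changes between 'what the previous buffer left' and 'what the next expects',
// so no global per-object tracking is needed while recording in parallel
std::vector<Transition> resolveTransitions(const std::vector<CmdBufferStates*>& order)
{
    std::unordered_map<Resource*, Usage> current;
    std::vector<Transition> out;
    for (const CmdBufferStates* cb : order)
    {
        for (const auto& [res, usage] : cb->firstUse)
        {
            Usage prev = current.count(res) ? current[res] : Usage::Undefined;
            if (prev != usage)
                out.push_back({ res, prev, usage });
        }
        for (const auto& [res, usage] : cb->lastUse)
            current[res] = usage;
    }
    return out;
}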


 

the more explicit threading model (only particularly relevant if you want/need render threading), and the more explicit device model (pick which GPU you use for what on multi-GPU systems).

you know you have to test and balance all that? CrossFire works across different GPUs; you can have an APU gpu + some mid range Radeon HD 7700 + a top notch R9 290X.
and with D3D, there is a generic driver that might execute asymmetrically on those GPUs. it's something you'd need to handle.
I don't say that's impossible, but for the majority of devs it can end up in either a lot of work (testing all kinds of configurations in various parts of your game), or you disappoint some high end users because their expensive 4x crossfire setup is no faster than 3x crossfire or even buggy.


a lot of work that drivers did before will end up in the hands of devs, and it's not optional, it's what you'll have to do. you might ship a perfectly fine running game, and some new GPU might take advantage of something that hasn't been used before and uncover a bug in your 1 year old game that people still play. and AMD/NV won't release a driver fix, you need to release a patch.

I see the benefits you've mentioned, but I also see all the drawbacks.
I like low level programming on consoles, below what Mantle/D3D12 offers, but I'm not sure about the PC side. when there was Glide/S3 Metal/RRedline/... and even GL worked differently everywhere (MiniGL/PowerSGL/..), every developer felt relieved when it ended with D3D. and the RefRast was actually pushed by game devs, to be able to validate that something is a driver bug. now it all seems forgotten and like a step back.

the Cass Everitt talk really seems like the best balance of both worlds to me (if it were extended a little bit).


#5145203 Preparing for Mantle

Posted by Krypt0n on 07 April 2014 - 06:32 PM

Regarding being CPU bound - this depends on whether you're making a graphical tech demo, or a game.
For the former, you might have 16ms of GPU time and 16ms of CPU time per frame dedicated to graphics.
For the latter, you've still got 16ms of GPU time (for now, until everyone else realizes you've ended up as the gatekeeper of their SPU-type jobs!), but maybe only a budget of 2ms of CPU time because all the other departments on the team need CPU time as well! In that situation, it's much easier to overrun your CPU budget...

yet there are very few games that saturate more than 2 cores. most have a render thread, and that one runs independently of the other parts; that implies, from the architecture point of view, that rendering in games nowadays runs no differently than in tech demos, unless your job system really fills up all cores and could benefit from freeing up the rendering thread/core.
if you don't occupy all cores and you don't run a render thread, there is no reason to complain about API limitations.

P.S. I'm about to sign an NDA with AMD to get access to Mantle, so it's obviously being released wider than just DICE now :D

part of the NDA is to not talk about the NDA ;)


#5145199 Preparing for Mantle

Posted by Krypt0n on 07 April 2014 - 06:24 PM

yet it makes me wonder, are we really that much cpu bound? from my perspective, it needs a really slow cpu to saturate on the API side. usually, with instancing etc. any modern i3,i5,i7 is fast enough in a single thread to saturate on the GPU side.

In my experience it's very easy to be CPU-bound in D3D11 with real-world rendering scenarios. Lots of draw calls, and lots of resource bindings. This is true for us even on beefy Intel CPUs. We've had to invest considerable amounts of engineering effort into changing our asset pipeline and engine runtime in ways that reduced CPU usage.

I'm implying that you'll end up doing the same for D3D12/Mantle, just not because of the CPU, but because the GPU will have idle bubbles in the pipeline if you start switching states. (if you profile on consoles, with low CPU overhead, that's what you'll see.) it's still work that has to be done, and a 1GHz sequential processor won't do any magic (not talking about shaders, but about the command processor part!).
we have low level access to HW on consoles, and while you might think we could now afford to be wasteful with drawcalls etc., we actually spend a lot of SPU cycles to batch meshes, remove redundant states and even do shader preparation that the GPU could handle, just to take it off the GPU.
it's just moving the bottleneck to another place, not removing it, and at some point you'll hit it again and end up with the same old thinking: the fastest optimization is to not do wasteful work, no matter how fast you'd do it otherwise.

 
 

The opengl extensions from NVidia's talk are way more what I'd hope for as the direction of 'next gen apis'. it's as easy to use as opengl always was, just extending the critical parts to perform better (I'm talking about http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead ). it actually makes things nicer with persistent mapped buffers (you don't need to guess and hope how every driver will 'optimize' your calls, and you get all the responsibility and possibilities that come with using persistent buffers). and if multidrawindirect were extended a bit more to support an array of indexed shader objects, you could render the whole solid pass with one drawcall. shadowmaps would possibly end up being one drawcall each, and preparing those batched drawcalls could be done in a multithreaded way if you want.

Really? The future you want is more instancing, wrapped up in a typical OpenGL layer of "you have to do it in this super-special way in order to hit the fast path"??? To me it's completely at odds with what actual software developers want.

I have a feeling you haven't looked into Cass Everitt's talk.
it's not about classical instancing.
it's about creating a list of drawcalls, with various resources (vertexbuffers, indexbuffers, textures...) and submitting all of it in one call, so instead of

for_all_my_drawcall
  gl set states
  gl draw mesh
you write

for_all_my_drawcall
  store_states into array
  store_mesh offsets/count etc. into array

gl draw_everything of array 
so, there is no "you have to do it in this super-special way in order to hit the fast path"; it's quite the opposite, a very generic way. you don't have to touch the shader to account for some special instancing, and you don't have to worry about resource limits and binding. all you do is create a vector of all drawcalls, just like you'd 'record' them with Mantle/D3D12.

yes, it's more limited right now, but that's why I've said, I'd rather see this extended.
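for reference, the GL side of 'one vector of drawcalls, one submit' is ARB_multi_draw_indirect; a minimal sketch, assuming the context, VAO, shared vertex/index buffers and the per-draw resource lookup (texture arrays/bindless) already exist:

#include <GL/glcorearb.h>   // or whatever GL loader header you use (glad, glew, ...)
#include <vector>

// matches the layout GL expects in the indirect buffer
struct DrawElementsIndirectCommand {
    GLuint count;          // index count of this mesh
    GLuint instanceCount;  // usually 1
    GLuint firstIndex;     // offset into the shared index buffer
    GLuint baseVertex;     // offset into the shared vertex buffer
    GLuint baseInstance;   // can double as a per-draw id to fetch material data in the shader
};

void submitAll(const std::vector<DrawElementsIndirectCommand>& draws, GLuint indirectBuffer)
{
    // upload the whole 'vector of drawcalls' ...
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
    glBufferData(GL_DRAW_INDIRECT_BUFFER,
                 draws.size() * sizeof(DrawElementsIndirectCommand),
                 draws.data(), GL_STREAM_DRAW);

    // ... and kick it with a single call; per-draw data is picked up in the shader
    // via gl_DrawIDARB / baseInstance instead of per-draw state changes
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                nullptr, (GLsizei)draws.size(),
                                sizeof(DrawElementsIndirectCommand));
}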

 

Everybody who works on consoles knows how low-overhead it *should* be to generate command buffers, and so they constantly beg for lower-overhead draw calls, better multithreading, and more access to GPU memory. Instead we get that "zero driver overhead" presentation that's like "lol too bad we're never going to change anything, here's some new extensions that only work on Nvidia and may require you to completely rewrite your rendering pipeline to use effectively." Great :-/

I really disagree on that one.
it offers you persistent memory, where you can write multithreaded and manage it yourself, just like we do on consoles. it offers you a way to create command lists (rather, vectors) in a multithreaded way, as you can on consoles. and it's not "we won't change a thing", it's rather "we've already given you a 90% solution that you can get your hands on right now, and the changes required are minimal compared to the rewrite for D3D12/Mantle for the remaining 10%".
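the persistent mapping part is ARB_buffer_storage; roughly like this (the fencing/ring-buffering needed so you don't overwrite data the GPU still reads is left out, and that is exactly the responsibility that moves to you):

#include <GL/glcorearb.h>   // or whatever GL loader header you use (glad, glew, ...)

// create a buffer that stays mapped for its whole lifetime; any CPU thread can
// write into the returned pointer while the GPU consumes previously written regions.
// in practice you treat it as a ring buffer and guard reuse with glFenceSync/glClientWaitSync.
void* createPersistentBuffer(GLuint* outBuffer, GLsizeiptr size)
{
    const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    glGenBuffers(1, outBuffer);
    glBindBuffer(GL_ARRAY_BUFFER, *outBuffer);
    glBufferStorage(GL_ARRAY_BUFFER, size, nullptr, flags);      // immutable storage, persistently mappable
    return glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);    // pointer stays valid until the buffer dies
}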

no offense intended, but have you really looked into it? I can't see why it would be a pipeline rewrite for you; it's just a little change in buffer management (which aligns well with what you do if you follow best practice guides like https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf ), and the 2nd part is the 'recording' of drawcalls, less complex than with D3D12/Mantle (because you don't have to pre-create and manage all the states), which isn't that different from sorting your drawcalls to minimize state switching (which everyone does, even on consoles where drawcalls should be cheap, because those switches hit you hard on the GPU).

 

and in case we don't want to optimize for the current saturation, but rather increase the drawcall count etc., I really wonder when it starts to be suboptimal on the GPU side. if we were able to push 10M draw calls/s, that's about 100 cycles/DC on a modern gpu, and those have really deep pipelines, sometimes needing cache flushes; every DC needs some gpu-setup context that has to be fetched. we'll end up with "yes, this could be done, but would be suboptimal, let's go back to instancing etc. again".
that's no different from what we do now: few 'big' setups per frame and pushing as many drawcalls with as few state/resource changes as possible, to saturate rather on the shader/rasterization/fillrate side.

Of course you can end up getting GPU limited, but the problem is that right now we can't even get to that point because there's too much CPU overhead. Software should be able to hit that breaking point where batching is needed for GPU performance, and then developers can decide case-by-case how much it makes sense for them to pursue instancing and things like that. It shouldn't be that you're forced into 100% instancing from the start, otherwise you're dead in the water on PC, at least in my humble opinion.

well, maybe I'm just too used to preparing everything in the best way for GPUs; we barely run into cpu limitations due to rendering, most of the time the GPU is what limits our games. at first it looked as if consoles have benefits due to low overhead, but then you take some captures and realize you pay for cache and pipeline flushes, and the solution is just the plain old way you'd always optimize for <D3D12.
I just expect the same for D3D12/Mantle.


#5145044 Jittering near a landscape.

Posted by Krypt0n on 07 April 2014 - 08:51 AM

I think the problem is not the values used for rendering, because it's stable when he just moves around; it rather seems like it only jitters when physics is involved, which might be due to very fine time steps and some squaring etc. you might do because of e=m*c^2 :)

 

switching to doubles is usually not a great solution, adjusting ranges is better, but in the case of a plane simulation, where you simulate just a few units in a really big space with maybe detailed movement (e.g. rolling slowly on the ground), double might be ok'ish.

 

and I'm not 100% sure it's really the problem either, so I suggested to try it out; sorry there was no detailed explanation of 'why', I was really just trying to remote debug it. maybe it won't help.




#5145010 Jittering near a landscape.

Posted by Krypt0n on 07 April 2014 - 07:04 AM

looks like "float" is not enough for your plane physics. try if "double" solves the problem.




#5144948 Preparing for Mantle

Posted by Krypt0n on 07 April 2014 - 02:01 AM

let's not get off topic with rants about AMD. The topic is quite interesting, and if not for Mantle, the same points can be made for D3D12, and we can be quite sure MS will release it to the public at some point.

 

I think there are two main components that make the new APIs different from the previous ones.

1. a lot of caching/pre-creation of states. this can make your life quite difficult if you haven't designed for it. D3D11 already has states, but those are more of a bundling of settings to have fewer api calls; with the new APIs, it seems like they optimize a lot of the whole GPU setup (kind of similar to shader linking in opengl). previously you could have artist controlled states or even states created dynamically by the game, but now you don't really want to create those at runtime.

The issues we had in the past console generation with shader permutations, where you had tons of 'cached' versions depending on flags, each one doubling the shader count, will now apply to the whole rendering setup.

you can probably set any vertexshader, any pixelshader and then disable color writes; knowing the whole scope, the Mantle/D3D12 driver should be able to track the GPU setup back to your vertexshader, know that just positions are needed, and strip out every other redundant bit (which previously those 'mysterious' driver threads might or might not have done).

But this might be quite vendor specific (some might end up with the same gpu setup for two different states, e.g. in one you disable color writes and in the other you set blending to add(Zero,Zero), while another driver might not detect this), and I'm not sure how this would be reflected in the API, i.e. whether you'd know two pipelines are the same so you could adjust your sorting to account for it.

Everything at runtime needs to select from the pre-created permutation set to have a stable framerate. I wonder if there will be any guarantees on how long a pipeline creation may take (in opengl (es), shader linking sometimes takes several seconds). that's not only an issue of renderer architecture, but also of initialization time: you don't want to spend minutes caching thousands of states.

2. multithreading: previously I think there was no game that used multithreading to speed up the API part, mainly because it was either not supported to use multiple interfaces, or where there was a way (e.g. d3d11), it was actually slower.

yet it makes me wonder, are we really that much cpu bound? from my perspective, it needs a really slow cpu to saturate on the API side. usually, with instancing etc. any modern i3,i5,i7 is fast enough in a single thread to saturate on the GPU side.

and in case we don't want to optimize for the current saturation, but rather increase the drawcall count etc., I really wonder when it starts to be suboptimal on the GPU side. if we were able to push 10M draw calls/s, that's about 100 cycles/DC on a modern gpu, and those have really deep pipelines, sometimes needing cache flushes; every DC needs some gpu-setup context that has to be fetched. we'll end up with "yes, this could be done, but would be suboptimal, let's go back to instancing etc. again".

that's no different from what we do now: few 'big' setups per frame and pushing as many drawcalls with as few state/resource changes as possible, to saturate rather on the shader/rasterization/fillrate side.

 

The opengl extensions from NVidia's talk are way more what I'd hope for as the direction of 'next gen apis'. it's as easy to use as opengl always was, just extending the critical parts to perform better (I'm talking about http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead ). it actually makes things nicer with persistent mapped buffers (you don't need to guess and hope how every driver will 'optimize' your calls, and you get all the responsibility and possibilities that come with using persistent buffers). and if multidrawindirect were extended a bit more to support an array of indexed shader objects, you could render the whole solid pass with one drawcall. shadowmaps would possibly end up being one drawcall each, and preparing those batched drawcalls could be done in a multithreaded way if you want.

 

feels like GL just doesn't have the big marketing campaign, but designing the next renderer would for me mean rather going the NV/GL way and mapping it to D3D12/Mantle under the hood.




#5144616 Relative or absolute coordinate system

Posted by Krypt0n on 05 April 2014 - 02:32 PM

it depends on whether you change the gameplay by letting players see a bigger region when they play on a bigger screen.

if there is no drawback and gameplay stays the same, keep the proper pixel ratio; but if there are gameplay changes, then rather scale or use a letterbox.

 

some games have various side menus to crop the play region to always the same size.

 

another, if a bit overpowered, solution is to have all art at a much higher resolution (e.g. 4x4 times) and render it with anisotropic filtering on the gpu to any resolution; it should end up quite detailed with no really noticeable scaling artifacts.




#5142813 smooth fog of war

Posted by Krypt0n on 28 March 2014 - 05:12 AM

an alternative is to render those black tiles into a white surface and put a gaussian blur on it, then multiply the fully rendered tiled screen with the blurred surface. as a gimmick, instead of having just fully fogged or un-fogged tiles, you could actually use values in between, either based on distance to that tile or time based once a tile is uncovered.

 

if you draw in software, that fog can be calculated in a one-channel texture, and a gaussian blur is separable, so with two 1D passes it's fast enough for most cpus.
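a sketch of that separable version on a single-channel fog buffer, the same 1D kernel run horizontally and then vertically; kernel size and weights are just example values:

#include <vector>
#include <algorithm>

// blur a single-channel (0..1) fog buffer with a 5-tap gaussian, horizontal then
// vertical; two 1D passes cost O(2*k) per pixel instead of O(k*k) for the full 2D kernel
void blurFog(std::vector<float>& fog, int width, int height)
{
    static const float k[5] = { 0.0625f, 0.25f, 0.375f, 0.25f, 0.0625f };
    std::vector<float> tmp(fog.size());

    for (int y = 0; y < height; ++y)                 // horizontal pass
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int i = -2; i <= 2; ++i) {
                int sx = std::clamp(x + i, 0, width - 1);
                sum += fog[y * width + sx] * k[i + 2];
            }
            tmp[y * width + x] = sum;
        }

    for (int y = 0; y < height; ++y)                 // vertical pass
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int i = -2; i <= 2; ++i) {
                int sy = std::clamp(y + i, 0, height - 1);
                sum += tmp[sy * width + x] * k[i + 2];
            }
            fog[y * width + x] = sum;
        }
}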




#5142208 Ray Bounding box intersection

Posted by Krypt0n on 25 March 2014 - 11:15 PM

the algorithm is explained in detail, starting from the basic concept and ending at the optimized version you show, by its inventors in: http://people.csail.mit.edu/amy/papers/box-jgt.pdf
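for a quick impression, the basic slab test the paper starts from fits in a few lines (this is the plain version, not the optimized sign-lookup variant from the paper):

#include <algorithm>

struct Ray { float ox, oy, oz; float invDx, invDy, invDz; };  // 1/direction precomputed
struct Box { float minX, minY, minZ, maxX, maxY, maxZ; };

// slab test: intersect the ray with the three pairs of axis-aligned planes and
// keep the overlapping parameter interval [tMin, tMax]; an empty interval is a miss
bool intersect(const Ray& r, const Box& b, float t0, float t1)
{
    float tMin = (b.minX - r.ox) * r.invDx;
    float tMax = (b.maxX - r.ox) * r.invDx;
    if (tMin > tMax) std::swap(tMin, tMax);      // handles negative direction components

    float tyMin = (b.minY - r.oy) * r.invDy;
    float tyMax = (b.maxY - r.oy) * r.invDy;
    if (tyMin > tyMax) std::swap(tyMin, tyMax);
    tMin = std::max(tMin, tyMin);
    tMax = std::min(tMax, tyMax);

    float tzMin = (b.minZ - r.oz) * r.invDz;
    float tzMax = (b.maxZ - r.oz) * r.invDz;
    if (tzMin > tzMax) std::swap(tzMin, tzMax);
    tMin = std::max(tMin, tzMin);
    tMax = std::min(tMax, tzMax);

    return tMin <= tMax && tMin < t1 && tMax > t0;
}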




#5141702 Material Layering question

Posted by Krypt0n on 24 March 2014 - 08:11 AM

Question 1: The material layering in the shader works by linearly interpolating the material attributes & textures, correct? So you'd just do something like:

float3 DiffMap_0 = DiffuseMap_Layer0.Sample(DiffuseMapSampler, input.TexCoord).rgb;
float3 DiffMap_1 = DiffuseMap_Layer1.Sample(DiffuseMapSampler, input.TexCoord).rgb;
float3 DiffuseAlbedo = lerp(DiffMap_0, DiffMap_1, material_DiffBlendFactor.x);

that's just one way to do it; linear is usually the least natural looking way of blending layers. you can just as well go for a selection style of blending:

float4 DiffMap_0 = DiffuseMap_Layer0.Sample(DiffuseMapSampler, input.TexCoord);
float4 DiffMap_1 = DiffuseMap_Layer1.Sample(DiffuseMapSampler, input.TexCoord);
DiffMap_0.a *= (1.f - material_DiffBlendFactor.x); // only the alpha is weighted; alpha acts e.g. as a heightmap
DiffMap_1.a *= material_DiffBlendFactor.x;
float3 DiffuseAlbedo = (DiffMap_0.a > DiffMap_1.a ? DiffMap_0 : DiffMap_1).rgb; // pick whichever layer 'wins'

this will lead to a more natural looking mix (e.g. if you blend a sand and a stone layer), yet it will have hard borders. you can go for more advanced ways of blending, e.g. some lerps in between the selections.

 

2. the simple solution is to use a texture array and pass to the shader how many iterations it needs to go over the array to accumulate/blend/select all the layers you supply.

 

3. that purely depends on your demands on visual quality. the proper way would probably be to do the shading etc. per layer and blend just the final results; blending source data will never be correct. it boils down to what makes your eye happy.




#5138958 What to do with a voxel ray-caster

Posted by Krypt0n on 14 March 2014 - 08:10 AM

I think your limitations already point you to the next steps





