Member Since 07 Mar 2009

#5308677 Working code doesnt work in opencl

Posted by on 30 August 2016 - 09:39 AM

intel's opencl sdk integrates nicely into visual studio (and I think in eclipse also), and allows you to debug opencl programs nearly as nicely as native c code.

#5308676 d3d11 frustum culling on gpu

Posted by on 30 August 2016 - 09:34 AM




I think you may be missing the point of frustum culling...

what point?


usually to save tons of CPU work, to maintain streaming and to manage memory. Frustum culling on CPU is quite efficient, and you might not get any benefit from doing it on GPU unless you have some very specific case.
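
To illustrate why the CPU version is cheap, here is a minimal sketch of a bounding-sphere test against six frustum planes. The `Plane`, `Sphere` and `Frustum` types and the function names are made up for this example, and the planes are assumed to have normals pointing into the frustum.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Plane in the form dot(n, p) + d = 0, with n assumed to point into the frustum.
struct Plane { float nx, ny, nz, d; };
struct Sphere { float x, y, z, radius; };

using Frustum = std::array<Plane, 6>;

// Returns true if the sphere is at least partially inside all six half-spaces.
// The test is conservative: a sphere near a frustum corner can pass although
// it is not actually visible.
inline bool sphere_visible(const Frustum& f, const Sphere& s) {
    for (const Plane& p : f) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false; // completely behind this plane
    }
    return true;
}

// Collects the indices of all spheres that pass the test.
inline std::vector<std::size_t> cull(const Frustum& f,
                                     const std::vector<Sphere>& spheres) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < spheres.size(); ++i)
        if (sphere_visible(f, spheres[i])) visible.push_back(i);
    return visible;
}
```

That is at most six multiply-adds per object, and the result is available on the CPU right away for streaming and memory decisions.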

#5300769 How Do You Handle Gamma Correction?

Posted by on 14 July 2016 - 12:57 PM

before doing calculations with them in your shaders, you convert them to linear space (by raising it to the power of 2)

you need to raise it by the inverse, 1/2.2
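
For reference, a small sketch of both directions. It assumes a plain power-law curve with gamma 2.2 (real sRGB uses a piecewise curve), and the function names are made up: the exponent 2.2 takes stored values to linear space, the inverse 1/2.2 takes linear results back.

```cpp
#include <cmath>

// Assumption: pure power-law approximation, not the exact piecewise sRGB curve.
constexpr float kGamma = 2.2f;

// Gamma-encoded value in [0,1] -> linear light (do this before shading math).
inline float decode_to_linear(float encoded) {
    return std::pow(encoded, kGamma);
}

// Linear light in [0,1] -> gamma-encoded value (do this when writing the result).
inline float encode_from_linear(float linear) {
    return std::pow(linear, 1.0f / kGamma);
}
```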

#5299777 Is it possible “Update” texture in one pass?

Posted by on 08 July 2016 - 08:33 AM

as long as the compute shader runs per pixel (and doesn't sample neighbouring pixels), Sergio's suggestion should work well.

#5299735 Techniques used for precomputing lightmaps

Posted by on 08 July 2016 - 12:22 AM

i think photon mapping is the simplest and most feature rich way to go. you can gather indirect illumination as well as direct illumination like projected colored light without adding complexity, as every light source can be handled in individual passes.
gathering can also easily add features like caustics and ambient occlusion without adding complexity.

the real challenge in my experience is uv generation. in a first step you might want to skip it and use some existing tool.

check out the square enix paper on fast global illumination baking and the one from last of us.

#5297685 Storing Signed Distance Fields

Posted by on 23 June 2016 - 06:18 AM

it depends heavily on the data, the use case and goal.



- can be real distance fields, which tend to act like a compression if your data is regular e.g. Valve stores 4096 textures in 64 distance fields:


- can be sparse data, in which case a kd-tree (sometimes with "Bricks" as leaves) is the weapon of choice, e.g.: http://s08.idav.ucdavis.edu/olick-current-and-next-generation-parallelism-in-games.pdf

- can be a voxel field of functions that approximate non-orthogonal surfaces rather than increasing detail by resolution, e.g.: https://mediatech.aalto.fi/~samuli/publications/laine2010i3d_paper.pdf



use case:

-medical rendering, it's usually done by marching a semi-transparent volume. I've seen people using LZSS or RLE compression per slice, which is on-the-fly decompressed to tiny caches. e.g. http://raycast.org/powerup/publications/FITpaper2007.pdf

-satellite data: it's usually a heightmap, although rendered as voxel, it's stored as 2d images. sometimes this is done as multi-layer image, which is efficient for very sparse volumes (look up "depth peeling", this shows the basic concept of it)

-game data, (in the simplest case: minecraft), this is often just a huge grid, containing sub grids/chunks, which are zipped on disc. the amount is very low.

-uniform points? aka unlimited detail technology?
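
as an illustration of the per-slice RLE mentioned for medical volumes, here is a minimal byte-wise sketch. The `Run` layout (length 1..255 plus value) and the function names are assumptions for this example, not the format of any particular tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One run: (length in 1..255, voxel value). Layout chosen for this sketch only.
using Run = std::pair<std::uint8_t, std::uint8_t>;

// Encodes one slice of 8-bit voxels as a list of runs.
inline std::vector<Run> rle_encode(const std::vector<std::uint8_t>& slice) {
    std::vector<Run> runs;
    for (std::uint8_t v : slice) {
        if (!runs.empty() && runs.back().second == v && runs.back().first < 255)
            ++runs.back().first;                    // extend the current run
        else
            runs.emplace_back(std::uint8_t{1}, v);  // start a new run
    }
    return runs;
}

// Expands the runs back into the original slice.
inline std::vector<std::uint8_t> rle_decode(const std::vector<Run>& runs) {
    std::vector<std::uint8_t> slice;
    for (const Run& r : runs)
        slice.insert(slice.end(), static_cast<std::size_t>(r.first), r.second);
    return slice;
}
```

this only pays off when a slice has long constant runs (e.g. empty space around a scan); noisy data can even grow.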




-are you running out of disc space? video memory? 

-are you trying to save bandwidth for rendering? are you really bandwidth bound? or fetch bound by TMU?

-are you trying to voxelize in real time? or streaming static data? transcoding involved?

-interactive visualization of scientific data (1-10fps)? pre-visualization of cinematic rendering (<1fps)? cad editor (>10fps)? game(60fps)?


in general, 900^3 doesn't sound like that much. I ran 4096^3 with realtime voxelization on CPU: http://twitpic.com/3rm2sa , Jon Olick ran 16384^3 (if I recall correctly) in https://www.youtube.com/watch?v=VpEpAFGplnI on a GTX200 or something, and medical data of about 1k^3 ran on some Pentium4. If you get the rendering right, 900^3 should be a piece of cake on modern GPUs. If you just want to accelerate rendering in a cheap way, rather use leap-stepping: http://citeseerx.ist.psu.edu/viewdoc/download?doi=

if you URGENTLY want to compress data, implement some of the GPU based compression. E.g. don't store R8G8B8A8, but instead BC1 blocks.
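
a quick back-of-the-envelope sketch of what that saves. It assumes the volume is stored as a stack of 2D slices, with 4 bytes per texel for R8G8B8A8 and 8 bytes per 4x4 block for BC1; the function names are made up. Keep in mind BC1 is lossy and only carries RGB plus 1-bit alpha.

```cpp
#include <cstdint>

// Bytes for a w x h x d volume stored uncompressed as R8G8B8A8 (4 bytes per texel).
constexpr std::uint64_t rgba8_bytes(std::uint64_t w, std::uint64_t h, std::uint64_t d) {
    return w * h * d * 4;
}

// Bytes for the same volume stored as d BC1 slices: 8 bytes per 4x4 block,
// with width and height rounded up to whole blocks.
constexpr std::uint64_t bc1_bytes(std::uint64_t w, std::uint64_t h, std::uint64_t d) {
    return ((w + 3) / 4) * ((h + 3) / 4) * 8 * d;
}

// 900^3: 2,916,000,000 bytes as R8G8B8A8 versus 364,500,000 bytes as BC1 (8:1).
static_assert(rgba8_bytes(900, 900, 900) == 2916000000ULL, "rgba8 size");
static_assert(bc1_bytes(900, 900, 900) == 364500000ULL, "bc1 size");
```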

#5297055 Finding the "Fun Factor" in a tycoon game

Posted by on 17 June 2016 - 07:16 PM

i can imagine the fun part would be to not only pick based on prices, but also by size, noise, brand, compatibility...

and to add a planning component, you could let the player fit the parts together into a case, which is chosen either by the player, based on target buyers, or by some client (e.g. Walmart).

the fitting doesn't need to be simulated super accurately. it can be like "tetrising" randomly shaped blocks into a box-case.

#5293755 does video memory cache effect efficiency

Posted by on 27 May 2016 - 03:33 AM

to be more clear, GPU-caches are more often streaming-buffers than actual caches. That's because classical 3D work is very predictable. On CPU side, you cannot predict the access pattern, you'd need to execute all previous instructions to determine what access a particular piece of code will do. On GPU on the other side, once the command buffer is flushed to the GPU, all drawcalls of the frame are 100% specified. you know exactly what vertex #16 of drawcall #1337 is going to be.

Hence, in a lot of cases, the GPU just needs to start to read from vmem ahead of the usage, it could be processing vertex #0, but already loading vertex #100 into the streaming buffer (aka cache).

Having random order of vertices might not be noticeable, if there is enough work to do on other units, as the GPU (unlike CPU) should usually not stall on memory look-ups. But if there is not much to do per-vertex, the memory fetching will just not keep up with the processing, as accessing random places in memory is way more work and way more wasteful than accessing data in a linear way.


(in modern rendering, where the flow is more cpu-alike, this changes, of course.)

#5291340 Planet rendering: From space to ground

Posted by on 12 May 2016 - 04:06 PM

1. we just had a nice topic about it http://www.gamedev.net/topic/677700-planet-rendering-spherical-level-of-detail-in-less-than-100-lines-of-c/

2. you rather update the VBO with what is visible. You'd not fit all LODs in one go into vmem.

3. I've described one solution in the topic from 1. the thread starter implemented it, I think the source is public, check it out, it's easy.

4. :) http://www.gamedev.net/topic/677700-planet-rendering-spherical-level-of-detail-in-less-than-100-lines-of-c/

#5289770 Pixel Shader Uniform Cannot be Found

Posted by on 02 May 2016 - 12:57 PM

is it used in a way that influences the output? otherwise it might still be optimized out.

#5287998 Depth Problem In Release Mode

Posted by on 21 April 2016 - 10:40 AM

1. add "while(true){Sleep(0);}" at the beginning of "main"

2. start the application externally

3. attach the debugger

4. break at the Sleep

5. set the next line as the point to continue the execution (right click context menu)

6. debug what's different :)

7. fix it

8. transfer out of gratefulness all your moneyz to my account (that's optional, but 7. might break if you don't.)
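
A portable variant of the wait loop from step 1, as a sketch. It swaps `Sleep(0)` for `std::this_thread::sleep_for` and adds a flag you can flip from the debugger instead of using "set next statement"; the function name and the flag are made up for this example.

```cpp
#include <chrono>
#include <thread>

// Blocks until `released` becomes true. The flag is volatile so the compiler
// re-reads it on every iteration; set it to true from the debugger's watch
// window after attaching (or jump past the loop as in step 5).
inline void wait_for_debugger(volatile bool& released) {
    while (!released) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

Call it as the first thing in `main` with a local `volatile bool released = false;`, start the release build externally, attach, break, and set `released` to true.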

#5286654 Occlusion culling w/o a Zbuffer

Posted by on 13 April 2016 - 07:55 AM

zbuffer is actually less accurate (if lower resolution, which is the common case) and has more corner cases (e.g. you need water tight rendering, deal with z-fighting issues etc.)

Hadn't thought about z-fighting, and I was thinking full size gpu occlusion so I hadn't thought of that. What do you mean by "water tight rendering"? I've heard the term before but don't remember what it means.

z-fighting happens if you use e.g. lower LODs for occlusion culling (which makes sense if you think about it, as you just test, but you don't care about the visuals). The problem pops up if you e.g. have a picture on the wall (flat poly) and some LOD of the wall/room is used which is not 100% fitting the higher poly that is rendering close up.

water tight rendering is when you don't have pixel-gaps between triangles of a mesh. Equally important is to not "over-draw" edges twice. you'd notice this on transparent triangles where edge pixels blend twice.
this is especially important for occlusion culling, as one pixel-gap will "un-hide" everything behind it and all the occlusion culling was a waste of time. similarly, drawing too much will lead to occlusion in places where you don't want it.

#5286507 DirectX 11 Volume Rendering Advice Needed

Posted by on 12 April 2016 - 12:32 PM

more efficient and simpler would be to do all in one pass.
render just the back faces, calculate with a simple ray intersection the near-distance of the cube (if t<0.0, then t=0.0); and trace through the volume.
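
A minimal sketch of that ray/cube intersection using the slab method. The function name and array-based signature are made up for this example; in a shader the origin would be the eye position and the direction would come from the interpolated back-face position.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

// Intersects the ray o + t * d with the axis-aligned box [bmin, bmax].
// On a hit, tNear/tFar hold the distances where the ray enters and leaves the box.
// tNear starts at 0, which implements "if t < 0.0, then t = 0.0": with the
// camera inside the volume, tracing starts at the eye.
inline bool ray_box(const float o[3], const float d[3],
                    const float bmin[3], const float bmax[3],
                    float& tNear, float& tFar) {
    tNear = 0.0f;
    tFar = std::numeric_limits<float>::max();
    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0f) {
            // Ray is parallel to this slab: it misses if the origin is outside.
            if (o[i] < bmin[i] || o[i] > bmax[i]) return false;
            continue;
        }
        float t0 = (bmin[i] - o[i]) / d[i];
        float t1 = (bmax[i] - o[i]) / d[i];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar = std::min(tFar, t1);
        if (tNear > tFar) return false; // missed, or box is entirely behind the origin
    }
    return true;
}
```

The volume is then marched from `tNear` to `tFar` in a single pass.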

#5286265 Occlusion culling w/o a Zbuffer

Posted by on 11 April 2016 - 04:31 AM

I know a zbuffer is easier, more accurate and there are no corner cases with it, but I am curious about this.

zbuffer is actually less accurate (if lower resolution, which is the common case) and has more corner cases (e.g. you need water tight rendering, deal with z-fighting issues etc.)

the bad side of occlusion volumes is that these are disconnected, hence only fully occluded objects will be removed; objects crossing several occlusion volumes will stay, although in reality they are invisible.
one solution to this is "occluder fusion", but that's where the math becomes dirty. (actually, not the math, but dealing with all the float precision issues.)

#5285612 Planet Rendering: Spherical Level-of-Detail in less than 100 lines of C++

Posted by on 07 April 2016 - 12:14 PM

Your bezier and subdivision samples look nice. I wonder if it's easy to turn it into a recursive approach (looks like uniform subdivision in the screenshots)

the subdivision (catmull clark) is recursive. it's used in the Reyes rendering.
it's a really elegant algorithm.

1. Yes, due to normalizing it does not work immediately for the current code, but you could use a displacement map and combine rendering with a tessellation shader.

there goes all the challenge ;)
there was a nice paper about subdivision: