
Member Since 06 Aug 2009
Offline Last Active Apr 13 2016 08:44 AM

#5286666 I'm building a photonmapping lightmapper

Posted by on 13 April 2016 - 08:45 AM

Ok little report from the front.

I figured that one big problem I have is a problem of precision during reconstruction of the 3d position from the lumel position.

I have bad spatial quantization; using some bias helped remove some artefacts, but the biggest bugs aren't gone.


anyway, some results applied to real world maps show that indirect lighting shows and makes some amount of difference:


(imgur album)










#5284254 I'm building a photonmapping lightmapper

Posted by on 30 March 2016 - 07:36 AM

Awight, there we go, the drawing:



Ok so with this, it's much clearer I hope. The problem is: by what figure should I be dividing the energy gathered at one point ?

The algo works this way:

first pass is direct lighting and sky ambient irradiance.

second pass creates photons out of the lumels from the first pass.

the third pass scans the lightmap lumels and does the final gather.

The gather works by sending 12 primary rays, finding k-neighbors from the hit position, tracing back to origin and summing Lambertians.

Normally one would think we have to divide by the number of samples, but the samples can vary according to photon density, and the density is not significant because I store a color in them. (i.e. their distribution is not the means of storing the flux like in some implementations)

Also, the radius depends on the primary ray length, which means more or fewer photons will be intercepted in the neighborhood depending on whether it hits close or far. And finally the secondary rays can encounter occluders, so it's not like it will gather N and we can divide by N. If we divide by the number of rays that arrive at the origin, we are going to have an unfair energy boost.

I tend to think I should divide by the number of intercepted photons in the radius ?
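
To make the question concrete, here is a minimal sketch of one possible normalization, not necessarily what the lightmapper ends up doing. It assumes each primary gather ray is treated as one sample: the contributions of the photons that survive the occlusion test are divided by all the photons found in the radius (occluded ones included), and the per-ray values are then averaged over the number of primary rays launched. `Color`, `GatherSample` and `finalGather` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// Result of one primary gather ray (hypothetical layout).
struct GatherSample {
    Color visibleSum;      // summed Lambertian terms of photons that passed the occlusion test
    int   photonsInRadius; // every photon found around the hit, occluded ones included
};

// Averages the per-ray estimates over ALL primary rays launched.
Color finalGather(const std::vector<GatherSample>& samples) {
    Color total{0.0f, 0.0f, 0.0f};
    if (samples.empty()) return total;

    for (const GatherSample& s : samples) {
        if (s.photonsInRadius == 0) continue; // empty ray: adds no energy but still counts below
        const float inv = 1.0f / static_cast<float>(s.photonsInRadius);
        total.r += s.visibleSum.r * inv;
        total.g += s.visibleSum.g * inv;
        total.b += s.visibleSum.b * inv;
    }

    const float invRays = 1.0f / static_cast<float>(samples.size());
    total.r *= invRays;
    total.g *= invRays;
    total.b *= invRays;
    return total;
}
```

With this choice an occluded photon lowers the result instead of being silently dropped, which avoids the energy boost described above.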


Edit: that's the photon map visualizer. I made so that colors = clusters.


#5284068 I'm building a photonmapping lightmapper

Posted by on 29 March 2016 - 09:25 AM

Hey guys, I wanted to do some reporting on a tech I'm trying to achieve here.


I took my old (2002) engine out of its dusty shelves and tried to incorporate raytracing to render light on maps,

I wrote a little article about it here:



First I built an adhoc parameterizer, a packer (with triangle pairing and re-orientation using hull detection), a conservative rasterizer, a 2nd set uv generator;

And finally, it uses embree to cast rays as fast as possible.

I have a nice ambient evaluation by now, like this:



but the second step is to get final gathering working. (as in Henrik Wann Jensen style)

I am writing a report about the temporary results here:



As you can see it's not in working order yet.

I'd like to ask, if someone already implemented this kind of stuff here, did you use a kd-tree to store the photons ?

I'm using space hashing for the moment.
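
For readers who haven't seen the idea, here is a rough sketch of a spatial hash for photons. It is an illustration under assumptions, not the engine's actual code: photons are bucketed by integer cell coordinates, the three coordinates are packed into one 64-bit key (21 bits each), and a fixed-radius query visits only the cells the search sphere overlaps. `PhotonGrid` and `Vec3` are made-up names, and a true k-nearest search would need an extra sort on top.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };

class PhotonGrid {
public:
    explicit PhotonGrid(float cellSize) : cellSize_(cellSize) {}

    // Stores the photon position and returns its index.
    int insert(const Vec3& p) {
        const int index = static_cast<int>(positions_.size());
        positions_.push_back(p);
        cells_[key(cellOf(p.x), cellOf(p.y), cellOf(p.z))].push_back(index);
        return index;
    }

    // Returns the indices of all photons within `radius` of `c`.
    std::vector<int> query(const Vec3& c, float radius) const {
        std::vector<int> found;
        const float r2 = radius * radius;
        for (int x = cellOf(c.x - radius); x <= cellOf(c.x + radius); ++x)
        for (int y = cellOf(c.y - radius); y <= cellOf(c.y + radius); ++y)
        for (int z = cellOf(c.z - radius); z <= cellOf(c.z + radius); ++z) {
            const auto it = cells_.find(key(x, y, z));
            if (it == cells_.end()) continue;
            for (int i : it->second) {
                const Vec3& p = positions_[i];
                const float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
                if (dx * dx + dy * dy + dz * dz <= r2) found.push_back(i);
            }
        }
        return found;
    }

private:
    int cellOf(float v) const { return static_cast<int>(std::floor(v / cellSize_)); }

    // Packs three cell coordinates into one key; unique while |coord| < 2^20.
    static std::uint64_t key(int x, int y, int z) {
        const std::uint64_t mask = 0x1FFFFF;
        const std::uint64_t ux = static_cast<std::uint64_t>(static_cast<std::int64_t>(x)) & mask;
        const std::uint64_t uy = static_cast<std::uint64_t>(static_cast<std::int64_t>(y)) & mask;
        const std::uint64_t uz = static_cast<std::uint64_t>(static_cast<std::int64_t>(z)) & mask;
        return (ux << 42) | (uy << 21) | uz;
    }

    float cellSize_;
    std::vector<Vec3> positions_;
    std::unordered_map<std::uint64_t, std::vector<int>> cells_;
};
```

The cost of a query grows with the number of cells the radius covers, so the cell size is best kept close to the typical gather radius.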

Also, I have a bunch of issues I'd like to discuss:

One is about light attenuation with distance. In the classic 1/(r*r) formula, r depends on the units (world units) that you chose. I find that very strange.
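
A small worked example of that unit dependence, as a sketch only: if the whole scene is rescaled by a factor s, every distance becomes s*r, so the light's intensity has to be multiplied by s*s to get the same lit result. The function names are hypothetical.

```cpp
// Inverse-square falloff of a point light: intensity divided by squared distance.
float attenuated(float intensity, float r) {
    return intensity / (r * r);
}

// Intensity needed to keep the same result after scaling all lengths by `s`.
float rescaleIntensity(float intensity, float s) {
    return intensity * s * s;
}
```

For instance, a light of intensity 10 seen from 2 units away gives 2.5; after converting the map from metres to centimetres (s = 100) the same light needs intensity 100000 to give 2.5 at 200 units.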

Second, is about the factor by which you divide to normalize your energy knowing some amount of gather rays.


My tech fixes the number of gather rays by command line, somehow (indirectly), but each gather ray is in fact only a starting point to create a whole stream of actual rays that will be spawned from the photons found in the vicinity of the primary hit.

The result is that I get "cloud * sample" number of rays, but cloud-arity is very difficult to predict because it depends on the radius of the primary ray.

I should draw a picture for this but I must sleep now, I'll do it tomorrow for sure. But for now, the question is that it is kind of fuzzy how many rays are going to get gathered, so I can't clearly divide by sampleCount, nor can I divide by "cloud-arity * sampleCount" because the arity depends on occlusion trimming. (always dividing exactly by number of rays would make for a stupid evaluator that just says everything is grey)


I promise, a drawing, lol ;) In the meantime any thought is welcomed

#5244656 Which version should I choice to build my ocean?

Posted by on 05 August 2015 - 08:43 AM

You don't want to write half a page because .... you are lazy ?

It will be tough implementing a nice ocean system if you are that lazy.

Choose, either you don't care about sharing what you do, or you care and you write your half page. Half a page is nothing, read at least the papers you linked, how many pages do they have ? how many months you think it took those researchers to do their work AND cherry on top, write about it and share it publicly ?

Meanwhile, what are you doing ?

#5239838 54x50 tiles map searching for neighbours - takes extremely long

Posted by on 11 July 2015 - 07:39 PM

You're all wrong, you should use morton code (z order) for better cache coherency on adjacent cells :P
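
For anyone who wants to try it, here is a minimal sketch of a 2D Morton (z-order) index built by interleaving the bits of the tile coordinates. It assumes both coordinates fit in 16 bits, which is plenty for a 54x50 map; `part1By1` and `morton2D` are just illustrative names.

```cpp
#include <cstdint>

// Spreads the low 16 bits of x so that a zero bit sits between each original bit.
std::uint32_t part1By1(std::uint32_t x) {
    x &= 0x0000ffffu;
    x = (x ^ (x << 8)) & 0x00ff00ffu;
    x = (x ^ (x << 4)) & 0x0f0f0f0fu;
    x = (x ^ (x << 2)) & 0x33333333u;
    x = (x ^ (x << 1)) & 0x55555555u;
    return x;
}

// Interleaves x (even bits) and y (odd bits) into a single z-order index.
std::uint32_t morton2D(std::uint32_t x, std::uint32_t y) {
    return part1By1(x) | (part1By1(y) << 1);
}
```

Tiles that are close in 2D tend to get close indices, so a tile array sorted by `morton2D(x, y)` keeps neighbours nearer to each other in memory than a plain row-major layout does.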

#5238364 Job Manager Lock-Free

Posted by on 04 July 2015 - 11:22 AM

you can only work lock free safely on structures that work with plain primitive integers. what are you going to do with that ? you need to push function objects, so unless you work out a complex permanent memory position pooling technique and your lock free structures manipulate indices in that pool only, you have no exit.

read this document:


you'll realize that



also, without a condition variable, lock free also means there is nothing to block on, and thus you're going to need to embark on some crazy scheme where you recycle your job executor thread into something that can do useful stuff during the moments there is no job; otherwise you need to spin and generate heat for nothing. Recycling a thread so that it can metamorphose into your rendering loop, for example, would not be something I'd recommend. So why try going lock free when nothing says you would gain any perf compared to using mutex+CV ? not mentioning much easier invariant proofs, better power usage in case of bubbles, software engineering sanity, etc...
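
To show what the mutex+CV route looks like, here is a minimal sketch of a single-worker job queue using only the standard library. It is one possible shape, not a tuned implementation; `JobQueue` is a hypothetical name. The worker sleeps on the condition variable while the queue is empty, so it burns no CPU between jobs.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class JobQueue {
public:
    JobQueue() : worker_([this] { run(); }) {}

    ~JobQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join(); // the worker drains the remaining jobs before exiting
    }

    void push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Sleeps here until there is a job or the queue is shutting down.
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return; // only possible when stopping
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job(); // run outside the lock so producers are never blocked by a job
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_; // declared last so the members above exist before it starts
};
```

Function objects of any size go through `std::function`, which is exactly the part that is awkward to do with lock-free structures limited to plain integers.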

#5237052 Beginner Advice: 2D Sidescroller Graphics

Posted by on 27 June 2015 - 01:57 AM

Nowadays you don't need to be a programmer to make a game. I saw a presentation about Unreal Engine blueprints by an artist who made his game about somebody crash landing on an alien planet, who has to assemble contraptions from stuff lying around to survive.

You could check that out.

Or you could use another engine like Unity and code stuff in callbacks, like update-entity style callbacks. this is C# (or javascript), pretty nice to develop fast.

Or, you could use a C++ engine, like for example sfml, it is not really an engine but enough for a scroller. You could use a 2d engine, it seems there are zillions of those



Or make everything from scratch, which is not necessarily evil either, because you know everything of what is happening, you control your game from the ground up, and when we are talking small games like scroller, this is a good-thing™. Especially if you are in the category of people who (like me) have difficulties to use already existing systems because we don't know 100% of what they do under the hood. Personally it used to cause me mind-freezes lol. Now I've come to realize that people usually write stuff the way we expect for a given purpose so I can guess much better how to use foreign stuff. But I needed experience in DOING this stuff before I could use it.


So, if you're a flexible person who is not blocked by opaque things and can leverage existing libraries, go this way; if you feel you need to be in control of every line of code, start from scratch it will be a wonderful lesson for the future.


You can check my platformer demo, everything is open source, so you can really see how everything is done, from scratch to game.

#5237047 what is best way to check a code is more effiecient and runs faster than others

Posted by on 27 June 2015 - 01:16 AM

I wrote a piece of "article" for an answer on stackoverflow about benchmarking, it can be found here:


Of course only the first part is of any interest to you.


Otherwise, I'd say you can always predict performance by hand, using a "math" model of the machine, but it is so difficult and tricky that you might as well consider it impossible.

To do a performance analysis on paper, you'd need to know the exact binary form of the program (op codes, or disassembled op codes), then you'd need to know how your cpu is going to treat this, what instructions go into what pipeline (which out of order execution parallel pipe), and the exact state of the caches, which is hard.


Please consider this article:



The not-crazy-way™ is to measure it. You profile it in situation, or simply time it between start and end. Sampling profilers will give you details about where the hot spots are, which usually means which functions your program spends the most time in. Then you can try different implementations for those, and profile again; if you lost time, revert the code, if you gained speed, good. I'd say basically that's it.
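
The "time it between start and end" part can be sketched like this, assuming a monotonic clock from `<chrono>` is good enough for the job. `secondsTaken` is a made-up helper, and repeating the work several times is only a crude way to smooth out noise.

```cpp
#include <chrono>

// Runs `work` `repetitions` times and returns the average wall time per run, in seconds.
template <typename F>
double secondsTaken(F&& work, int repetitions = 1) {
    using clock = std::chrono::steady_clock; // monotonic, unaffected by system clock changes
    const auto start = clock::now();
    for (int i = 0; i < repetitions; ++i) work();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repetitions;
}
```

Call it once per candidate implementation with the same input and compare the numbers, in an optimized build, keeping an eye on the compiler removing work whose result is never used.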

#5236556 massive allocation perf difference between Win8, Win7 and linux.

Posted by on 24 June 2015 - 07:46 AM

Hello, maybe you've seen my topic about the open address hash map I provide, this topic is about a mystery in its performance benchmarking.


I've noticed something plain crazy, between machines and OS, the performance of the same test, is radically in opposition.

Here are my raw results:




(full source code here)

Ok no graphs, you're all big enough to look at numbers. These results are all issued from the same code.

I have overloaded operator new afterward to count the number of allocations that each method resulted in, in the push test.

this is the result:


mem blocks

std vector: 42
reserved vector: 1
*open address: 26
*reserved oahm: 2
std unordered: 32778
std map: 32769
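
For reference, counts like the ones above can be gathered with a replaced global `operator new`. The following is a minimal sketch of that technique and not the code used for the benchmark; `g_allocationCount`, `allocationsDuring` and `reservedPushAllocations` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

static std::atomic<std::size_t> g_allocationCount{0};

// Replacement global allocation function: counts every call, then defers to malloc.
void* operator new(std::size_t size) {
    ++g_allocationCount;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many times operator new was called while `work` ran.
template <typename F>
std::size_t allocationsDuring(F&& work) {
    const std::size_t before = g_allocationCount.load();
    work();
    return g_allocationCount.load() - before;
}

// Example measurement: pushing `count` ints into a vector reserved up front.
std::size_t reservedPushAllocations(std::vector<int>& out, int count) {
    return allocationsDuring([&out, count] {
        out.reserve(static_cast<std::size_t>(count));
        for (int i = 0; i < count; ++i) out.push_back(i);
    });
}
```

A reserved vector should report a single block here, while node-based containers report roughly one block per inserted element.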

So my conclusion, purely from these figures, is that windows 8 must have a very different malloc function in the common runtime. I tried to use the same binary exported by the visual studio I had on Win7 and run it on Win8, and I got the same results as the binary built directly by VS on Win8. So it has to be the CRT dll. Or, it's the virtual allocator in the kernel that has become much faster.


What do you think, and do you think there is a scientific way to really know what is going on ?


Can you believe iterating over a map is 170 times slower on gcc/linux than on VS12/Win8.1 ? 'the heck ?? (actually for this one I suspect an optimizer joke)


ps: 32778 nodes comes from the fact that i push using rand().

#5235507 open address hash map

Posted by on 18 June 2015 - 09:50 AM

Hi guys, I also have a piece of code tooling to give out.



I found that it is not so obvious to find a simple open address hash map that can be integrated into a C++ project without going into crazy setups or extensive build fixings, or godknowswhat™ (licensing issues etc., you name it).

First let's link two stackoverflow threads for reference about other libraries that do similar things: https://stackoverflow.com/questions/3300525/super-high-performance-c-c-hash-map-table-dictionary




1. In game development, caring about memory is a huge must. Notably one very good rule of thumb is to never allow fragmentation to go rampant.

2. we need associative containers in many algorithms.


So if 1. and 2. met, the child they would have together would be called open address hash map.

vanilla `std::unordered_map` just doesn't cut it, because it is closed address and therefore allocates per element. (node based containment.) A problem to be mitigated by the use of allocators or something called burst allocation. Please c.f. this document: http://igaztanaga.vosi.biz/allocplus/ So anyway, for those who want to avoid headaches, why not give my hash map a try ?
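
To illustrate the principle only (this is not the library's code), here is a tiny open address map with linear probing. All entries live in one contiguous array, so the only allocations are the occasional regrowths. Assumptions: integer keys, a power-of-two capacity, and no erase support (that would need tombstones). `OpenAddressMap` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class OpenAddressMap {
    struct Slot { std::uint32_t key = 0; int value = 0; bool used = false; };

public:
    // capacityPow2 must be a power of two so that `& mask` can replace modulo.
    explicit OpenAddressMap(std::size_t capacityPow2 = 16) : slots_(capacityPow2) {}

    void insert(std::uint32_t key, int value) {
        if ((size_ + 1) * 4 > slots_.size() * 3) grow(); // keep the load factor at or below 0.75
        place(key, value);
    }

    // Returns a pointer to the stored value, or nullptr when the key is absent.
    const int* find(std::uint32_t key) const {
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = hash(key) & mask;; i = (i + 1) & mask) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr; // an empty slot ends the probe sequence
            if (s.key == key) return &s.value;
        }
    }

    std::size_t size() const { return size_; }

private:
    static std::size_t hash(std::uint32_t k) { return k * 2654435761u; } // multiplicative hash

    // Linear probing: walk forward from the home slot until the key or a free slot is found.
    void place(std::uint32_t key, int value) {
        const std::size_t mask = slots_.size() - 1;
        for (std::size_t i = hash(key) & mask;; i = (i + 1) & mask) {
            Slot& s = slots_[i];
            if (!s.used) { s.key = key; s.value = value; s.used = true; ++size_; return; }
            if (s.key == key) { s.value = value; return; }
        }
    }

    // Doubles the array and reinserts every entry: one allocation per growth step.
    void grow() {
        std::vector<Slot> old;
        old.swap(slots_);
        slots_.resize(old.size() * 2);
        size_ = 0;
        for (const Slot& s : old) if (s.used) place(s.key, s.value);
    }

    std::vector<Slot> slots_;
    std::size_t size_ = 0;
};
```

Because the load factor stays below 1, every probe sequence is guaranteed to reach an empty slot, which is what lets `find` terminate on a miss.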

Here is the page



bests to all

#5179320 Thesis idea: offline collision detection

Posted by on 10 September 2014 - 07:46 AM

Frankly, in this age, if you aren't aware of the whole historical bibliography on some subject, then ANY idea a human can possibly have has already been: at least thought about, possibly tried and potentially published if it has any value.

I hope you're not thinking about a phd thesis ? because I don't see how any of this world's academies would allow someone to enter a 3/4 years cycle of research on an idea that sounds of little utility, proposed by a person who has basically next to no knowledge of the field.


Sorry to sound really harsh, I just want to calm down the game. At least go read everything you need to before, other ideas will come up when reading papers, only to realize later that it was proposed as a paper 2 years later, that you will also read, and have another idea, that either happens to not work, or be covered by some even later paper, and this cycle goes on until, if you are lucky and clever, you finally can get your idea that will actually bring progress to the world's status of the research. BUT, a phd being in 3/4 years the chance is great that some other team will publish a very close work before you finish... Yep.


Good luck anyway :-)

#5179314 Deep water simulation

Posted by on 10 September 2014 - 07:35 AM

Of course it is the reason. Also you get another problem that is much worse :


your baked animation will be tiled and repeated !

Not only is it very difficult to MAKE it tileable in space, but you must also make it repeatable in TIME, and those are two pretty crazy things to get right.


In the era of shader model 2 (c.f AMD render monkey, ocean sample), water was indeed made using baked animated noise.


Also, about huge textures, think about the bandwidth: not only is the memory footprint large, but today's graphics cards are limited by memory bandwidth rather than raw ALU.

#5168379 How to pack your assests into one binary file (custom file format etc)

Posted by on 22 July 2014 - 08:42 AM

How about you put all of those into a zip using the zlib ? it has one of the best licenses over in the wild; and maybe it just solves exactly your packing problem, and provides compression as a cherry on top of the cake.

#5168377 Inter-thread communication

Posted by on 22 July 2014 - 08:27 AM

For what it's worth, if anybody knows this software:


I'm the co-author of the opengl rendering together with christophe riccio (http://www.g-truc.net/).

So basically, the opengl preview viewports of Vue have a GUI-thread message producer and a Render-thread message consumer queue system, based on a dual std::vector that swaps during consumption (each frame); each message "push" takes the lock (boost mutex + lock) and each consumption also takes it before swapping the queue. It just so happens that in common situations, more than 200 000 messages are pushed per second, and it is in no way a bottleneck.
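
A rough sketch of that dual-vector scheme, written from the description above and not taken from Vue's source: it swaps in `std::mutex` for the boost mutex, and `SwapQueue` is a hypothetical name. The consumer holds the lock only for the swap, then processes the batch without blocking producers.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

template <typename Message>
class SwapQueue {
public:
    // Producer side (e.g. GUI thread): every push takes the lock.
    void push(Message m) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(m));
    }

    // Consumer side (e.g. once per frame on the render thread).
    // Returns the number of messages handled.
    template <typename Handler>
    std::size_t consume(Handler&& handle) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            processing_.swap(pending_); // O(1): only the vectors' internals are exchanged
        }
        for (Message& m : processing_) handle(m);
        const std::size_t count = processing_.size();
        processing_.clear(); // keeps the capacity, so steady state does not reallocate
        return count;
    }

private:
    std::mutex mutex_;
    std::vector<Message> pending_;    // written by producers under the lock
    std::vector<Message> processing_; // touched only by the consumer
};
```

Since both vectors keep their capacity across frames, the lock is typically held just long enough for a `push_back` or a pointer swap.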

I just mean to say, if you are afraid of 512 locks per frame... there is a serious problem in your "don't do too much premature optimization" thinking process, that needs fixing.

I agree that it's total fun to attempt to write a lock free queue, but if it was production code, it's frankly not worth the risk, and plainly misplaced focus time.


Now just about the filter thing: one day, just to see, I made a filter to avoid useless duplication of messages; it worked but was slower than the raw, dumb queue. I don't say it's absolutely going to be the same in your case; just try... but in my case being clever was being slower.

#5142969 Cascaded shadow map splits

Posted by on 28 March 2014 - 07:36 PM

The biggest problem with cascaded shadow maps is the computation of the frustums of the cameras used to render the shadows. There are multiple kinds of policies;

The most common is surely the one that cuts the main camera's view frustum into subparts according to distance and uses a bounding volume of each slice to create an encompassing orthogonal frustum for the shadow camera.
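
As an illustration of the "cut according to distance" step, here is a sketch that computes the slice boundaries with the commonly used blend of logarithmic and uniform splits (often called the practical split scheme). That blend is my choice for the example, not something the scheme above requires; `cascadeSplits` and `lambda` are illustrative names.

```cpp
#include <vector>
#include <cmath>

// Returns cascades + 1 view-space distances, from nearZ to farZ.
// lambda = 1 gives purely logarithmic splits, lambda = 0 purely uniform ones.
std::vector<float> cascadeSplits(float nearZ, float farZ, int cascades, float lambda) {
    std::vector<float> splits(static_cast<std::size_t>(cascades) + 1);
    for (int i = 0; i <= cascades; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(cascades);
        const float logSplit = nearZ * std::pow(farZ / nearZ, t);
        const float uniSplit = nearZ + (farZ - nearZ) * t;
        splits[static_cast<std::size_t>(i)] = lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    return splits;
}
```

Each consecutive pair of distances bounds one slice of the view frustum; the shadow camera's orthographic frustum is then fitted around the corners of that slice.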


There is lost efficiency in this scheme because it takes a bounding volume of a bounding volume, so lots of shadow pixels end up off screen and are never used. In other words you lose resolution in the visible zone.


Therefore some recent solutions use compute shaders to determine the actual pixel perfect min depth and max depth perceived in a given image; then you can optimize the slices of the camera frustum to perfection, making crazy high resolution shadows, especially in scenes a bit enclosed in walls.


There is another very simple policy for shadow frustums, just center the shadow camera on the view camera's position and zoom out in the direction of the light, each cascade zooming out a bit more thus logically encoding more distance in view space. But this has the problem of calculating shadows behind the view where they could be unnecessary.

I say could, because actually you never know when a long oblique shadow must drop from a high rise building located far behind you. This is why this simple scheme is also popular.


In my opinion, it is your scheme that fails. You should visualize the shadow zones by tinting the cascades red, green and blue to obtain this:


once you get this debugging will be easy.