# problems in profiling



I've run into some problems profiling my application, a simple ray tracer. If I profile it with compiler optimizations turned off, a lot of geometric routines each take 2-4% of the time. If I turn the optimization flags on, the only time-consuming functions are the ray-primitive intersections, ray generation, and two or three others, each with 10-20% of the time.

In the first case I don't know what to optimize: simple functions like the dot product (which is sometimes first in the list) can hardly be optimized any further (I think), and the compiler would optimize them heavily anyway when asked to. In the second case the top items in the list are the ray-primitive functions, which is too general an indication: how can I optimize an already fast algorithm (Möller-Trumbore)?

My ray tracer is something like 3-4 (!) times slower than it could be (many people have told me that), so this isn't about advanced optimization; something is wasting my time ;-) How can I detect exactly where I can optimize?

##### Share on other sites
About a million things. First, what sort of spatial partitioning algorithm are you using to decide which primitives actually get sent to the ray-primitive function? The answer to this will make the most difference.

Beyond that, you have to understand that in optimized builds, things like the dot product should get inlined and thus disappear from the call graph (the compiler should also be able to schedule the code better). Profiling non-optimized builds isn't really useful for deciding where to spend your optimization time, but it does give good insight into how your program works.
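Since inlined helpers vanish from the profile of an optimized build, one complementary approach is to count the work by hand. The following is only a sketch: `RenderStats`, `ScopedTimer`, and `testsPerRay` are hypothetical names, not part of the ray tracer discussed here. The idea is to bump a counter at each ray and each intersection test, then look at the ratio once per render.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical per-render counters; increment them by hand in the tracer.
struct RenderStats {
    std::uint64_t primaryRays = 0;    // rays shot from the camera
    std::uint64_t secondaryRays = 0;  // reflection, refraction, shadow rays
    std::uint64_t sphereTests = 0;    // calls to the ray-sphere routine
    std::uint64_t triangleTests = 0;  // calls to the ray-triangle routine
    double seconds = 0.0;             // wall-clock time of the render
};

// Adds the elapsed wall-clock time to `sink` when it goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(double &sink)
        : sink_(sink), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        sink_ += elapsed.count();
    }
    ScopedTimer(const ScopedTimer &) = delete;
    ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
    double &sink_;
    std::chrono::steady_clock::time_point start_;
};

// Average number of intersection tests performed per ray (0 if no rays).
inline double testsPerRay(const RenderStats &s) {
    const std::uint64_t rays = s.primaryRays + s.secondaryRays;
    if (rays == 0) return 0.0;
    return static_cast<double>(s.sphereTests + s.triangleTests) /
           static_cast<double>(rays);
}

// Prints a one-line summary after a render.
inline void printStats(const RenderStats &s) {
    std::printf("rays: %llu  tests/ray: %.2f  time: %.3f s\n",
                static_cast<unsigned long long>(s.primaryRays + s.secondaryRays),
                testsPerRay(s), s.seconds);
}
```

Wrapping the render loop in a `ScopedTimer` and calling `printStats` afterwards shows whether the time goes into many more rays or tests than expected (for example from adaptive anti-aliasing) rather than into any single slow function.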

Before you worry about anything else, though, your first concern should be: "Is my space partitioning algorithm good?" Most people here would use a k-d tree for ray tracing. I also have to ask: what compiler, profiler, and primitives (and how many) are you testing with?
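As a rough illustration of how partitioning keeps primitives away from the intersection routines, here is a sketch of the ray versus axis-aligned box "slab" test, the kind of interval clip a k-d tree traversal typically starts from. `Vec3`, `Aabb`, and `clipRayToBox` are hypothetical names chosen for this example, and it is not a full k-d tree.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal types for the sketch.
struct Vec3 { double x, y, z; };
struct Aabb { Vec3 lo, hi; };  // axis-aligned box: min and max corners

// Clips the ray origin + t * dir to the box. On entry [tMin, tMax] is the
// part of the ray to consider; on success it is narrowed to the part that
// lies inside the box. Returns false if the ray misses the box.
inline bool clipRayToBox(const Vec3 &origin, const Vec3 &dir, const Aabb &box,
                         double &tMin, double &tMax) {
    const double o[3] = {origin.x, origin.y, origin.z};
    const double d[3] = {dir.x, dir.y, dir.z};
    const double lo[3] = {box.lo.x, box.lo.y, box.lo.z};
    const double hi[3] = {box.hi.x, box.hi.y, box.hi.z};

    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] == 0.0) {
            // Ray is parallel to this pair of planes: it must start between them.
            if (o[axis] < lo[axis] || o[axis] > hi[axis]) return false;
            continue;
        }
        const double inv = 1.0 / d[axis];
        double t0 = (lo[axis] - o[axis]) * inv;
        double t1 = (hi[axis] - o[axis]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;  // the slabs do not overlap
    }
    return true;
}
```

Only the primitives stored in boxes the ray actually enters need to be passed to the ray-primitive functions; everything else is skipped with a handful of multiplies and compares.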

##### Share on other sites
Thank you for the answer. Well, first of all, I don't use any spatial partitioning structure. I had planned to use a kd-tree, but I've had some difficulty finding references on them, so I put it off.
I'm trying to optimize now because many people have said that even with brute force I should get, with my current scene, far less than a second per render, while it takes 2.25 s on an Athlon XP 2600+ (6 triangles, 4 spheres, 3 bounding spheres; one sphere is reflective and another refractive; adaptive anti-aliasing is on, and without it the render takes 1.2 s).

I'm aware that the compiler inlines many methods, and that's the main problem. Currently I'm using Dev-C++, so the compiler is gcc and the optimization flags are the ones Dev-C++ sets. The profiler is the one provided with the IDE (gdb?).

The profiler says that the ray-sphere intersection routine takes 25% of the time, and it is as follows:
```cpp
real sphere::Intersect(const ray &r, real d, bool backfaces) const
{
    vector3 v(r.GetOrigin() - center);
    real b = -(v^r.GetDir());
    real det = (b * b) - (v^v) + sqradius;

    if (det > 0.0)
    {
        real dist;

        det = Sqrt(det);
        real i1 = b - det;
        real i2 = b + det;
        if (i2 > Q_EPSILON)
            dist = (i1 < Q_EPSILON) ? i2 : i1;
        else
            return INF;
        return (dist < d) ? dist : INF;
    }
    return INF;
}
```

I really don't know what I can do to double its speed...
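To check a routine like this away from the rest of the tracer, it can be copied into a small standalone file and timed on its own. The version below is a sketch under assumptions: `Vec3`, `Ray`, `Sphere`, `kEpsilon`, `kInf`, and `nanosecondsPerTest` are stand-ins invented here for `vector3`, `ray`, `Q_EPSILON`, `INF`, and so on; the ray direction is assumed to be normalized; and the unused `backfaces` flag is dropped.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <limits>

// Hypothetical stand-ins for the types and constants used in the post.
struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin, dir; };           // dir is assumed to be unit length
struct Sphere { Vec3 center; double sqRadius; };

const double kEpsilon = 1e-4;               // stand-in for Q_EPSILON
const double kInf = std::numeric_limits<double>::infinity();

inline Vec3 sub(const Vec3 &a, const Vec3 &b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Same logic as the posted routine: nearest hit distance in front of the
// ray origin and closer than maxDist, or kInf if there is none.
inline double intersect(const Sphere &s, const Ray &r, double maxDist) {
    const Vec3 v = sub(r.origin, s.center);
    const double b = -dot(v, r.dir);
    double det = b * b - dot(v, v) + s.sqRadius;
    if (det <= 0.0) return kInf;

    det = std::sqrt(det);
    const double i1 = b - det;
    const double i2 = b + det;
    if (i2 <= kEpsilon) return kInf;        // sphere is entirely behind the ray
    const double dist = (i1 < kEpsilon) ? i2 : i1;
    return (dist < maxDist) ? dist : kInf;
}

// Rough timing helper: average nanoseconds per call over n calls.
inline double nanosecondsPerTest(int n) {
    assert(n > 0);
    const Sphere s{{0.0, 0.0, 0.0}, 1.0};
    volatile double sink = 0.0;             // keeps the loop from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        const Ray r{{0.0, 0.0, -5.0 - 0.001 * (i % 100)}, {0.0, 0.0, 1.0}};
        sink = sink + intersect(s, r, kInf);
    }
    const std::chrono::duration<double, std::nano> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / n;
}
```

Multiplying the per-call time by the number of sphere tests a render actually performs gives a rough figure to compare against the profiler's 25%. If the two disagree badly, the cost is more likely in how often the routine is called than in the routine itself.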

##### Share on other sites
That code is definitely not the problem. You could cut its execution time by maybe 25% by using an SSE square root (`sqrtss`, or the `rsqrtss` approximation) instead of Sqrt(x), but that's not very significant.
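For reference, a sketch of what the SSE route could look like with compiler intrinsics. Note that `sqrtss` itself is a full-precision single-float square root; the approximate instruction is `rsqrtss`, a reciprocal square root good to roughly 12 bits. The function names and the `RT_HAVE_SSE` macro are made up for this example, and the code falls back to `std::sqrt` where SSE is not available. Whether either variant is actually faster depends on the compiler and CPU, so measure before adopting it.

```cpp
#include <cassert>
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define RT_HAVE_SSE 1  // hypothetical macro name, used only in this sketch
#endif

// Full-precision single-float square root via the sqrtss instruction.
inline float sqrtSse(float x) {
#ifdef RT_HAVE_SSE
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
#else
    return std::sqrt(x);
#endif
}

// Approximate square root computed as x * rsqrt(x) using rsqrtss.
// Returns 0 for x <= 0, which differs from std::sqrt for negative input.
inline float sqrtApproxSse(float x) {
#ifdef RT_HAVE_SSE
    if (x <= 0.0f) return 0.0f;  // rsqrtss(0) is +inf, and 0 * inf would be NaN
    const __m128 v = _mm_set_ss(x);
    return _mm_cvtss_f32(_mm_mul_ss(v, _mm_rsqrt_ss(v)));
#else
    return (x <= 0.0f) ? 0.0f : std::sqrt(x);
#endif
}
```

In the sphere routine the argument is already known to be positive at the point where the square root is taken, so the guard in `sqrtApproxSse` could be dropped there. The reduced precision slightly shifts hit distances, which may or may not matter for reflections and refractions.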

I could profile your code on my machine (AthlonXP 2400) and see if anything sticks out from the VC compiler. With that type of scene, partitioning is not really a great concern (Still a win though, I bet!)