Why does this take so long?

This topic is 3672 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

I have a ray tracer that renders a simple scene in under a second in release mode, but 25 seconds in debug mode - which is a pain. So I profiled to see what checks were being put in that were taking so long. A full 12% of the time is being spent in an accessor to a vector, which basically looks:
class Vector
{
float elems[3];

public:
const float_t& operator[](int index) const
{ // 12% time of entire program being spent on this line
return elems[index];
}
}
Disassembly of the prologue to the funtion (second column is percentage of time spent on that line, relative to the entire program):
push ebp                	0.2
mov ebp,esp             	0.05
push byte ffh           	0
push dword 00426fd5h    	0.21
mov eax,fs:[00000000h]  	0.04
push eax                	0
sub esp,00000128h       	0.16
push ebx                	0.05
push esi                	0
push edi                	0.22
push ecx                	0.09
lea edi,[ebp-00000134h] 	0.06
mov ecx,0000004ah       	0.14
mov eax,cccccccch       	0.03
rep stosd es:[edi]      	10.24
pop ecx                 	0.21
mov eax,[004302a8h]     	0.16
xor eax,ebp             	0.01
push eax                	0.24
lea eax,[ebp-0ch]       	0
mov fs:[00000000h],eax  	0
mov [ebp-14h],ecx       	0.34
Obviously the rep stosd es:[edi] is the line primarily responsible. I'm pretty sure that this is storing a string - what is this string? Having a poke around in memory doesn't reveal anything meaningful, although I guess it could be unicode.

Share on other sites
Iterator debugging.

Profiling debug builds is pointless anyway. It's like trying to estimate your sprinting performance in a pool.

Share on other sites
Quote:
 Original post by AntheusIterator debugging.

I wondered about this, but I don't think it's the case: firstly, this is a direct index into an array, which I don't think gets checked; secondly, disabling iterator debugging has no effect. Thanks for the suggestion, though.

Quote:
 Profiling debug builds is pointless anyway. It's like trying to estimate your sprinting performance in a pool.

I know that (and originally included a disclaimer in the first post, but edited it out for the sake of brevity): it's just that the slowdown in the debug release makes debugging a little bit of a pain, and so I was specifically looking to find the check that's making it take so long and, if possible, avoid it. Also, it's sufficiently significant that I was simply curious. :-)

Share on other sites
Quote:
Original post by TheUnbeliever
Quote:
 Original post by AntheusIterator debugging.

I wondered about this, but I don't think it's the case: firstly, this is a direct index into an array, which I don't think gets checked; secondly, disabling iterator debugging has no effect. Thanks for the suggestion, though.

Ugh, I assumed it was dealing with iterators in STL fashion. Then it's not that obviously.

Share on other sites
That's not doing anything with a string. It's initializing 296 bytes in the stack to contain cch, which is basically a pattern "1010101010101010..." in binary. I don't know why it's doing that (some mechanism to detect out-of-bounds writing?).

Share on other sites
Here's what's going on:
Lines 1..2: setting up a stack frame
Lines 3..6: build SEH record on the stack
Line 7: reserving space for local variables (why so much? what local variables?)
Lines 8..10: saving registers
Lines 11..16: initializing local variables to 0xCC
Lines 17..22: install SEH handler

Quote:
 It's initializing 296 bytes in the stack to contain cch, which is basically a pattern "1010101010101010..." in binary. I don't know why it's doing that (some mechanism to detect out-of-bounds writing?)

VC always does this in debug mode; the idea is to expose the use of uninitialized local variables. (0xCC..CC is a rather unusual value for ints and floats, so you quickly notice; more so than if the memory were zero-initialized).

Share on other sites
The profiling data can show a lot of time being spent in your operator[] purely because it gets used a lot. It by no means proves that it is slow.
Personally I would not trust profiling data that gives you time-spent per assembly instruction. Profiling at that level cannot be both fast and accurate. (Just look at the variation on those 'push' lines)

However, one thing that will likely speed it up is to change it to return by value instead of by const reference. Leave the non-const version to return by reference though of course.
Also, what is the difference between float and float_t? If float_t is not simply a typedef for float then there is more going on there that may take some time. If not, then you may as well make them the same anyway.

Rather then focusing on the operator [] you need to focus on the things that call operator [] and reduce those. Post such code if you need help with it.

Share on other sites
To remove that setting to 0xcc under the C/C++ code generation settings turn off "Basic Runtime Checks".

It's also possible (although more difficult since instructions get reordered, and some variables end up in registers) to debug release compiled code. Just make sure it's compiled with debug info on.

You can also control optimization settings on a per file basis, so it might be worth using the release build, but with optimizations disabled for the files that you need to debug. There's also #pragma optimize for even more control.

There's also a simple (but somewhat nasty) way to optimize the calling code to work round that slow operator. Let's assume you have code like:
for (size_t i=0; i < somevector.size(); i++){   float_t f = somevector;   // Do stuff}

Assuming operator[] does return by reference replace it with:
const float_t *array = &somevector[0];const size_t len = somevector.size();for (size_t i=0; i < len; i++){   float_t f = array;   // Do stuff}

This probably won't have any effect on the speed in release mode, but it'll help a lot in debug as functions won't get inlined there.

Share on other sites
Quote:
 Original post by Jan WassenbergLine 7: reserving space for local variables (why so much? what local variables?)

There are no local variables at all.

Quote:
 Original post by iMalcAlso, what is the difference between float and float_t? If float_t is not simply a typedef for float then there is more going on there that may take some time. If not, then you may as well make them the same anyway.

float_t is a typedef, passed in through a template - which all happens at compile time, obviously. I'm not entirely convinced that there's any utility of having a template for it, though; I might just use floats and be damned.

Quote:
 Original post by iMalcRather then focusing on the operator [] you need to focus on the things that call operator [] and reduce those. Post such code if you need help with it.

I don't think they're too sub-optimal, but nor are they particular optimized:

		boost::optional<float_t> intersect(const Ray<float_t>& ray)		{			typedef Maths::Vector3<float_t> Vector3;			Vector3 d = ray.from - centre;			float_t a = Maths::dot(ray.direction, ray.direction);			float_t b = 2 * Maths::dot(ray.direction, d);			float_t c = Maths::dot(d, d) - radiusSquared;			float_t discriminant = b*b - 4*a*c;			if (discriminant < 0)			{				return boost::none;			}			else			{				float_t t1 = (-b + std::sqrt(discriminant)) / (2*a);				float_t t2 = (-b - std::sqrt(discriminant)) / (2*a);				float_t tmin = std::min(t1, t2);				return (tmin < 0 ? std::max(t1, t2) : tmin);			}		}// ...		boost::optional<float_t> intersect(const Ray<float_t>& ray)		{			if (Maths::dot(ray.direction, normal) == 0)				return boost::none;			float_t a = normal.x();			float_t b = normal.y();			float_t c = normal.z();			float_t d = -Maths::dot(normal, point);			float_t t = -(a * ray.from.x() + b * ray.from.y() + c * ray.from.z() + d);			t /= a * ray.direction.x() + b * ray.direction.y() + c * ray.direction.z();			return t;		}

Share on other sites
I presume that things like Math::dot call your operator []. I would tend to put things like dot in your vector class instead where it can access the array directly.
You could consider making Math a friend of Vector and accessing the array directly that way. Not sure if I'd be totally comfortable with that though.

You could also consider using the geometric method of finding the intersection between a ray and a sphere. I think it has better early-out opportunities, which again could also reduce calls to operator [].

I really hope the compiler is being smart enough to reduce the two sqrt calls into one. If not, that'd make a reasonable dent on the speed of that function. I would explicitly refactor out the common sub-expression to be sure.

1. 1
2. 2
frob
15
3. 3
Rutin
12
4. 4
5. 5

• 13
• 12
• 58
• 14
• 15
• Forum Statistics

• Total Topics
632123
• Total Posts
3004236

×