SHilbert

Member Since 14 Jan 2000
Offline Last Active Jul 06 2012 10:53 AM

#4952098 Slow parallel code

Posted by SHilbert on 23 June 2012 - 02:43 PM

At the moment, I'm not working on instancing, and I'd really like to know why the parallel code experiences a sudden drop in speed.


You mentioned that the code chunk you posted takes 48% of the time in one test. Do you know for sure that it is the part that is growing out of proportion to the other part as x increases? Like, as you increase x does that percentage go up, or down, or stay the same?

I would time the first parallel loop, the second parallel loop, and your actual rendering code, all separately, and see which one is growing faster than the others as you increase x. You don't even need SlimTune, just use a System.Diagnostics.Stopwatch and draw the times, or percentages of frametime, on the screen. That way you can at least verify that you are targeting exactly the right section.
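As a rough illustration of timing each section separately, here is a minimal sketch. It is written in C++ with std::chrono::steady_clock standing in for System.Diagnostics.Stopwatch, and the SectionTimer class and the section names are made up for the example, not taken from the original code.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock time per named section so the
// sections can be compared against each other as x grows.
class SectionTimer {
public:
    // Runs `work` and adds its elapsed time to the running total for `name`.
    template <typename Work>
    void measure(const std::string& name, Work work) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        totals_[name] +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }

    // Total milliseconds recorded for `name` (0 if it was never measured).
    double totalMs(const std::string& name) const {
        const auto it = totals_.find(name);
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Share of all recorded time spent in `name`, in the range [0, 1].
    double fraction(const std::string& name) const {
        double sum = 0.0;
        for (const auto& entry : totals_) sum += entry.second;
        return sum > 0.0 ? totalMs(name) / sum : 0.0;
    }

private:
    std::map<std::string, double> totals_;
};
```

Each frame you would wrap the first loop, the second loop, and the rendering code in their own measure() call, then draw fraction() for each one and watch which value climbs as x increases.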

Also, if you are not doing it already, I would make sure you compare tests with the same percentage of objects visible -- either all, or none, or some constant value like 50%. If you compare 60% objects visible at x=27 to 40% objects visible at x=28 you will see changes in the relative timing of different sections that are only based on that percentage, not on x.

Also consider memory allocation and garbage collection. As you start allocating bigger chunks of RAM, each allocation gets more expensive, and the GC may start to thrash more and more often. One VERY SUSPICIOUS fact is that an array of 27^3 32-bit references is right around the size at which objects start getting put in the separate large object heap (85,000 bytes): 27^3 = 19,683 references x 4 bytes = 78,732 bytes, while 28^3 = 21,952 x 4 = 87,808 bytes. I wrote a program to time how long each allocation took, averaged over 100,000 allocations, and I got this chart:


Posted Image

(EDIT: The graph is actually milliseconds for a pair of allocations -- first allocating an array of that size, and then calling ToList on it.)

Notice that the Y axis is in milliseconds, though, so a single allocation is still cheap -- if it is affecting you it is probably because the number of objects in your heap is much larger than my test program's, or because you allocate many times per frame.

Maybe try to see how frequently GCs are happening. The ".NET CLR Memory" performance counters (for example "# Gen 0 Collections" and "% Time in GC") can tell you lots of details about this. GCs can strike anywhere in your main loop even if they're caused by allocations in one localized spot, so it would be useful to rule that out first.


#4951949 D3DXComputeTangentFrameEx help to decipher documentation

Posted by SHilbert on 23 June 2012 - 02:07 AM

Geez, that is some poorly worded documentation.

I think what it is saying is that if you pass NULL, then you get the behavior described in the description column. As for the second column, I think the & !( ... ) is a horribly poor way of saying "bitwise ANDed with the bitwise inverse of ..." -- kind of weird that they used ! (logical NOT) instead of ~ (bitwise NOT). That explains why "Vertices are ordered in a counterclockwise direction around each triangle." is next to "& !( D3DXTANGENT_WIND_CW )" -- D3DXTANGENT_WIND_CW is not set, so the ordering must be counterclockwise.

So basically none of the flags mentioned in that column are set by default. All of the flags have a nonzero value, so if you pass NULL for the flags value you will of course have provided none of the flags.

I don't think you want to specify all the flags since some of them are mutually exclusive, but I don't know what to recommend specifically. It does sound like you want to avoid passing D3DXTANGENT_CALCULATE_NORMALS though.
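To make that reading of "& !( ... )" concrete, here is a small sketch of testing and clearing bit flags. The EXAMPLE_* names and their values are made up for illustration. The real D3DXTANGENT_* constants come from the D3DX headers and their values are not assumed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration only; these are NOT the real
// D3DXTANGENT_* constants, which are defined by the D3DX headers.
enum ExampleTangentFlags : std::uint32_t {
    EXAMPLE_WIND_CW           = 0x1,
    EXAMPLE_CALCULATE_NORMALS = 0x2,
};

// True when `flag` is set in `options`.
bool isSet(std::uint32_t options, std::uint32_t flag) {
    return (options & flag) != 0;
}

// Clears `flag` by ANDing with its bitwise inverse (~), which is what the
// documentation's "& !( ... )" notation appears to mean.
std::uint32_t clearFlag(std::uint32_t options, std::uint32_t flag) {
    return options & ~flag;
}
```

With every flag nonzero, isSet(0, anyFlag) is always false, which matches the point above that passing NULL (zero) means none of the flags are provided.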


#4951944 Issues with Occlusion Queries

Posted by SHilbert on 23 June 2012 - 01:41 AM

Are only the occlusion test models cubes or are the actual models cubes as well?

Speed wise, occlusion queries only really make sense if the cost of doing the query is a lot less than the cost of just drawing the thing(s) it represents in the first place. For example, doing an occlusion query where you draw a simple bounding box (very cheap) in order to check if you need to draw an entire city block (very expensive if the geometry is complicated) is probably a good trade-off. Doing an occlusion query where you draw a single cube in order to avoid drawing another single cube, all of the above repeated up to 15,000 times, is probably not going to be effective. Even if you have more complicated models the sheer overhead of executing thousands of occlusion queries could make the speed suffer.

If you are trying for speed when looking at the entire scene, I would first say to look into an instancing strategy to achieve better batching. 15,000 draw calls is going to be inherently slow even if you are drawing simple things. Getting occlusion queries right is tricky. If you still want to do the occlusion queries, it would probably be more worthwhile to do 1 occlusion query for a group of many nearby models, by drawing the group's bounding box -- that way even if the cost of drawing a model is the same as the cost of a query you can potentially spend 1 draw to save 100 draws.
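The grouping idea needs one bounding box per group of nearby models. A minimal sketch of that step, assuming each model already has an axis-aligned bounding box, might look like the following. The Aabb type and the unionOf function are hypothetical names, and the occlusion query itself is API-specific and left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical axis-aligned bounding box; not a type from any graphics API.
struct Aabb {
    float min[3];
    float max[3];
};

// Returns the smallest box enclosing every box in `boxes`.
// Precondition: `boxes` is not empty.
Aabb unionOf(const std::vector<Aabb>& boxes) {
    assert(!boxes.empty());
    Aabb result = boxes.front();
    for (const Aabb& box : boxes) {
        for (int axis = 0; axis < 3; ++axis) {
            result.min[axis] = std::min(result.min[axis], box.min[axis]);
            result.max[axis] = std::max(result.max[axis], box.max[axis]);
        }
    }
    return result;
}
```

You would draw the resulting box as a solid volume inside a single query, and only issue the draws for the group's models if the query reports visible pixels.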

Some other miscellaneous thoughts:
  • Is there more to your drawing code? It looks like you draw your models inside the occlusion query Begin/End, which doesn't make much sense to me.
  • You don't want to be making occlusion queries unless you know the object in question is within your view frustum; otherwise the query is doomed from the start.
  • I'm confused why you are drawing a wireframe for your occlusion test object. You should probably be drawing a solid bounding volume or else you are going to get some cases where the occlusion query reports zero pixels when there really might be some. I can see why you would draw the final objects as wireframes though, just to see if the occlusion tests are actually working.
  • Might be worth reading what Shawn Hargreaves says in this thread on the XNA forums: http://forums.create...6244/33016.aspx



#4951849 Which is faster? std::priority_queue or std::multimap

Posted by SHilbert on 22 June 2012 - 04:01 PM

I honestly don't know what you are needing or looking for info on this, it's just a simple insert()...

pickedObjects.insert(std::make_pair(dist, obj)); // dist is a double, obj is an Obj* (just an address, not new'd)
// later on
pickedObjects.begin()->first;  // to check the dist
pickedObjects.begin()->second; // to assign that pointer to selected object....


Not sure what else I can say nothing complex here people....


I might be misinterpreting what you are talking about here, but if you are doing this for object picking, like having the user click in a window to select an object, and you only ever need to know the single object with the smallest distance, you don't need a priority_queue or a multimap. You just need to remember the currently closest object and its distance, and update those values if you find an even closer one.

Again, not sure if that is your use case, though.
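If it is, a minimal sketch of the "remember the closest so far" approach could look like this. Obj here is only a stand-in for your own type, and Hit and findClosest are names invented for the example.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for the poster's object type.
struct Obj {
    int id;
};

// One picking candidate: an object hit by the pick ray and its distance.
struct Hit {
    double distance;
    Obj* object;
};

// Returns the closest hit object, or nullptr if nothing was hit.
// A single pass with two running values; no sorted container is needed.
Obj* findClosest(const std::vector<Hit>& hits) {
    double bestDistance = std::numeric_limits<double>::infinity();
    Obj* bestObject = nullptr;
    for (const Hit& hit : hits) {
        if (hit.distance < bestDistance) {
            bestDistance = hit.distance;
            bestObject = hit.object;
        }
    }
    return bestObject;
}
```

In practice you would not even need the vector: the same comparison can run inside whatever loop performs the ray tests, so nothing gets stored except the current best distance and pointer.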


#4823939 The GDNet Birthday thread

Posted by SHilbert on 15 June 2011 - 11:08 PM

If anyone makes floorcaek I swear I am not cleaning it up this time.


Posted Image


Floorcaek for everyone!

