Sign in to follow this  
hgoel0974

Unity 'External Code' taking up most of my CPU?

Recommended Posts

I've been implementing CPU based voxel ray casting on a BSP tree, in the end a node consists of about 30 voxels, however, the performance is terrible, I ran the program through the Visual Studio Community Edition CPU profiler and got the following output:

http://i.imgur.com/SbZIJCn.png

Since it just says that '[External Code]' is taking most of the CPU time, I'm not sure how to proceed. I can't identify many serious performance issues, and I find it hard to believe that the recursion could be the problem. My Ray-AABB intersection tests don't appear to be too slow either.

My code:
BSP Tree code (yes, I know the filename says Octree):
VoxelOctree.cs

GI_IntersectionTests.cs

GIWorld.cs

Any suggestions on how I should approach figuring out this issue would be appreciated.
The ray casting and tree traversal code is in the RayCast function in VoxelOctree.cs Edited by HimanshuGoel97

Share this post


Link to post
Share on other sites

Do you make a system call or call to a library function in there?

 

Edit:

Nvm, guess I can look.

 

EditEdit:

Or not. Where is VoxelTree? Oh, you linked it wrong, but I can just navigate to it. : p

 

Gonna guess that it's this:

BoundingBox b2 = new BoundingBox(Voxels.Values.ElementAt(i).Position - Vector3.One * vSide * 0.5f, Voxels.Values.ElementAt(i).Position + Vector3.One * vSide * 0.5f);

Can you break that out into a separate function and call it there, then check the profile again?

Edited by Khatharr

Share this post


Link to post
Share on other sites
Sorry about that, just noticed and fixed the link.
EDIT: Ah, a minute too late.

EDIT: The boundingbox class doesn't actually do any calculations on the data, but I'll give your suggestion a try in the morning.

Share this post


Link to post
Share on other sites

BoundingBox is your code, so I would expect it to show up in your call stack if it were the problem. And it's a struct, so it shouldn't be causing any heap allocations.

 

I would focus maybe on the dictionary values enumeration. Isn't there a setting that will show you symbol information for system calls? I know there is for viewing the callstack in the debugger, I don't know if there is for the performance stuff...

 

You're calling ElementAt four times, it seems you should be able to just call it once and store the result.

 

I don't see anything obvious in that function that would be causing any heap allocations (and possible performance issues), but .net can be sneaky that way. 

Share this post


Link to post
Share on other sites

I'm tempted to guess the allocation as well.  Calling "new" in C# is faster than C++'s global heap, but still not exactly speedy.

 

It looks like the allocator is being called a lot.  Every time you have Voxels !=null, you loop through Voxels.Values.Count calls to the allocator to make a new one.  How many voxel values do you have?  How many times do you call RayCast?  Multiply the two, that's about how many times you do the allocation.

 

My hunch is that you are hammering the allocator and constructor with many millions of calls for what looks like about 10 seconds. 

Share this post


Link to post
Share on other sites

Ok, it looks like (from my own tests) calling ElementAt on the Dictionary's Values collection is really slow. I'm guessing it is an order-n operation that needs to iterate through items one by one, since the larger the index the longer it takes.

 

You probably want to use foreach instead (and hopefully the ValueCollection's iterator is a value type, to avoid unnecessary heap allocation)

 

In the little test I did, for a Dictionary of 10,000 items, this:

 

            for (int i = 0; i < blobs.Values.Count; i++)
            {
                value += blobs.Values.ElementAt(i).a;
            }
 
was more than a 1000x slower than this:
 
            foreach (Blob blob in blobs.Values)
            {
                value += blob.a;
            }
 
Edited by phil_t

Share this post


Link to post
Share on other sites

BoundingBox is your code, so I would expect it to show up in your call stack if it were the problem. And it's a struct, so it shouldn't be causing any heap allocations.

 
There is a common misconception that structs must live on the stack rather than the heap.
 
C# makes a distinction between "reference types" and "value types", but that does not mean doesn't affect memory allocations or heap allocations.  C# works very hard to avoid specifying exactly where things live. The data stack and the heap are both implementation details for the language.  The specification is vague about the heap, allowing for there to be many heaps, and for heaps to be garbage collecting or not, 

 

The C# specification notes that "structs are value types and do not require heap allocation", but that does not mean they require stack allocation.

  
Structs are created on the heap all the time in C#. Structs can be placed on the stack, but they also can be placed on the heap. Structs that are boxed, structs that are a field of a class, structs not closed in a lambda, structs in an iterator or foreach block, these are all probably on the heap. A raw variable may be on the stack or the heap. 
 
The runtime has even more options. A struct's members can be pulled out into registers during JIT or ngen compilation.  Unlike C++, these details don't matter as much in C#.  The distinction between "reference types" and "value types" is generally sufficient.

Share this post


Link to post
Share on other sites

Agreed that struct doesn't necessarily mean it's allocated on the stack, but in the OP's usage scenario it will. 

 

You basically have:

    public struct BoundingBox : IEquatable<BoundingBox>
    {
        public Vector3 Min;
        public Vector3 Max;

        public BoundingBox(Vector3 min, Vector3 max)
        {
            this.Min = min;
            this.Max = max;
        }
    }

and then in the function:

    BoundingBox b2 = new BoundingBox(Voxels.Values.ElementAt(i).Position - Vector3.One * vSide * 0.5f, Voxels.Values.ElementAt(i).Position + Vector3.One * vSide * 0.5f);

In this case, the new is not allocating on the heap. I looked to see somewhere that b2 might be boxed (say, by an implicit cast to IEquatable<BoundingBox> ), and couldn't find any.

Edited by phil_t

Share this post


Link to post
Share on other sites

 

Ok, it looks like (from my own tests) calling ElementAt on the Dictionary's Values collection is really slow. I'm guessing it is an order-n operation that needs to iterate through items one by one, since the larger the index the longer it takes.

 

You probably want to use foreach instead (and hopefully the ValueCollection's iterator is a value type, to avoid unnecessary heap allocation)

 

In the little test I did, for a Dictionary of 10,000 items, this:

 

            for (int i = 0; i < blobs.Values.Count; i++)
            {
                value += blobs.Values.ElementAt(i).a;
            }
 
was more than a 1000x slower than this:
 
            foreach (Blob blob in blobs.Values)
            {
                value += blob.a;
            }
 

 

 

Just a FYI:

 

Dictionary, List, possibly a few others (except for Collection<T>) all have struct enumerators and explicitly implement the GetEnumerator() so the GetEnumerator that actually will be called in that foreach will return the struct enumerator and not do any boxing. 

 

With that said though, ElementAt is an extension method for IEnumerable<T> so the value collection is going to be treated as an IEnumerable<T> and the explicitly implemented GetEnumerator will be called, which means the struct enumerator will be boxed.

Share this post


Link to post
Share on other sites

I'm guessing Garbage Collector. With a callstack so deep, the GC may end up deciding it needs to run; whether because of heuristics, or because it actually needs to (running out of heap space?).

 

Is there an option for VS to show you the actual external code? I think it's a setting somewhere. Maybe this setting also affects the profiler so you can see what External Code means.

 

From MS:

The [External Code] entries indicate time spent in the platform and runtime on behalf of our code doing work such as rendering the UI, initializing the app, and garbage collection.

 

Share this post


Link to post
Share on other sites

 

Is there an option for VS to show you the actual external code? I think it's a setting somewhere. Maybe this setting also affects the profiler so you can see what External Code means.

 

 

 

In vs2013 at least, it does indeed affect the profiler. (There's also a toggle in the main profiler summary view that lets you switch between "all code" and "just my code"). Running the sample code I posted above results in this (first number is inclusive samples): 

 

 

PerfTest.Program.Main(string[]) PerfTest.exe 177 0 100.00 0.00
    System.Linq.Enumerable.ElementAt(class System.Collections.Generic.IEnumerable`1<!!0>,int32) System.Core.ni.dll 177 10 100.00 5.65
        System.Collections.Generic.Dictionary`2.ValueCollection.Enumerator.MoveNext() mscorlib.ni.dll 141 141 79.66 79.66
        [Unknown] [Unknown] 26 26 14.69 14.69

 

The implementation of ElementAt calls MoveNext n times to locate the item at index (unless the IEnumerable can be cast to IList, which Dictionary can't). So it's a slow way of iterating through a dictionary's values.

 

Not sure what the call to [Unknown] is in ElementAt. Maybe that's the gc kicking in.

Share this post


Link to post
Share on other sites

Could you just change this

 for (int i = 0; i < blobs.Values.Count; i++)
            {
                value += blobs.Values.ElementAt(i).a;
            }
 
to
 
 for (int i = 0; i < blobs.Values.Count; i++)
            {
                value += blobs.Values[i].a;
            }

Share this post


Link to post
Share on other sites

Sorry for the late reply, today has been a busy day at university.

I made the changes you guys recommended, namely, switching out the elementat call for an array access and moving the BoundingBox 'new' call outside of the loop. These two changes have indeed helped increase performance but it still isn't up to the point I was thinking voxel raycasting would be at. I have the following output on the profiler now, the GIWorld.Update call doesn't even show up on the list, I'm looking into how to get the External code portions to show more detail though.

http://imgur.com/dLwm01m

 

Additionally, it seems there's a bug in the actual raycasting code, as a result of which the results are incorrect. I was hoping to use voxel raycasting on the CPU to do an additional low resolution indirect lighting bounce and then upload that to the GPU for compositing, which is beginning to seem like a not too great idea.

Edited by HimanshuGoel97

Share this post


Link to post
Share on other sites

Instructions to show you what's in the external code section: http://stackoverflow.com/questions/33482789/external-code-in-vs2015-profiler

 

When you have the diagnostic view up, look for a dropdown that says "Filter View". It's in the area below the graph but above the listview. Click the dropdown and check the "Show External Code" checkbox.

 

Also see https://msdn.microsoft.com/en-us/library/dn971856.aspx

Edited by Adam_42

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this