
Unity 'External Code' taking up most of my CPU?


I've been implementing CPU-based voxel ray casting on a BSP tree. In the end, a node consists of about 30 voxels. However, the performance is terrible. I ran the program through the Visual Studio Community Edition CPU profiler and got the following output:

http://i.imgur.com/SbZIJCn.png

Since it just says that '[External Code]' is taking most of the CPU time, I'm not sure how to proceed. I can't identify many serious performance issues, and I find it hard to believe that the recursion could be the problem. My Ray-AABB intersection tests don't appear to be too slow either.

My code:
BSP Tree code (yes, I know the filename says Octree):
VoxelOctree.cs

GI_IntersectionTests.cs

GIWorld.cs

Any suggestions on how I should approach figuring out this issue would be appreciated.
The ray casting and tree traversal code is in the RayCast function in VoxelOctree.cs.

Edited by HimanshuGoel97


Do you make a system call or call to a library function in there?

 

Edit:

Nvm, guess I can look.

 

EditEdit:

Or not. Where is VoxelTree? Oh, you linked it wrong, but I can just navigate to it. : p

 

Gonna guess that it's this:

BoundingBox b2 = new BoundingBox(Voxels.Values.ElementAt(i).Position - Vector3.One * vSide * 0.5f, Voxels.Values.ElementAt(i).Position + Vector3.One * vSide * 0.5f);

Can you break that out into a separate function and call it there, then check the profile again?

Edited by Khatharr

Sorry about that, just noticed and fixed the link.
EDIT: Ah, a minute too late.

EDIT: The boundingbox class doesn't actually do any calculations on the data, but I'll give your suggestion a try in the morning.


I singled that out because I'm wondering whether the allocation is the problem. I don't see any other external calls in that function, but I'm kind of sleepy as well.


BoundingBox is your code, so I would expect it to show up in your call stack if it were the problem. And it's a struct, so it shouldn't be causing any heap allocations.

 

I would focus maybe on the dictionary values enumeration. Isn't there a setting that will show you symbol information for system calls? I know there is for viewing the callstack in the debugger, I don't know if there is for the performance stuff...

 

You're calling ElementAt four times; it seems you should be able to just call it once and store the result.

 

I don't see anything obvious in that function that would be causing any heap allocations (and possible performance issues), but .NET can be sneaky that way.
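The suggestion above (look each voxel up once and keep it in a local) could be sketched roughly as follows. This is only a sketch under assumptions, not the original RayCast code: the Voxel class, the int key, the VoxelBounds.BuildBounds helper, and the minimal BoundingBox are hypothetical stand-ins, System.Numerics.Vector3 stands in for whatever vector type the project uses, and vSide is taken from the line quoted earlier in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical stand-ins for the types in the original project.
public struct BoundingBox
{
    public Vector3 Min;
    public Vector3 Max;
    public BoundingBox(Vector3 min, Vector3 max) { Min = min; Max = max; }
}

public class Voxel
{
    public Vector3 Position;
}

public static class VoxelBounds
{
    // Builds one box per voxel, touching each dictionary value exactly once.
    public static List<BoundingBox> BuildBounds(Dictionary<int, Voxel> voxels, float vSide)
    {
        // Hoisted out of the loop: the half-extent is the same for every voxel.
        Vector3 half = Vector3.One * (vSide * 0.5f);
        var boxes = new List<BoundingBox>(voxels.Count);

        // foreach walks the values once; there is no ElementAt(i) re-enumeration.
        foreach (Voxel voxel in voxels.Values)
        {
            Vector3 p = voxel.Position; // read once, reused for Min and Max
            boxes.Add(new BoundingBox(p - half, p + half));
        }
        return boxes;
    }

    public static void Main()
    {
        var voxels = new Dictionary<int, Voxel>
        {
            { 0, new Voxel { Position = new Vector3(1f, 2f, 3f) } },
        };
        List<BoundingBox> boxes = BuildBounds(voxels, 2f);
        Console.WriteLine(boxes[0].Min + " .. " + boxes[0].Max);
    }
}
```

In a real ray cast the box would probably be tested immediately rather than collected into a list; the list here only makes the result easy to inspect.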


I'm tempted to guess the allocation as well.  Calling "new" in C# is faster than C++'s global heap, but still not exactly speedy.

 

It looks like the allocator is being called a lot. Every time Voxels != null, you loop Voxels.Values.Count times, and each iteration calls the allocator to make a new one. How many voxel values do you have? How many times do you call RayCast? Multiply the two; that's about how many times you do the allocation.

 

My hunch is that you are hammering the allocator and constructor with many millions of calls for what looks like about 10 seconds. 


Ok, it looks like (from my own tests) calling ElementAt on the Dictionary's Values collection is really slow. I'm guessing it is an order-n operation that needs to iterate through items one by one, since the larger the index the longer it takes.

 

You probably want to use foreach instead (and hopefully the ValueCollection's iterator is a value type, to avoid unnecessary heap allocation)

 

In the little test I did, for a Dictionary of 10,000 items, this:

 

            for (int i = 0; i < blobs.Values.Count; i++)
            {
                value += blobs.Values.ElementAt(i).a;
            }
 
was more than 1000x slower than this:
 
            foreach (Blob blob in blobs.Values)
            {
                value += blob.a;
            }
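
For anyone who wants to repeat the comparison, a self-contained version might look something like the sketch below. The Blob class with an int field a is an assumption (the post does not show its definition), as are the ElementAtBenchmark class and its method names. Timings will vary by machine and runtime, so none are claimed here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical element type; the original Blob definition is not shown in the thread.
public class Blob
{
    public int a;
}

public static class ElementAtBenchmark
{
    public static long SumWithElementAt(Dictionary<int, Blob> blobs)
    {
        long value = 0;
        for (int i = 0; i < blobs.Values.Count; i++)
        {
            // Each call enumerates the values from the start up to index i.
            value += blobs.Values.ElementAt(i).a;
        }
        return value;
    }

    public static long SumWithForeach(Dictionary<int, Blob> blobs)
    {
        long value = 0;
        foreach (Blob blob in blobs.Values)
        {
            value += blob.a;
        }
        return value;
    }

    public static void Main()
    {
        var blobs = new Dictionary<int, Blob>();
        for (int i = 0; i < 10000; i++)
        {
            blobs[i] = new Blob { a = i };
        }

        Stopwatch sw = Stopwatch.StartNew();
        long slow = SumWithElementAt(blobs);
        sw.Stop();
        Console.WriteLine("ElementAt: " + sw.Elapsed.TotalMilliseconds + " ms (sum " + slow + ")");

        sw.Restart();
        long fast = SumWithForeach(blobs);
        sw.Stop();
        Console.WriteLine("foreach:   " + sw.Elapsed.TotalMilliseconds + " ms (sum " + fast + ")");
    }
}
```

Both loops compute the same sum, so the printed sums double as a sanity check that the comparison is fair.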
 
Edited by phil_t


BoundingBox is your code, so I would expect it to show up in your call stack if it were the problem. And it's a struct, so it shouldn't be causing any heap allocations.

 
There is a common misconception that structs must live on the stack rather than the heap.
 
C# makes a distinction between "reference types" and "value types", but that distinction does not determine whether something is allocated on the stack or on the heap. C# works very hard to avoid specifying exactly where things live. The data stack and the heap are both implementation details for the language. The specification is vague about the heap, allowing for there to be many heaps, and for heaps to be garbage collected or not.

 

The C# specification notes that "structs are value types and do not require heap allocation", but that does not mean they require stack allocation.

  
Structs are created on the heap all the time in C#. Structs can be placed on the stack, but they can also be placed on the heap. Structs that are boxed, structs that are a field of a class, structs captured (closed over) by a lambda, and structs in an iterator or foreach block are all probably on the heap. A raw variable may be on the stack or the heap.
 
The runtime has even more options. A struct's members can be pulled out into registers during JIT or ngen compilation.  Unlike C++, these details don't matter as much in C#.  The distinction between "reference types" and "value types" is generally sufficient.
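
Two of the cases listed above (boxing, and a struct that is a field of a class) can be observed from code. The sketch below uses hypothetical Point, Holder, and BoxingDemo types invented for illustration; it shows behavior that follows from boxing semantics and makes no claim about where any particular runtime places a local variable.

```csharp
using System;

// Hypothetical value type used only for this illustration.
public struct Point
{
    public int X;
    public int Y;
}

public class Holder
{
    // A struct field is stored inline in the Holder instance, which is a reference type.
    public Point Location;
}

public static class BoxingDemo
{
    // Boxing the same value twice yields two distinct objects.
    public static bool BoxesAreDistinct()
    {
        Point p = new Point { X = 1, Y = 2 };
        object first = p;   // boxes: copies p into a new object
        object second = p;  // boxes again: another object
        return !object.ReferenceEquals(first, second);
    }

    // Mutating the original after boxing does not affect the boxed copy.
    public static int BoxedCopyX()
    {
        Point p = new Point { X = 1, Y = 2 };
        object boxed = p;
        p.X = 99;
        return ((Point)boxed).X;
    }

    public static void Main()
    {
        Console.WriteLine(BoxesAreDistinct());
        Console.WriteLine(BoxedCopyX());

        var holder = new Holder();
        holder.Location.X = 5; // writes into the Holder object itself
        Console.WriteLine(holder.Location.X);
    }
}
```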


Agreed that struct doesn't necessarily mean it's allocated on the stack, but in the OP's usage scenario it will be.

 

You basically have:

    public struct BoundingBox : IEquatable<BoundingBox>
    {
        public Vector3 Min;
        public Vector3 Max;

        public BoundingBox(Vector3 min, Vector3 max)
        {
            this.Min = min;
            this.Max = max;
        }
    }

and then in the function:

    BoundingBox b2 = new BoundingBox(Voxels.Values.ElementAt(i).Position - Vector3.One * vSide * 0.5f, Voxels.Values.ElementAt(i).Position + Vector3.One * vSide * 0.5f);

In this case, the new is not allocating on the heap. I looked to see somewhere that b2 might be boxed (say, by an implicit cast to IEquatable<BoundingBox> ), and couldn't find any.

Edited by phil_t


 


Just a FYI:

 

Dictionary, List, and possibly a few others (but not Collection<T>) all have struct enumerators. They expose a public GetEnumerator() that returns the struct and implement the interface versions explicitly, so the GetEnumerator that actually gets called in that foreach returns the struct enumerator and does not do any boxing.

 

With that said though, ElementAt is an extension method for IEnumerable<T> so the value collection is going to be treated as an IEnumerable<T> and the explicitly implemented GetEnumerator will be called, which means the struct enumerator will be boxed.
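
The difference between the two GetEnumerator paths can be checked with a small sketch like the one below. The EnumeratorDemo class and its method names are made up for illustration; the type names it inspects are the real nested types of Dictionary<TKey, TValue>. The second check uses a non-empty dictionary on purpose, since a runtime may special-case empty collections.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // The public GetEnumerator on ValueCollection returns a struct type.
    public static bool PublicEnumeratorIsStruct()
    {
        return typeof(Dictionary<int, string>.ValueCollection.Enumerator).IsValueType;
    }

    // Through IEnumerable<T>, the enumerator comes back as an interface
    // reference, so the struct has to be boxed to be returned.
    public static bool InterfaceEnumeratorIsBoxedStruct()
    {
        var dict = new Dictionary<int, string> { { 1, "one" } };
        IEnumerable<string> asInterface = dict.Values;
        using (IEnumerator<string> e = asInterface.GetEnumerator())
        {
            object boxed = e;
            return boxed.GetType() == typeof(Dictionary<int, string>.ValueCollection.Enumerator);
        }
    }

    public static void Main()
    {
        Console.WriteLine(PublicEnumeratorIsStruct());
        Console.WriteLine(InterfaceEnumeratorIsBoxedStruct());
    }
}
```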


I'm guessing Garbage Collector. With a callstack so deep, the GC may end up deciding it needs to run; whether because of heuristics, or because it actually needs to (running out of heap space?).

 

Is there an option for VS to show you the actual external code? I think it's a setting somewhere. Maybe this setting also affects the profiler so you can see what External Code means.

 

From MS:

The [External Code] entries indicate time spent in the platform and runtime on behalf of our code doing work such as rendering the UI, initializing the app, and garbage collection.
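
One cheap way to test the garbage collector guess is to count collections around the suspect call. The sketch below uses GC.CollectionCount, which is a real .NET API; the GcProbe class, the Gen0CollectionsDuring helper, and the allocation-heavy lambda are hypothetical stand-ins for the actual ray-casting call.

```csharp
using System;

public static class GcProbe
{
    // Runs the workload and returns how many gen-0 collections happened during it.
    public static int Gen0CollectionsDuring(Action workload)
    {
        int before = GC.CollectionCount(0);
        workload();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        int collections = Gen0CollectionsDuring(() =>
        {
            // Deliberately allocation-heavy stand-in for the real ray-casting call.
            for (int i = 0; i < 1000000; i++)
            {
                object garbage = new byte[64];
                GC.KeepAlive(garbage);
            }
        });
        Console.WriteLine("Gen-0 collections during workload: " + collections);
    }
}
```

If the count stays at or near zero around the real workload, the GC is probably not what is hiding behind [External Code].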

 


 

Is there an option for VS to show you the actual external code? I think it's a setting somewhere. Maybe this setting also affects the profiler so you can see what External Code means.

 

 

 

In VS2013 at least, it does indeed affect the profiler. (There's also a toggle in the main profiler summary view that lets you switch between "all code" and "just my code".) Running the sample code I posted above results in this (first number is inclusive samples):

 

 

PerfTest.Program.Main(string[]) PerfTest.exe 177 0 100.00 0.00
    System.Linq.Enumerable.ElementAt(class System.Collections.Generic.IEnumerable`1<!!0>,int32) System.Core.ni.dll 177 10 100.00 5.65
        System.Collections.Generic.Dictionary`2.ValueCollection.Enumerator.MoveNext() mscorlib.ni.dll 141 141 79.66 79.66
        [Unknown] [Unknown] 26 26 14.69 14.69

 

The implementation of ElementAt calls MoveNext n times to locate the item at index (unless the IEnumerable can be cast to IList, which Dictionary can't). So it's a slow way of iterating through a dictionary's values.
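
To make the cost concrete, here is a simplified sketch of that lookup strategy with a counter added. It is an illustration only and is not the actual library source; the ElementAtSketch class, the Lookup method, and the MoveNextCalls counter are hypothetical names. Reaching index i on the slow path takes i + 1 successful MoveNext calls, so looping over all n indices costs n * (n + 1) / 2 of them, which is 50,005,000 for a 10,000-item dictionary.

```csharp
using System;
using System.Collections.Generic;

public static class ElementAtSketch
{
    // Number of successful MoveNext calls made by the slow path below.
    public static long MoveNextCalls;

    // Simplified illustration of the lookup strategy; NOT the actual library source.
    public static T Lookup<T>(IEnumerable<T> source, int index)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (index < 0) throw new ArgumentOutOfRangeException("index");

        // Fast path: anything that supports indexing is read directly.
        IList<T> list = source as IList<T>;
        if (list != null) return list[index];

        // Slow path: walk from the beginning until the index is reached.
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                MoveNextCalls++;
                if (index == 0) return e.Current;
                index--;
            }
        }
        throw new ArgumentOutOfRangeException("index");
    }

    // Counts MoveNext calls for an index-based loop over a dictionary's values.
    public static long CountCallsForFullLoop(int n)
    {
        var dict = new Dictionary<int, int>();
        for (int i = 0; i < n; i++) dict[i] = i;

        MoveNextCalls = 0;
        for (int i = 0; i < dict.Values.Count; i++)
        {
            Lookup(dict.Values, i); // ValueCollection is not an IList<T>, so this takes the slow path
        }
        return MoveNextCalls; // equals n * (n + 1) / 2
    }

    public static void Main()
    {
        Console.WriteLine(CountCallsForFullLoop(10000));
    }
}
```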

 

Not sure what the call to [Unknown] is in ElementAt. Maybe that's the gc kicking in.


Could you just change this

            for (int i = 0; i < blobs.Values.Count; i++)
            {
                value += blobs.Values.ElementAt(i).a;
            }

to

            for (int i = 0; i < blobs.Values.Count; i++)
            {
                value += blobs.Values[i].a;
            }
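
Note that a Dictionary's Values collection does not expose an indexer, so index-based access needs the values copied into an array (or list) first. A minimal sketch of that approach follows, again assuming a hypothetical Blob class with an int field a; the IndexedValues class and SumByIndex method are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the original Blob definition is not shown in the thread.
public class Blob
{
    public int a;
}

public static class IndexedValues
{
    public static long SumByIndex(Dictionary<int, Blob> blobs)
    {
        // One O(n) copy up front; after that, each index access is O(1).
        Blob[] values = blobs.Values.ToArray();
        long value = 0;
        for (int i = 0; i < values.Length; i++)
        {
            value += values[i].a;
        }
        return value;
    }

    public static void Main()
    {
        var blobs = new Dictionary<int, Blob>
        {
            { 0, new Blob { a = 1 } },
            { 1, new Blob { a = 2 } },
            { 2, new Blob { a = 3 } },
        };
        Console.WriteLine(SumByIndex(blobs));
    }
}
```

The copy itself allocates, so in a hot path such as a per-ray traversal it would make more sense to build the array once when the dictionary changes and reuse it, rather than calling ToArray on every pass.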


Sorry for the late reply, today has been a busy day at university.

I made the changes you guys recommended, namely switching out the ElementAt call for an array access and moving the BoundingBox 'new' call outside of the loop. These two changes have indeed helped performance, but it still isn't at the level I expected from voxel raycasting. I now get the following output from the profiler. The GIWorld.Update call doesn't even show up on the list. I'm looking into how to get the External Code portions to show more detail, though.

http://imgur.com/dLwm01m

 

Additionally, it seems there's a bug in the actual raycasting code, so the results are incorrect. I was hoping to use voxel raycasting on the CPU to do an additional low-resolution indirect lighting bounce and then upload that to the GPU for compositing, which is beginning to seem like a not-so-great idea.

Edited by HimanshuGoel97


Instructions to show you what's in the external code section: http://stackoverflow.com/questions/33482789/external-code-in-vs2015-profiler

 

When you have the diagnostic view up, look for a dropdown that says "Filter View". It's in the area below the graph but above the listview. Click the dropdown and check the "Show External Code" checkbox.

 

Also see https://msdn.microsoft.com/en-us/library/dn971856.aspx

Edited by Adam_42
