  
Welcome to Ventspace! Most posts here are delayed copies of posts from the real Ventspace.
 How to Choose Sampling Frequency? |
Posted - 7/27/2009 11:14:44 PM | So I was wondering, how do you decide at what frequency to sample when you're building a sampling based profiler?
No, seriously. What are you supposed to punch in? I'm taking samples 4ms apart because it sounds less like a bullshit number than 1 or 5. I have literally no idea how to choose. Lower intervals obviously slow things down, and higher intervals probably improve accuracy. I even use timeBeginPeriod to ramp the scheduler resolution up, but I still have no idea how often to check.
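A rough way to put numbers on the question is to treat every sample as an independent yes/no observation of whether a given function is running. That independence is an assumption (real samples are correlated), and the 10 second run and 5% function below are made up for illustration. Under it, the standard error of a measured time share p after n samples is sqrt(p(1-p)/n), so the sketch only covers the statistical side and says nothing about how much each sample perturbs the program.

```python
import math

def sampling_error(interval_ms, run_seconds, true_share):
    """Return (sample_count, standard_error) for one function's time share.

    Assumes every sample is an independent observation, which real
    profilers only approximate.
    """
    samples = int(run_seconds * 1000 / interval_ms)
    std_error = math.sqrt(true_share * (1.0 - true_share) / samples)
    return samples, std_error

if __name__ == "__main__":
    # Hypothetical 10 second run; the function really takes 5% of the time.
    for interval_ms in (1, 4, 5, 16):
        samples, err = sampling_error(interval_ms, 10, 0.05)
        print(f"{interval_ms:>2} ms: {samples:>5} samples, "
              f"+/- {err * 100:.2f} percentage points")
```

Because the error shrinks with the square root of the sample count, going from 4ms to 1ms costs four times as many samples and only halves the error.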
| |
 Moar SlimTune |
Posted - 7/20/2009 1:37:08 AM | Primarily due to Washu's request!



I'd like to point out that it can see native functions, too.
| |
 Export and Import Interoperability |
Posted - 7/18/2009 6:35:00 PM | I've decided that "Windows API Code Pack" is far too long to say/write in casual conversation. WACP it is.
These are terms I came up with a few weeks ago, and I wanted to document them properly. I feel that they're good descriptions of what a wrapping API like SlimDX allows, and it's useful to be able to settle on common jargon. The basic idea of interoperability is for libraries to be able to cooperate, by sharing objects with each other.
Export interoperability is the ability to "export" objects to other libraries. In the case of SlimDX and similar wrappers, it essentially involves exposing the internal IUnknown pointers, so that another system that supports importing objects can do so. It's not difficult to implement, but it is critical to remember that when your objects are exported, their state can be changed at any time, outside of your control. You can't cache anything that isn't invariant. This is why XNA is unable to do export interop; they chose to cache just about everything, and allowing people to use the underlying interfaces directly will break it. People do it anyway via reflection of course, but it's risky. SlimDX and WACP don't cache anything about the objects, and support export interop cleanly. This is why we can work with libraries like DirectShow.NET, CUDA.NET, and more.
Import interop is of course the ability to consume objects from another library. Of the various DirectX wrappers, SlimDX seems to be the only one that handles import interop. There's no particular reason WACP can't do it, as far as I can tell; they simply haven't added it to the public interface. With XNA, I suspect that the cached values are again the problem. They could be looked up on construction, but somebody else still has direct access to the interface. Import interop is at its core not that clever, because it's a basic part of building the wrapper in the first place. You have to be able to convert pointers from the unmanaged API to your own objects, so it's not a big step to do it from arbitrary pointers. The trick is doing it safely; you have to trust that the pointer you're given is what the caller says it is. SlimDX assumes it is an IUnknown and then uses QueryInterface to get the desired type. This is our only line of defense, but it's a fairly effective one.
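The "ask the object, don't trust the caller" idea can be sketched in a few lines. This is Python with stand-in classes, not SlimDX code or a real COM binding: FakeComObject, TextureWrapper, and from_pointer are hypothetical names, and only the interface name IDirect3DTexture9 comes from DirectX itself.

```python
class FakeComObject:
    """Stand-in for a native COM object; not a real COM binding."""

    def __init__(self, supported_interfaces):
        self._supported = set(supported_interfaces)

    def query_interface(self, interface_name):
        # Mirrors the idea of IUnknown::QueryInterface: the object itself
        # decides whether it implements the requested interface.
        if interface_name not in self._supported:
            raise TypeError(f"object does not implement {interface_name}")
        return self


class TextureWrapper:
    """Hypothetical wrapper built from an externally supplied object."""

    INTERFACE = "IDirect3DTexture9"

    def __init__(self, native):
        self.native = native

    @classmethod
    def from_pointer(cls, external):
        # Do not trust the caller's claim about the type; ask the object.
        native = external.query_interface(cls.INTERFACE)
        return cls(native)


if __name__ == "__main__":
    good = FakeComObject({"IUnknown", "IDirect3DTexture9"})
    print(TextureWrapper.from_pointer(good))
```

An object that does not implement the requested interface fails at the boundary, before a wrapper of the wrong type can exist.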
Interoperability has been a major focus for the SlimDX design, for both import and export. There's a fair amount of complexity involved in object construction, but it's been carefully laid out to be able to handle external pointers. There was some internal caching of values early on, which we thought were invariant, but we were seeing some difficult to track problems and so we backed out the code to a safe implementation. Our commitment to making sure we can both export and import objects goes far beyond the other major wrappers, although WACP should be able to provide equivalent functionality if they stick to the current design or similar.
| |
 Remote Profiling will NOT be secure in SlimTune |
Posted - 7/17/2009 5:27:40 PM | At least, not to begin with. There are some drawbacks to not being a security professional, one of which is that I have neither the qualifications nor the experience to do a proper security analysis of the profiler backend. Since I can't audit the backend for security, it will be considered insecure, and that's that.
The practical result of this is that allowing uncontrolled remote connections to the profiler will be incredibly dangerous. I am planning to include a setting that disallows connections except from localhost. However, if you are actually using remote profiling on something that might be attacked, it's critical to make sure it is behind a firewall that will not allow arbitrary connections.
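For reference, one common way to get a localhost-only setting is to bind the listening socket to the loopback address, so the operating system never accepts remote connections in the first place. The sketch below is Python and only an assumption about how such a setting could work; bind_host and open_listener are hypothetical names, not part of SlimTune.

```python
import socket

def bind_host(allow_remote):
    """Pick the address to listen on: loopback only unless remote is allowed."""
    return "0.0.0.0" if allow_remote else "127.0.0.1"

def open_listener(port, allow_remote=False):
    """Create a listening TCP socket for a hypothetical profiler backend."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((bind_host(allow_remote), port))
    server.listen(1)
    return server

if __name__ == "__main__":
    # Port 0 asks the OS for any free port.
    with open_listener(0) as server:
        print("listening on", server.getsockname())
```

This is not a substitute for a firewall or authentication; it just makes the safe choice the default.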
Eventually it should probably allow you to set a username and password for connections, but that's again something that takes some care to implement properly and I'd rather not be the one doing it. In any case, that's functionality which will come much later. Sorry if secure remote profiling is high on your list.
| |
 And so it begins... |
Posted - 7/16/2009 12:07:23 AM | 
| |
 SlimTune Profiler for .NET |
Posted - 7/14/2009 2:40:52 PM | I basically took last week off from blogging. Time to try and get some new entries out! Things have been very SlimDX focused, but what did you expect? It's what I do. Maybe today's will inspire a bit more general interest.
As a creature of GameDev.Net, I get to see lots and lots of discussions questioning whether or not C# and .NET are "fast enough" for games. What I don't see much of is people working on actually analyzing and tuning the performance of their .NET code to see what's going on. I'm not sure why this is, but I have a theory that it's partly because of the sorry state of available performance tools. The only version of VS that has a profiler is Team Edition, which damned near nobody has. Other commercial offerings are also seriously expensive. There are only two free profiling tools that are really available for use: CLR Profiler and NProf. (I've seen a few other tools, but it's clear that they're fringe tools that aren't well supported.)
CLR Profiler is written by Microsoft, and it's a pretty good tool. They've even released the source code, although the licensing is vague. It has a few drawbacks though. First of all, it only does memory profile analysis. It does a very good job of tracking allocations and garbage collections, and the visualizations are very well done too. But that's all you get -- no timings of any kind, let alone a breakdown of where time is being spent. Also, it hasn't been updated since late 2005.
Then there's NProf. Oh dear. The good news is it works, barely. The bad news is that's all the favorable comments I have about it. It does simple sampling based profiling only, and will show you a simple tree based breakdown of time spent. It's not that NProf is useless; I've done lots of good performance tuning with it. But this is literally all it can do, and there's a lot more you want from a profiler. The last release was December 2006, and there's some scattered SVN traffic since then but it's basically dead. Support for x64 is apparently doable if you compile from source. I looked at the source, which is also poorly written. I decided immediately that I could do better than this toy, and now I'm putting my money where my mouth is.
I'm working on a new open source profiler tool right now called the SlimTune Profiler. It will probably release in early September, and the initial feature set is taking direct aim at NProf. The initial version will support sampling and instrumentation profiling for .NET 2.0 and above on local and remote machines. A little later on, you'll be able to profile-enable a long running process at zero performance cost, and then profile it in real time for short periods. Imagine running a production server, and actually connecting with the profiler while it's serving real requests to see what's happening.
On the front-end, data will be collected from the profiling backend and dropped into an embedded relational database. There will be some preset views of the data, but the idea here is that you should be able to apply your own queries to the data and get results that are useful to you. Reporting is not expected for the initial version, but it will be supported eventually as well. I imagine you'll be able to create various tables, graphs, etc and export them, although I'm not sure exactly what format that'll be in. PNG and Excel seem reasonable. I'm hoping that you'll be able to combine results from multiple runs, which would allow you to make all kinds of snappy graphs to show off to your boss.
It's been my plan for some time now to expand beyond SlimDX, and create a suite of Slim software. We've got a good reputation and lots of respect for our work, and I'm looking to build on that. SlimTune is the first step. It probably won't be able to compete with the commercial offerings -- but RedGate ANTS runs $400 or more per license. SlimTune will blow NProf out of the water in a scant two months, and it won't cost you a dime. The feature set is pretty well specified, and the profiler already works in prototype form. The work over the coming weeks is in building a product instead of a project.
And yes, I know I'm a tease. It'll be worth the wait.
| |
 SlimDX Performance |
Posted - 7/13/2009 1:08:53 PM | I'm not equipped at the moment to undertake a full treatment of SlimDX's performance, as that's a rather touchy set of problems. However, I did want to provide a general overview of what kinds of things affect SlimDX performance and what you can expect from your own programs. Strictly from the library's point of view, performance is actually surprisingly good. It is not as good as unmanaged code -- and can never be -- but it is much better than you might think.
There are a few key sources of inefficiency in SlimDX, which are varied and complex. Some are avoidable, and some are not. Some are inherent to the process as a whole, and some are results of architectural decisions in building the library. The name "Slim" officially means that there's a very minimal barrier between you and DirectX. While this is basically true, there's still a lot of details to be aware of.
The very process of exposing a native API to managed code is complex. Apart from converting the parameters themselves -- a process which SlimDX is exceedingly efficient at -- there is a lot of bookkeeping to do in making sure the call stacks stay correct, that permissions are handled properly, that exceptions are trapped safely, and so on. C++/CLI handles all of this for us behind the scenes, but there is a substantial cost for making ANY native call as a result. We have studied techniques for addressing those costs by allowing you to batch certain kinds of calls (e.g. SetRenderState), but nothing has yet shown itself to be clearly advantageous.
The most obvious candidate for performance issues is the math library. Floating point is well known for being performance critical in games (and certain other software) and that's why we have processor extensions like SSE to make it as fast as possible. The D3DX routines that Microsoft provides are heavily optimized, and take full advantage of processor extensions. Unfortunately, using these routines from managed code is very expensive, so we instead provide completely managed implementations. XNA takes the same approach, but MDX calls into D3DX. These managed functions will JIT into scalar SSE which, while not optimal, is still very fast. Rough benchmarks have shown that completely native D3DX code has about a 10% advantage over our implementation. In other words, if you spend 20% of your time just doing math, a native code version could be 2% faster in that respect. In order to get the best possible performance out of SlimDX -- or XNA, for that matter -- make sure to use the ref/out overloads of parameters. It won't get you Havok levels of math performance, but it's still extremely fast.
The vast majority of calls into the DirectX API itself return an HRESULT error code. In C++, you're free to check or ignore this as you please. MDX took the fairly aggressive step of checking every single return value and converting to an exception; SlimDX and XNA both follow essentially that same model. SlimDX is particularly powerful, because it allows you to trap specific failed results, to control what actually causes an exception, or to disable exceptions outright. There's also a LastError facility that stores a thread-local value for the last result code recorded by SlimDX. It's a lot of truly useful flexibility, but it comes with a cost. For the March 2009 release, we leaned out the error checking significantly, making it much, much cheaper to make short DirectX calls. For a successful result, you basically pay for a method call, a comparison, and a small write to thread-local storage. A failed result triggers a much slower path. In the end, we are occasionally slower than MDX, but usually not, thanks to improvements in .NET 2.0 and our improved architecture overall. We're also nearly always faster than XNA on the CPU; XNA spends a lot of time doing thorough parameter validation.
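The shape of that fast path and slow path can be sketched roughly as follows. This is Python, not SlimDX's C++/CLI, and ResultPolicy, DirectXError, record, and ignored are hypothetical names chosen for the illustration. The two constants are the standard E_FAIL and D3DERR_DEVICELOST values written as signed 32-bit integers, where any negative value means failure.

```python
import threading

E_FAIL = 0x80004005 - (1 << 32)        # generic failure, as a signed HRESULT
DEVICE_LOST = 0x88760868 - (1 << 32)   # D3DERR_DEVICELOST, as a signed HRESULT

class DirectXError(Exception):
    def __init__(self, code):
        super().__init__(f"call failed with HRESULT 0x{code & 0xFFFFFFFF:08X}")
        self.code = code

class ResultPolicy:
    """Hypothetical policy object: records results, decides what raises."""

    def __init__(self):
        self.exceptions_enabled = True
        self.ignored = set()               # failure codes that should not raise
        self._local = threading.local()    # last result code, per thread

    def record(self, hresult):
        self._local.last_error = hresult   # small thread-local write
        if hresult >= 0:                   # fast path: one comparison
            return hresult
        # Slow path, only reached on failure.
        if self.exceptions_enabled and hresult not in self.ignored:
            raise DirectXError(hresult)
        return hresult

    @property
    def last_error(self):
        return getattr(self._local, "last_error", 0)

if __name__ == "__main__":
    policy = ResultPolicy()
    policy.ignored.add(DEVICE_LOST)        # handle device loss by hand
    print(policy.record(DEVICE_LOST) == policy.last_error)
```

A successful call pays only for the write and the comparison; the set lookup and exception construction are confined to failures.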
Be wary of properties, especially ones that return complex value types. Many properties will cause a DirectX call on every invocation. If the return type is large, we're probably reading an entire struct back, and then copying it to your variable. The Description property on many objects, for example, causes a GetDesc call. If you check obj.Description.Width and obj.Description.Height, that is two calls to GetDesc and two full copy operations. Cache the return value if you're invoking DirectX a lot in performance sensitive code areas!
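A small counting sketch makes the cost visible. It is Python with a stand-in object rather than a real SlimDX type; FakeSurface, SurfaceDescription, and native_calls are hypothetical, and the counter simply stands in for a GetDesc call plus a struct copy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SurfaceDescription:
    width: int
    height: int

class FakeSurface:
    """Stand-in for a wrapper object whose property hits the native API."""

    def __init__(self, width, height):
        self._width, self._height = width, height
        self.native_calls = 0

    @property
    def description(self):
        # Every access simulates a GetDesc call plus a full struct copy.
        self.native_calls += 1
        return SurfaceDescription(self._width, self._height)

def area_uncached(surface):
    # Two property reads, so two simulated native calls.
    return surface.description.width * surface.description.height

def area_cached(surface):
    desc = surface.description   # one simulated native call
    return desc.width * desc.height

if __name__ == "__main__":
    a, b = FakeSurface(640, 480), FakeSurface(640, 480)
    area_uncached(a)
    area_cached(b)
    print(a.native_calls, b.native_calls)
```

Both functions return the same value; only the number of trips through the property differs.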
Anything in SlimDX that returns a ComObject derivative is typically translating from a native pointer to a managed object. In SlimDX, this invokes some internal machinery that will take a lock, do a table lookup, possibly insert to the table, and then construct an object if necessary. In the vast majority of cases this should never be an issue, but if you're getting lock contention on our object table there is potential for problems. We don't really have good data on this one either way, so it's probably safe to assume it's not a concern. The lock is only held for very short amounts of time.
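The lock, lookup, and insert sequence looks roughly like this. The sketch is Python and an assumption about the general pattern, not SlimDX's actual ObjectTable; in particular it constructs the wrapper while holding the lock for simplicity, which the description above does not confirm.

```python
import threading

class ObjectTable:
    """Maps native pointer values to their single wrapper object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}

    def get_or_create(self, pointer, factory):
        with self._lock:                      # held only for the lookup/insert
            wrapper = self._table.get(pointer)
            if wrapper is None:
                wrapper = factory(pointer)    # construct only when missing
                self._table[pointer] = wrapper
            return wrapper

    def remove(self, pointer):
        with self._lock:
            self._table.pop(pointer, None)

if __name__ == "__main__":
    table = ObjectTable()
    first = table.get_or_create(0x1000, lambda p: object())
    second = table.get_or_create(0x1000, lambda p: object())
    print(first is second)
```

Asking twice for the same pointer yields the same wrapper, which is the property the table exists to guarantee.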
Lastly, cache all of your effect handles when working with D3DX effects! Because these are string pointers internally, and D3DX doesn't support Unicode for them, we have to allocate every time a handle is created. Passing a string instead of a cached effect handle will cause allocation, every call and every frame. The net effect on performance of passing strings directly is quite extreme, since it follows slow paths in both SlimDX and D3DX.
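The difference between the two calling styles can be counted with a stand-in effect object. This is a Python sketch; FakeEffect, get_parameter, set_value, and the "WorldViewProj" parameter name are hypothetical, and the counter stands in for the per-handle allocation described above.

```python
class FakeEffect:
    """Stand-in for an effect object; counts string-to-handle conversions."""

    def __init__(self):
        self.handle_allocations = 0
        self.values = {}

    def get_parameter(self, name):
        # Simulates allocating a native string for the handle.
        self.handle_allocations += 1
        return ("handle", name)

    def set_value(self, parameter, value):
        # A plain string is converted on every call; a handle is used as-is.
        if isinstance(parameter, str):
            parameter = self.get_parameter(parameter)
        self.values[parameter] = value

if __name__ == "__main__":
    by_name, by_handle = FakeEffect(), FakeEffect()
    handle = by_handle.get_parameter("WorldViewProj")   # looked up once
    for frame in range(100):
        by_name.set_value("WorldViewProj", frame)
        by_handle.set_value(handle, frame)
    print(by_name.handle_allocations, by_handle.handle_allocations)
```

Over 100 simulated frames the string version allocates 100 times and the cached version once.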
That's a somewhat high level overview, and I'm planning to add documentation that discusses SlimDX's performance characteristics in more detail. However, we've been completely silent on performance and I decided to at least rectify that in brief. If I had to summarize, I'd say that in the general case:
- XNA and SlimDX are both faster than MDX, and sometimes much faster.
- SlimDX is faster than XNA at specific tasks, but it's unlikely to make a noticeable difference for most people.
- SlimDX is still slower than native code, probably on the order of 5%-10% with respect to DirectX calls. So if 50% of your time is spent calling SlimDX/DirectX (VERY high), you could be running as much as 5% faster in native code in that respect, but 3% is a more realistic estimate. For sane applications, even less.
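The percentages in this post all come from the same simple product: the share of run time spent in some slice, times the native advantage on that slice. The Python sketch below just spells that out; overall_gain is a hypothetical helper, and it treats the advantage as the fraction of the slice's time that native code would save.

```python
def overall_gain(time_fraction, native_advantage):
    """Share of total run time saved if only this slice moved to native code."""
    return time_fraction * native_advantage

if __name__ == "__main__":
    # Math: 20% of run time, roughly 10% native advantage.
    print(f"math:          {overall_gain(0.20, 0.10):.1%}")
    # DirectX calls: 50% of run time, 5% to 10% native advantage.
    print(f"calls (high):  {overall_gain(0.50, 0.10):.1%}")
    print(f"calls (low):   {overall_gain(0.50, 0.05):.1%}")
```

The smaller the slice, the less the native advantage matters, which is why the estimate drops further for sane applications.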
| |
 SlimDX Supports Direct3D 10.1, too |
Posted - 7/10/2009 5:19:03 PM | SlimDX has had Direct3D 10 support for a long time, and it's almost certainly the single best way to get 10 support from C# or any other .NET language. While it's true that the very original prototype didn't plan on it, that was over two years ago and it's very much a first class citizen. Although we have Direct3D 11 support now (see this post), 10 is still a priority and we're not leaving it behind.
As part of that commitment, SlimDX now has full Direct3D 10.1 support. It will be part of our next release, which looks like it will be August 09 at this point. Although nobody specifically asked for 10.1, it's a very useful API in its own right. Sure it adds some features, but just like DirectX 11, it has feature levels. In other words, even though DirectX 10 requires DX10 class hardware, 10.1 works on DX9 or 10 hardware! The caveat is that Vista SP1 or later is required, but if that's an acceptable limitation, 10.1 is great to have for that reason alone.
And as always, if you find missing features or bugs or whatever, let us know! D3D 10 is in use by several of our customers in production environments, and we're determined to continue being the single best option for using it from C#, VB.NET, or anything else in the .NET ecosystem.
| |
 The SlimDX Architecture, Part 3 |
Posted - 7/9/2009 11:15:42 AM | I asked Josh to write up a follow up to the first two parts, covering the ancillary object support in our ObjectTable, which he graciously agreed to. Here it is.
I want to switch gears a little bit and talk about a class you've almost certainly seen if you're doing any SlimDX work: DataStream. I'd describe ObjectTable and ComObject as the heart of SlimDX, and DataStream as the soul. It's been exceedingly difficult to get right, with 41 revisions of the header and 50 revisions of the source, even though the feature set has only changed a little bit over time. In fact, we just changed it again. It's critical to most people because it does the one thing everybody needs to do -- transfer data from the application to the API, and occasionally vice versa.
The original class in MDX was called GraphicsStream, and SlimDX used the same name for a while. We decided in r138 to rename the class to DataStream, since it was fairly obvious that it wasn't graphics-specific in any way. The name was never quite ideal, but MemoryStream is taken and we couldn't figure out something better. Besides, it's kinda catchy. Although the goal of the class was kind of vague at first, it fundamentally represents what a pointer has become in managed code. It's not quite as elegant in many ways, but unlike a pointer it is a hell of a lot safer.
The DataStream typically replaces a pointer in the underlying API, but a pointer in the unmanaged world is a relatively simple beast. In the managed world, we have a whole host of problems. There's a few different places that a user could want to copy data to or from:
- An array on the managed heap
- A managed Stream
- A buffer on the native heap (owned by anybody)
- Memory in an ID3DXBuffer
In XNA, the approach to handling this complexity is to dispense with it entirely, and limit data transfers to managed arrays only. I criticized this decision heavily when the beta was released, but it's not necessarily an unreasonable one. I just didn't think it was the right move. It does fit in with the overall design strategy of XNA, after all. However, it doesn't fit the SlimDX design philosophy, so we have a somewhat different approach.
DataStream can do all of these things. That's why the name is so vague; trying to quantify the class at all is difficult. It exists to mediate data transfer between an application and DirectX, in either direction; it provides the standard Stream interface. Every DataStream has a backing store that can be stored in a pointer, but sometimes there are auxiliary members. (A pin handle or an ID3DXBuffer are currently supported.) You can even allocate memory off the native heap for use via one of the constructors, in case that's useful.
DataStream's code is very simple and easy to understand, but it's fraught with subtleties. The bounds checks have to be exactly right, which has been surprisingly difficult to do. It's also important to use the correct pointer; sometimes this is the beginning of the stream and sometimes it's the stream position. Despite those pitfalls, it's not a class that looks threatening...but it does power nearly all of the data transfer in SlimDX.
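To show where the subtlety sits, here is a toy bounded stream. It is a Python sketch over a bytearray, not the real DataStream; BoundedStream and its methods are hypothetical, and the choice to raise on an oversized write while truncating an oversized read is an assumption made for the example.

```python
class BoundedStream:
    """Toy stream over a fixed-size buffer with explicit bounds checks."""

    def __init__(self, size):
        self._buffer = bytearray(size)
        self.position = 0

    def write(self, data):
        # The check is against the space left after the current position,
        # not against the total size of the buffer.
        if len(data) > len(self._buffer) - self.position:
            raise EOFError("write would run past the end of the buffer")
        end = self.position + len(data)
        self._buffer[self.position:end] = data
        self.position = end

    def read(self, count):
        if count < 0:
            raise ValueError("count must be non-negative")
        # Reads start at the stream position and stop at the buffer's end.
        end = min(self.position + count, len(self._buffer))
        chunk = bytes(self._buffer[self.position:end])
        self.position = end
        return chunk

if __name__ == "__main__":
    stream = BoundedStream(4)
    stream.write(b"abc")
    stream.position = 1
    print(stream.read(10))
```

Comparing against the total buffer size instead of the remaining space would let a second write run one byte past the end, which is exactly the kind of off-by-position mistake that is easy to make.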
| |
 I am now a DirectX MVP! |
Posted - 7/1/2009 2:34:28 PM | I've just been awarded XNA/DirectX MVP status by Microsoft! I'm still poking around the new sites I've been given access to, but there's a lot of cool stuff here, courtesy of Microsoft. They're giving me MSDN and TechNet subscriptions, for example, which I'm thrilled about. I'm also eager to finally have access to the private MVP newsgroup, since it's not uncommon that I have questions related to SlimDX development that really need an answer from the DirectX team itself.
In short, this is pretty awesome.
| |