I started looking into writing my own language service for Visual Studio 2010. The new extensibility SDK has a ton of material on adding features like that, so I decided to dive in and start "SlimShade", an HLSL extension for VS 2010.
I knew that if I wanted all the features I had in mind, I was going to need a full-fledged parser. I knew almost nothing about parsers, so I started looking around and doing some research. I stumbled across Irony, a fully managed, generic parser that takes a grammar (also written in C#) and parses a given file. The simplicity of the project appealed to me more than larger, more complex parser generators like Yacc and ANTLR, so I decided to start experimenting with it.
Irony is pretty cool, and the source code is fairly easy to follow, with a lot of sample grammars to look at. I used one as a base and started constructing the HLSL grammar. Unfortunately, HLSL doesn't have an official grammar published anywhere, so I had to "guess and check" a lot, reading the (sometimes blatantly wrong) documentation and trying to figure out all the rules. In the end, the best method I found was writing a quick utility that runs through all of the shader and effect files in my DXSDK and NVIDIA SDK folders and tries to parse them. Whenever I got an error, I knew I had something else to fix in the grammar.
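The corpus check described above can be sketched as a small script. This is a minimal Python sketch, not the original utility. The extension set (`.fx`, `.fxh`, `.hlsl`, `.hlsli`) is an assumption about what the SDK folders contain. `toy_parse` is a hypothetical stand-in for the real Irony parser, which lives on the C# side; any callable that raises on a syntax error would plug in the same way.

```python
import os
from pathlib import Path
from typing import Callable, Iterable

# Assumed set of extensions holding HLSL source; adjust to the SDK layout.
SHADER_EXTENSIONS = {".fx", ".fxh", ".hlsl", ".hlsli"}


def find_shader_files(roots: Iterable[str]) -> list[Path]:
    """Recursively collect every file under the given roots with a shader extension."""
    files = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if Path(name).suffix.lower() in SHADER_EXTENSIONS:
                    files.append(Path(dirpath) / name)
    return sorted(files)


def check_corpus(roots: Iterable[str], parse: Callable[[str], None]) -> dict[Path, str]:
    """Try to parse each file and return a map of failing file -> error message.

    `parse` stands in for the real parser and should raise on a syntax error.
    """
    failures = {}
    for path in find_shader_files(roots):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            parse(source)
        except Exception as exc:  # any parser error marks the file as failing
            failures[path] = str(exc)
    return failures


def toy_parse(text: str) -> None:
    """Hypothetical placeholder parser: only checks that braces balance."""
    if text.count("{") != text.count("}"):
        raise SyntaxError("unbalanced braces")
```

Each grammar fix can then be followed by a rerun, and the failure list should shrink until only genuinely invalid files remain.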
HLSL has blown up in complexity in the last few versions, and now has everything from namespaces to classes and interfaces. There are also a lot of little undocumented quirks in the language that I only discovered by running into them in official example source. For example, did you know that the following code is valid in HLSL?
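```hlsl
float2 a, b;
float4 input;        // any 4-component value
float4x4 transform;  // 4x4 transform matrix
float4(a, b) = mul(input, transform);  // presumably a = result.xy, b = result.zw
```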
Normally you can't use a constructor as an l-value like that (the compiler throws errors in almost every case I tried), but here you can, presumably because the constructor acts like a tuple and assigns the appropriate components to a and b.
Eventually I managed to piece together a fairly complete grammar. An interesting thing to note here is that Irony is an LALR(1) parser, which means it has only one token of look-ahead to decide what to do with a given token. This makes it fast and compact, but it also makes it difficult to construct a grammar without conflicts. Luckily, Irony comes with a Grammar Explorer app that analyzes your grammar, shows you all the conflicts, and gives you a trace of the parser state so you can walk through it and figure out how to resolve them.
Irony is great as a stand-alone parser, but I needed to run it very quickly, in response to user input inside Visual Studio. I'm not sure Irony is even still being maintained, but regardless, I started fixing a few issues I had run into and improving its general performance.
Profiling showed the biggest hot spots in the scanner portion of the parser, so I focused my efforts there. Most of the things I changed weren't huge issues by themselves, but they were called so many times during a parse that they added up. For example, one section of code called HashSet.Contains a few million times during parsing to determine the applicable terminals. I changed it to cache all possible sets beforehand and use them directly from there. It costs more memory but is faster at parse time, which is what I needed.
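Here is the idea behind that change as a minimal Python sketch. The real change was in Irony's C# scanner; the `Terminal` type and its `first_chars` field are hypothetical. The naive version does one set lookup per terminal for every character scanned. The cached version builds a character-to-terminals table once, so each scan step becomes a single dictionary lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminal:
    name: str
    first_chars: frozenset  # characters this terminal can start with


def candidates_naive(terminals, ch):
    """Check every terminal on every call: one set lookup per terminal per character."""
    return [t for t in terminals if ch in t.first_chars]


def build_candidate_table(terminals):
    """Precompute char -> candidate terminals once, before any parsing starts."""
    table = {}
    for t in terminals:
        for ch in t.first_chars:
            table.setdefault(ch, []).append(t)  # preserves terminal order per char
    return table


def candidates_cached(table, ch):
    """At scan time this is a single dictionary lookup."""
    return table.get(ch, [])


DIGITS = frozenset("0123456789")
LETTERS = frozenset("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_")
TERMINALS = [
    Terminal("number", DIGITS | {"."}),
    Terminal("identifier", LETTERS),
    Terminal("dot", frozenset(".")),
    Terminal("semicolon", frozenset(";")),
]
```

The table's memory grows with the number of (character, terminal) pairs, which is the same memory-for-speed trade described above.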
When I started, it took around 300 ms to fully parse a 15,000-token file. Taking out a bunch of Irony features I wasn't using knocked that down to 250 ms. Switching to .NET 4 got me another 50 ms for free, which was a nice surprise. Since I do so many Dictionary and HashSet lookups, I went through and cached the hash codes for my objects, which shaved off another couple dozen ms. Caching the possible sets beforehand, as I mentioned earlier, cut another 40 ms.
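Caching hash codes pays off when the same objects are hashed over and over as dictionary keys and set members. Below is a minimal Python sketch, assuming a hypothetical `GrammarSymbol` whose `name` and `kind` never change after construction. In C#, the equivalent is computing the value once in the constructor and returning it from an overridden `GetHashCode`.

```python
class GrammarSymbol:
    """A grammar symbol whose hash is computed once and stored.

    This is only safe because the fields that feed the hash are never
    modified after construction.
    """

    __slots__ = ("name", "kind", "_hash")

    def __init__(self, name: str, kind: str):
        self.name = name
        self.kind = kind
        self._hash = hash((name, kind))  # computed once, reused on every lookup

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, GrammarSymbol):
            return NotImplemented
        # Compare the cheap cached hashes first, then the actual fields.
        return (self._hash == other._hash
                and self.name == other.name
                and self.kind == other.kind)

    def __repr__(self) -> str:
        return f"GrammarSymbol({self.name!r}, {self.kind!r})"
```

Equality still compares the real fields, so two distinct symbols whose hashes happen to collide are never treated as the same key.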
At around 140 ms per parse, I started scraping the bottom of the barrel. I inlined many frequently used properties and converted automatic properties into fields, which saved around 10 to 20 ms. By this point garbage collections were showing up in the profiler as the main bottleneck, so I eliminated a bunch of allocations and changed a few objects into value types, which knocked another 30 ms off. Eventually I got the parse time for the same file down to 75 ms. I decided that was good enough, since most shaders won't be nearly that long (I think), and I have some plans to split up and reduce the load later down the road.
At this point I started looking at example code and putting together my colorizer. The code itself is pretty straightforward, but getting everything to work the way you want is not. Since your extension runs inside Visual Studio, it's extremely hard to debug when something goes wrong. Many of the extensibility interfaces are also poorly documented or not used in any examples elsewhere on the web, so figuring them out was a painstaking process of trial and error. Eventually, though, I was rewarded with a working colorizer.
I'm still thinking about and tweaking the colors and tagging, but the basics are all there.
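In the VS 2010 editor, a colorizer is an `IClassifier` that the editor asks for classification spans over a given range of the buffer. The core of that job is translating parser tokens into classification names. Here is a minimal Python sketch of that step. The token kinds, the `CLASSIFICATIONS` mapping, and the classification names are all hypothetical and do not reflect SlimShade's actual tagging.

```python
from typing import NamedTuple


class Token(NamedTuple):
    kind: str    # token kind as reported by the parser (hypothetical names)
    start: int   # offset into the buffer
    length: int


class ClassifiedSpan(NamedTuple):
    start: int
    length: int
    classification: str


# Hypothetical mapping from parser token kinds to editor classification names.
CLASSIFICATIONS = {
    "keyword": "keyword",
    "type": "keyword",
    "intrinsic": "identifier",
    "number": "number",
    "string": "string",
    "comment": "comment",
}


def classify(tokens, span_start, span_end):
    """Return classified spans for tokens overlapping [span_start, span_end).

    Tokens with no entry in CLASSIFICATIONS (punctuation, plain
    identifiers) are left uncolored.
    """
    result = []
    for tok in tokens:
        tok_end = tok.start + tok.length
        if tok_end <= span_start or tok.start >= span_end:
            continue  # token lies entirely outside the requested range
        name = CLASSIFICATIONS.get(tok.kind)
        if name is not None:
            result.append(ClassifiedSpan(tok.start, tok.length, name))
    return result
```

Restricting the work to the requested range matters because the editor typically asks for small spans (visible lines, edited regions), not the whole file.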
Colorizing is nice, but I really wanted IntelliSense too, so I started work on that. Visual Studio recognizes four different types of IntelliSense:
- QuickInfo - For tooltips when you hover over a variable, function, or type.
- Statement Completion - When you start typing, shows a list of possible symbols or keywords for the given scope.
- Parameter Info - Shows function signatures and overloads when you open a function's parenthesis.
- List Members - Shows a list of members when you press the period key to access a member.
I've only worked on the first one so far, but getting QuickInfo to work was relatively painless, at least from the extension's point of view.
Of course, getting the symbol information to display is another matter entirely. I still need to go through and collect user symbols, push them onto the correct scope, and display them when necessary. For now, I've created an XML file that describes intrinsic symbols and supplies the descriptions shown in the QuickInfo tooltips. A snippet of that file:
```xml
"Submits an error message to the information queue and terminates the current draw or dispatch call being executed."
"Returns the absolute value of the specified value." profile="1" subset="vs_1_1 and ps_1_4">
"x" type="scalar,vector,matrix;float,int">The specified value.
"Returns the arccosine of the specified value." profile="1" subset="vs_1_1 only">
"x" type="scalar,vector,matrix;float">The specified value. Each component should be a floating-point value within
the range of -1 to 1.
"Determines if all components of the specified value are non-zero." profile="1" subset="vs_1_1 and ps_1_4">
"x" type="scalar,vector,matrix;float,int,bool">The specified value.
"Blocks execution of all threads in a group until all memory accesses have been completed."
profile="5" subset="Compute shader only"/>
this call." profile="5" subset="Compute shader only"/>
"Determines if any components of the specified value are non-zero." profile="1" subset="vs_1_1 and ps_1_4">
"x" type="scalar,vector,matrix;float,int,bool">The specified value.
"Reinterprets a cast value into a double." profile="5">
"lowbits" type="uint#1">The input low bits of the double.
"highbits" type="uint#1">The input high bits of the double.
```
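Loading a file like that and resolving symbols through nested scopes could look roughly like this minimal Python sketch. Only the `profile`, `subset`, and `type` attributes appear in the snippet above. The element names (`intrinsics`, `function`, `param`) and the `name`/`description` attributes are hypothetical guesses at a schema. `ScopeStack` is an illustrative design in which intrinsics sit in an outermost global scope and user declarations shadow them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    type: str
    doc: str


@dataclass
class Symbol:
    name: str
    description: str
    profile: str = ""
    subset: str = ""
    params: list = field(default_factory=list)


def load_intrinsics(xml_text: str) -> dict:
    """Parse intrinsic definitions into a name -> Symbol map (hypothetical schema)."""
    root = ET.fromstring(xml_text.strip())
    symbols = {}
    for fn in root.iter("function"):
        params = [Param(p.get("name", ""), p.get("type", ""), (p.text or "").strip())
                  for p in fn.findall("param")]
        sym = Symbol(fn.get("name", ""), fn.get("description", ""),
                     fn.get("profile", ""), fn.get("subset", ""), params)
        symbols[sym.name] = sym
    return symbols


class ScopeStack:
    """Nested scopes: lookups search the innermost scope first, then move outward."""

    def __init__(self, global_symbols: dict):
        self._scopes = [dict(global_symbols)]  # scope 0 holds the intrinsics

    def push(self) -> None:
        self._scopes.append({})

    def pop(self) -> None:
        if len(self._scopes) == 1:
            raise RuntimeError("cannot pop the global scope")
        self._scopes.pop()

    def declare(self, symbol: Symbol) -> None:
        self._scopes[-1][symbol.name] = symbol

    def lookup(self, name: str):
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


def quick_info(symbol: Symbol) -> str:
    """Format a one-line tooltip such as 'abs(x) - Returns ...'."""
    args = ", ".join(p.name for p in symbol.params)
    return f"{symbol.name}({args}) - {symbol.description}"


SAMPLE = """
<intrinsics>
  <function name="abs" description="Returns the absolute value of the specified value."
            profile="1" subset="vs_1_1 and ps_1_4">
    <param name="x" type="scalar,vector,matrix;float,int">The specified value.</param>
  </function>
</intrinsics>
"""
```

Keeping intrinsics in the outermost scope means a user-declared function with the same name naturally hides the intrinsic while it is in scope.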
Next I'll be working on statement completion. The parser already has a list of expected terms at any given point in the parsing process, so I'll just have to find a way to plug into that list and filter out anything unnecessary (such as delimiters).
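Filtering the parser's expected terms could look something like this minimal Python sketch. It makes three assumptions, and all the names involved are hypothetical:

- The parser can hand back the names of the terminals it would accept next.
- `identifier` is the generic terminal that should be expanded into the symbol names visible at the caret.
- The `DELIMITERS` set covers the punctuation to drop.

```python
# Hypothetical set of punctuation terminals that never belong in a completion list.
DELIMITERS = {"(", ")", "{", "}", "[", "]", ";", ",", ".", ":"}


def completion_items(expected_terms, prefix, symbols_in_scope=()):
    """Build a completion list from the parser's expected terms.

    expected_terms: names of terminals the parser can accept next.
    prefix: the partial word the user has typed so far.
    symbols_in_scope: symbol names visible at the caret.
    """
    items = set()
    for term in expected_terms:
        if term in DELIMITERS:
            continue  # drop punctuation
        if term == "identifier":
            items.update(symbols_in_scope)  # expand the generic terminal into real names
        else:
            items.add(term)  # keywords complete as themselves
    lowered = prefix.lower()
    matches = (item for item in items if item.lower().startswith(lowered))
    return sorted(matches, key=str.lower)
```

Expanding the generic identifier terminal is where the scope information comes in. Without it, the list could only offer keywords.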
Other Cool Stuff
There are a bunch of other things that go into building a language service, some of which I've played around with. The first is bracket matching, which ended up being fairly easy to do since the parser already tracks this information. Another is real-time error detection, which shows squiggles in the editor view and entries in the Error List pane. Instead of relying on my parser to provide syntax errors, I opted to use the D3DCompiler to compile the source on a separate thread and parse the error messages it returns. This gives me warnings and semantic errors along with syntax and grammar errors, with less work on my part, so I count that as a win.
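Turning the compiler's error text into squiggle-ready diagnostics is mostly a parsing job. Here is a minimal Python sketch. The assumed message shape, `file(line,col): error X3004: message`, with an optional `-endcol`, matches what fxc typically prints. Still, treat the regex as an assumption and check it against real D3DCompiler output.

```python
import re
from typing import NamedTuple


class Diagnostic(NamedTuple):
    line: int
    column: int
    severity: str  # "error" or "warning"
    code: str      # e.g. "X3004"
    message: str


# Assumed shape: "<file>(<line>,<col>[-<endcol>]): <severity> <code>: <message>"
_PATTERN = re.compile(
    r"^(?P<file>.*?)\((?P<line>\d+),(?P<col>\d+)(?:-\d+)?\):\s*"
    r"(?P<severity>error|warning)\s+(?P<code>X\d+):\s*(?P<message>.*)$"
)


def parse_diagnostics(compiler_output: str) -> list:
    """Convert the compiler's message blob into diagnostics, skipping unrecognized lines."""
    diags = []
    for raw in compiler_output.splitlines():
        m = _PATTERN.match(raw.strip())
        if m:
            diags.append(Diagnostic(int(m["line"]), int(m["col"]),
                                    m["severity"], m["code"], m["message"]))
    return diags
```

Line and column numbers are kept exactly as the compiler reports them. Any conversion to the editor's own coordinates is left to the caller.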
Other ideas I had but haven't even started looking into yet: formatting, auto-indenting, navigation bar support, a symbol browser, real-time disassembly of the code, and perhaps even a visual shader designer, although that would be a long way down the road at this point.
I still have a long way to go before I have something people can try out, but I think I've gotten a good start on this project. I'm hoping this will be useful for a lot of people, not just me. Does anyone out there think they would use something like this? Someone has already asked about GLSL and Cg support, which should be possible in the future by plugging in a new grammar and tweaking a few things, so we'll see how that plays out.