From the screenshot it looks like most of the time loading the bytecode is spent in the function asCReader::ReadUsedFunctions, more specifically in the calls to IsSignatureEqual.
A trivial optimization would be to switch the order of the comparisons in the conditions. For example:
for( asUINT i = 0; i < engine->scriptFunctions.GetLength(); i++ )
{
    asCScriptFunction *f = engine->scriptFunctions[i];
    if( f == 0 ||
        !func.IsSignatureEqual(f) || // <-- make this the last check in the condition, since it is the most expensive
        func.objectType != f->objectType ||
        func.nameSpace != f->nameSpace )
        continue;

    usedFunctions[n] = f;
    break;
}
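The gain comes from C++'s short-circuit evaluation: once an earlier, cheaper comparison in the condition fails, the expensive IsSignatureEqual call is skipped entirely. A minimal sketch of the effect, with hypothetical stand-ins for the cheap and expensive checks:

```cpp
#include <string>

// Hypothetical stand-ins for the real checks: a cheap name comparison
// and an expensive full signature comparison that counts its calls.
struct Candidate { std::string name; std::string sig; };

static int g_expensiveCalls = 0;

static bool ExpensiveSignatureEqual(const std::string &a, const std::string &b)
{
    ++g_expensiveCalls; // pretend this walks the whole signature
    return a == b;
}

// With the cheap name check first, && short-circuits and the expensive
// signature comparison only runs for candidates whose name matched.
int Find(const Candidate *cands, int count,
         const std::string &name, const std::string &sig)
{
    for( int i = 0; i < count; i++ )
        if( cands[i].name == name && ExpensiveSignatureEqual(cands[i].sig, sig) )
            return i;
    return -1;
}
```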
Going a bit further but still keeping it simple, I would look into rearranging the logic inside IsSignatureEqual(). For example:
bool asCScriptFunction::IsSignatureEqual(const asCScriptFunction *func) const
{
    if( !IsSignatureExceptNameEqual(func) || name != func->name ) return false; // <-- change this so the name is compared first

    return true;
}
Just these two simple changes have the potential of saving 1-2 seconds on your loading time.
Would you mind making the above changes and verifying if the improvement is worth it? If it is I'll have the changes checked in to the SVN for the next release.
A much more complex optimization can be done by storing the functions in a hash map so the lookup done in ReadUsedFunctions() isn't done linearly. This would require a considerable amount of work, and I'm not entirely sure how much time it would save (though I'd guess it would be in the range of 80% of the time).
If the trivial changes I proposed above are good enough, then it will probably not be worth it to spend this time to implement the hash map.
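For reference, the hash map approach could look roughly like this. The Func struct and all the names here are made up for the sketch; the point is that keying on name plus namespace narrows the expensive signature comparison down to the handful of real candidates:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for asCScriptFunction; the struct and field names are
// hypothetical, only for illustrating the lookup structure.
struct Func
{
    std::string name;
    std::string ns;        // namespace
    std::string signature; // stand-in for the full signature
};

typedef std::unordered_map<std::string, std::vector<Func*> > FuncMap;

// The key combines name and namespace, so a lookup immediately narrows
// the candidates down to functions that could actually match.
std::string MakeKey(const Func &f)
{
    return f.ns + "::" + f.name;
}

// Build the map once, when functions are added to the engine.
void AddFunction(FuncMap &map, Func *f)
{
    map[MakeKey(*f)].push_back(f);
}

// The lookup in ReadUsedFunctions() then becomes O(1) on average, with
// the expensive signature comparison done only for the few functions
// sharing the same name and namespace.
Func *FindFunction(const FuncMap &map, const Func &wanted)
{
    FuncMap::const_iterator it = map.find(MakeKey(wanted));
    if( it == map.end() ) return 0;

    for( size_t i = 0; i < it->second.size(); i++ )
        if( it->second[i]->signature == wanted.signature )
            return it->second[i];

    return 0;
}
```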
This blurriness with the interpolated samples, plus the fact that it was almost correct with point sampling, tells me that you're likely getting samples that are offset by (0.5,0.5) texels, thus each pixel will be rendered as an average of the 4 nearby texels.
Try adjusting the UV coordinates with (0.5,0.5) texels, or perhaps the screen coordinates with (0.5,0.5) pixels.
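As a sketch, the adjustment is just half a texel in each direction. The function name is made up, and texWidth/texHeight are assumed to be the dimensions of the font texture; whether the offset goes on the UVs or on the screen positions depends on the API:

```cpp
// Shifting each UV by half a texel makes the sample land on a texel
// center instead of on the border between four texels, which is what
// causes the blurry averaging with linear interpolation.
void AdjustUV(float &u, float &v, int texWidth, int texHeight)
{
    u += 0.5f / texWidth;
    v += 0.5f / texHeight;
}
```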
What is the difference between the two applications?
By the looks of the output, I have a feeling it may have something to do with the sampling of the font texture. Are you perhaps using point-sampling, i.e. no interpolation? Have you set up the transform matrices for pixel-perfect rendering? Specifically check the manual for how to adjust coordinates (if necessary) for pixel perfect sampling. It might be that you need to add (0.5,0.5) texels to the UV coordinates to get the best result.
Everything you've shown so far SEEMS to be correct, the problem is most likely a very specific little detail that we're not seeing.
I've yet to use DX11 so I can't say if the difference that behc mentioned would change anything. In my own code I use DX9, and the BLENDINDICES part of the vertex format uses the type D3DDECLTYPE_UBYTE4. I suppose this would be the same as DXGI_FORMAT_R8G8B8A8_UINT in DX11, but I cannot confirm it.
You'll need to do some debugging. Let us know what the value of the channel is when you draw the 'testna' characters.
float4 PS( VS_OUTPUT input ) : SV_Target
{
    float4 pixel = textureDiffuse0.Sample(textureSampler0, input.Tex);

    // Are we rendering a colored image, or
    // a character from only one of the channels?
    if( dot(vector(1,1,1,1), input.channel) ) // input.channel should have the value vector(0,0,1,0) here (assuming ARGB order)
    {
        // Get the pixel value
        float val = dot(pixel, input.channel); // with vector(0,0,1,0) the dot function will return just the value of the green channel
        pixel.rgb = 1;
        pixel.a = val; // the value of the green channel will be used for the alpha blending
    }

    return pixel;
}
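In plain terms, the dot product against the channel mask simply picks out one component. A C++ sketch of the same arithmetic (the Float4 struct is made up, with components in the (A,R,G,B) order assumed above):

```cpp
// Components in (A,R,G,B) order, matching the assumption in the text.
struct Float4 { float a, r, g, b; };

// With channel (0,0,1,0) every term but the green one is multiplied by
// zero, so the result is exactly the green value of the pixel.
float SelectChannel(const Float4 &pixel, const Float4 &channel)
{
    return pixel.a * channel.a + pixel.r * channel.r +
           pixel.g * channel.g + pixel.b * channel.b;
}
```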
The code I use can be found here (in case you haven't seen it before):
What value are you placing in the vertex's channel argument? It should be a vector with only one of the red, green, blue, or alpha channels set to 1 (or 255). Which channel to set to 1 is given in the .fnt file for the character that you're drawing.
From the first image all characters in the word 'testna' are in the green channel, so you'll want to use the value (0,0,1,0), assuming the order of the channels is (A,R,G,B).
Oops. The article in the manual is obviously missing an important piece. I'll have that corrected a.s.a.p.
Anyway, the factory function can for example be implemented like this:
FooScripted *FooScripted::Factory()
{
    asIScriptContext *ctx = asGetActiveContext();

    // Get the function that is calling the factory so we can be certain it is the FooScripted class
    asIScriptFunction *func = ctx->GetFunction(0);
    if( func->GetObjectType() == 0 || std::string(func->GetObjectType()->GetName()) != "FooScripted" )
    {
        ctx->SetException("Invalid attempt to manually instantiate FooScript_t");
        return 0;
    }

    // Get the this pointer from the calling function
    asIScriptObject *obj = reinterpret_cast<asIScriptObject*>(ctx->GetThisPointer(0));

    return new FooScripted(obj);
}
I implemented a special bytecode instruction for calling registered class methods with signature 'type &obj::func(int)' in revision 2147. With this there are almost no runtime decisions that have to be made when making the function call, so the overhead is greatly reduced.
In my tests, a script that accesses array members 20 million times took about 1.2 seconds without this new bytecode instruction, and with it it was reduced to about 0.4 seconds.
In comparison, the same logic implemented directly in C++, takes about 0.04 seconds. So the script is approximately 10 times slower than C++ now. With the use of a JIT compiler this difference should be reduced even further (though I haven't tried it).
I did experiment with implementing the opIndex call as direct access by inlining the call as bytecode instructions, and the performance was about the same as with the new bytecode instruction. For now I've decided to put this option on hold, as the effort to get it to work is too great compared to the benefits it will have.
It's been almost exactly 4 months since the last release. This is the longest period I've gone without making a new release since this project started back in 2003. This does not mean that development has slowed down though; I merely restrained myself from releasing the code so I could fit more improvements into it. If you take a peek at the change log you'll see that this is by far the largest update too.
As always, when I change the middle version number there are some changes in the interface. The changes are not dramatic, but might require some minor code changes. The most dramatic change in the interface is the removal of the behaviours asBEHAVE_REF_CAST and asBEHAVE_VALUE_CAST (+implicit versions). These behaviours should now be registered as object methods with the names opCast and opConv, respectively. The implementation of the functions in the application doesn't change, just how they are registered with the engine.
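For those migrating, the registration change looks something like this. This is a sketch based on the manual's ref-cast pattern; the class names A and B and the refCast helper are from the documentation example, not your code:

```cpp
// Before: registered as a behaviour
// r = engine->RegisterObjectBehaviour("A", asBEHAVE_REF_CAST,
//         "B@ f()", asFUNCTION((refCast<A,B>)), asCALL_CDECL_OBJLAST);

// Now: registered as an ordinary object method named opCast
r = engine->RegisterObjectMethod("A", "B@ opCast()",
        asFUNCTION((refCast<A,B>)), asCALL_CDECL_OBJLAST);

// Value casts follow the same pattern, registered as a method named opConv
```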
I've improved the internal memory management so that fewer objects will be placed in the garbage collector. This translates into improved performance, as less CPU time will be required to clean up the dead objects in the GC. The change will perhaps not be too noticeable at runtime, but during recompilation of scripts and engine shutdown you should definitely see a significant improvement.
The script language has gotten some improvements too. The ternary condition operator can now be used as an lvalue (as long as both options are lvalues of the same type). Script classes can implement the opCast and opConv operator overloads. Compound assignments can now be used with virtual properties too. Class members can be declared as protected, with the same meaning as in other languages.
There are of course enhancements in the add-ons too, but I'll leave the discovery of those to the reader.