Alexander Orefkov

Support for unicode identifiers

Hi, Andreas.

Is it possible to add lexer support for identifiers containing characters from the Unicode "letter" class?

I plan to write a small ORM for a database, and some fields have Cyrillic characters in their names.

In JavaScript, C#, or modern C++ we have no problem writing "someObject.????? = 10", but in AngelScript it is impossible :(


Anything is possible :)

Of course, it wouldn't be a trivial matter to filter out which Unicode characters are actually letters, since there are more than a million code points and more characters are added every year, but perhaps it doesn't really matter. I could simply make the tokenizer accept any byte with a value of 128 or above as a valid identifier byte. This support would only be turned on through an engine property.

I'll look into it.
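A minimal sketch of the rule described above, assuming a tokenizer that scans UTF-8 source byte by byte. `IsIdentifierByte` and `ScanIdentifier` are hypothetical helpers written for illustration, not AngelScript's actual `asCTokenizer` code. Because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, a Cyrillic name is accepted as a single identifier without ever being decoded:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not AngelScript's real tokenizer): decides whether a byte
// may appear in an identifier. With allowUnicode set, any byte >= 0x80, i.e. every
// lead and continuation byte of a multi-byte UTF-8 sequence, is accepted.
bool IsIdentifierByte(unsigned char c, bool allowUnicode, bool first)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        return true;
    if (!first && c >= '0' && c <= '9') // digits allowed, but not as the first byte
        return true;
    return allowUnicode && c >= 0x80;
}

// Hypothetical helper: returns the length in bytes of the identifier starting
// at src[pos], or 0 if no identifier starts there.
std::size_t ScanIdentifier(const std::string &src, std::size_t pos, bool allowUnicode)
{
    std::size_t end = pos;
    while (end < src.size() &&
           IsIdentifierByte(static_cast<unsigned char>(src[end]), allowUnicode, end == pos))
        ++end;
    return end - pos;
}
```

The trade-off is the one mentioned above: non-letter characters outside ASCII (a non-breaking space or typographic punctuation, for example) are also accepted as parts of identifiers.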

I've implemented support for this in revision 2248.

You turn on support for Unicode in identifiers with engine->SetEngineProperty(asEP_ALLOW_UNICODE_IDENTIFIERS, true);

Regards,

Andreas

Andreas, for my purposes, it works very well.

But there is one small problem.

Writing non-English identifiers together with English keywords means frequently switching the keyboard layout.

Would it be possible to register a callback that is called when identifiers are parsed, so that they can be recognized as keywords?

Then my users and I would be able to use non-English translations of the keywords.

Alex, I don't think it is feasible. As an alternative, you could use an AutoHotKey script that replaces a typed character sequence with a keyword, for example ":?" with "class", ":?" with "function", and so on.

EDIT:
As for library methods, the only thing I can think of that would work is an additional layer of indirection: a dictionary mapping translations to English.

Edited by fastcall22

Having the tokenizer call a callback for every identifier would probably impact compiler performance quite a bit for everyone, even those who would not use the callback.

Instead, I suggest you modify the CScriptBuilder add-on to translate special identifiers to keywords before passing the script to the compiler. The CScriptBuilder already has logic for doing a pre-compile pass over the script code and changing some things, so it should be quite easy for you to implement this on your own.

Alternatively you can customize the asCTokenizer to translate the special identifiers for you.
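A minimal sketch of the pre-compile translation pass suggested above. `TranslateKeywords` is a hypothetical function, not part of the CScriptBuilder add-on; it assumes UTF-8 script text and a table that maps whole translated words to English keywords. It replaces only complete identifiers and copies comments, string literals and numeric literals through untouched, so its result could be handed to the compiler in place of the original text:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical pre-compile pass (not part of CScriptBuilder): replaces every whole
// identifier found in `table` with its English keyword. Comments, string literals
// and numeric literals are copied unchanged; newlines are never altered.
std::string TranslateKeywords(const std::string &src,
                              const std::map<std::string, std::string> &table)
{
    // Same identifier rule as the Unicode-enabled tokenizer: bytes >= 0x80 count as letters.
    auto isIdentByte = [](unsigned char c, bool first) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' ||
               c >= 0x80 || (!first && c >= '0' && c <= '9');
    };

    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size())
    {
        unsigned char c = static_cast<unsigned char>(src[i]);
        std::size_t j = i + 1;
        if (c == '/' && j < src.size() && src[j] == '/')
        {
            // Line comment: copy up to (not including) the end of the line.
            j = src.find('\n', i);
            if (j == std::string::npos) j = src.size();
            out.append(src, i, j - i);
        }
        else if (c == '/' && j < src.size() && src[j] == '*')
        {
            // Block comment: copy through the closing "*/", or to the end if unterminated.
            j = src.find("*/", i + 2);
            j = (j == std::string::npos) ? src.size() : j + 2;
            out.append(src, i, j - i);
        }
        else if (c == '"' || c == '\'')
        {
            // String literal: copy verbatim, skipping over backslash escapes.
            while (j < src.size() && src[j] != src[i])
                j += (src[j] == '\\') ? 2 : 1;
            j = (j < src.size()) ? j + 1 : src.size();
            out.append(src, i, j - i);
        }
        else if (c >= '0' && c <= '9')
        {
            // Numeric literal such as 10 or 0x1F: copy unchanged so suffixes are not translated.
            while (j < src.size() && isIdentByte(static_cast<unsigned char>(src[j]), false))
                ++j;
            out.append(src, i, j - i);
        }
        else if (isIdentByte(c, true))
        {
            // Whole identifier: substitute it only if it appears in the table.
            while (j < src.size() && isIdentByte(static_cast<unsigned char>(src[j]), false))
                ++j;
            std::string word = src.substr(i, j - i);
            auto it = table.find(word);
            out += (it != table.end()) ? it->second : word;
        }
        else
        {
            out += src[i]; // operators, punctuation, whitespace
        }
        i = j;
    }
    return out;
}
```

This sketch does not handle heredoc strings that themselves contain quote characters. Column positions in compiler messages can also shift when a translated word and its keyword differ in length, although line numbers are preserved because newlines are copied unchanged.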

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.