Is ASCII charset enough for an indie game?


14 replies to this topic

#1 n3Xus   Members   -  Reputation: 464


Posted 09 March 2013 - 09:41 AM

So if I release a game in which the user only has access to ASCII characters will this be a problem?

 

I haven't played many newer multiplayer games so I'm not sure what charsets they support.

 

 

For example, do Chinese gamers who play "English-made" indie games play them in English, or are such games ignored in their market?

 

 

In my game the user will have to set keyboard shortcuts but so far this can only be done with ASCII characters.




#2 Amr0   Members   -  Reputation: 706


Posted 09 March 2013 - 11:11 AM

If your game is multiplayer and you wish to have international gamers playing it, then adding multi-language support to your game is the way to go. Can you get away without it? Maybe, but why would you want to? Using ASCII for keyboard shortcuts makes perfect sense, regardless of whether the game is multi-lingual or not. So going multi-lingual will not make your current solution for keyboard shortcuts stop working. Someone who has released an online indie game and has access to geographical information about the players as well as other information such as which languages players use to name themselves and chat would be able to answer you better. Anyway, you will probably find this to be helpful.



#3 MrDaaark   Members   -  Reputation: 3503


Posted 09 March 2013 - 11:46 AM

I haven't played many newer multiplayer games so I'm not sure what charsets they support.

Make sprite sheet with whatever characters you want. Problem solved.

#4 n3Xus   Members   -  Reputation: 464


Posted 09 March 2013 - 12:12 PM

Thanks for replies.

 

Daaark: yes, that's how I have handled ASCII for now, but I have manually mapped all texture coordinates to characters and I don't feel like doing that for all Chinese characters :D

 

How is mapping between a font texture and keyboard usually done/automated in a game?



#5 cr88192   Members   -  Reputation: 636


Posted 09 March 2013 - 02:09 PM

Daaark, on 09 Mar 2013 - 11:52, said:

n3Xus, on 09 Mar 2013 - 09:47, said:
I haven't played many newer multiplayer games so I'm not sure what charsets they support.

Make sprite sheet with whatever characters you want. Problem solved.

in my case, my engine supports the full Unicode BMP (first 65536 characters, though not all of my fonts currently do this).
the way this is handled was by breaking the character space up into groups of 256, and these into a 16x16 grid.

if a character is drawn which doesn't yet have an associated texture, then a texture is created, and any characters in the range are drawn into the texture.

then, when drawing characters, we just use the appropriate texture and texture coords.
we can then treat the character bits as an index:
TT.Y.X: TT=8-bit texture number, Y=4-bit cell Y position, X=4-bit cell X position.

(the drawn text is then basically just texture-mapped arrays of triangles or similar).
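
The TT.Y.X split described above can be sketched in a few lines of Python. This is only an illustration, not the engine's actual code: the names glyph_location, glyph_uv, get_page and the build_page callback are made up for the example, and it assumes cell row 0 sits at v = 0 in the texture.

```python
CELLS_PER_SIDE = 16  # each texture page is a 16x16 grid, i.e. 256 characters

def glyph_location(codepoint):
    """Split a BMP code point into (texture page, cell row, cell column)."""
    if not 0 <= codepoint <= 0xFFFF:
        raise ValueError("only the BMP (U+0000..U+FFFF) is covered")
    page = codepoint >> 8          # TT: high 8 bits select the texture
    row = (codepoint >> 4) & 0xF   # Y: next 4 bits select the cell row
    col = codepoint & 0xF          # X: low 4 bits select the cell column
    return page, row, col

def glyph_uv(codepoint):
    """Return (page, u0, v0, u1, v1) in normalized texture coordinates."""
    page, row, col = glyph_location(codepoint)
    step = 1.0 / CELLS_PER_SIDE
    return page, col * step, row * step, (col + 1) * step, (row + 1) * step

_pages = {}  # page number -> texture handle, filled in on first use

def get_page(page, build_page):
    """Create a page texture lazily; build_page is a hypothetical callback
    that rasterizes the 256 characters of that page into one texture."""
    if page not in _pages:
        _pages[page] = build_page(page)
    return _pages[page]
```

With this layout no per-character lookup table is needed: the code point itself is the index, which is what removes the manual mapping of texture coordinates.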


all of my fonts are currently fixed-width bitmap fonts stored in a custom (binary) file format, but are generated initially using custom tools from a textual format (BDF):
http://en.wikipedia.org/wiki/Glyph_Bitmap_Distribution_Format

the binary format basically stores the fonts as a number of character spans, with a simple header giving character-cell width and height and similar.


theoretically, the texture strategy could probably be extended to support variable-width and TrueType fonts (by rendering the fonts at different sizes, *1), but personally I haven't really been all that compelled by TTF support (nice, but not a huge feature). variable-width fonts are mostly a matter of varying cell-spacing when drawing the text, but need not actually require varying the width of the in-texture character cells.


*1: one probable approach would be to render the TrueType font as mipmap levels, using mipmapping as a font-level interpolation mechanism. FWIW, although it may seem like a person could just use a font rendered at a higher resolution (eg: 64x64) and then downsample it for all the lower resolutions, IME this often turns out badly (fuzzy and unreadable, yep, tried this before). generally, fonts rendered per-level work much better (then we just need 8/16/32/64 pixel versions, and everything else is interpolated).

potentially, a person could use a tool and batch-convert any fonts into sprite-sheets (rather than doing it all in-engine), and then use a format (such as DDS) which allows each mip-level to be specified individually. (although this strategy could require a fair chunk of HDD space to store the whole unicode BMP).

#6 cr88192   Members   -  Reputation: 636


Posted 09 March 2013 - 02:37 PM

Amr0, on 09 Mar 2013 - 11:17, said:
If your game is multiplayer and you wish to have international gamers playing it, then adding multi-language support to your game is the way to go. Can you get away without it? Maybe, but why would you want to? Using ASCII for keyboard shortcuts makes perfect sense, regardless of whether the game is multi-lingual or not. So going multi-lingual will not make your current solution for keyboard shortcuts stop working. Someone who has released an online indie game and has access to geographical information about the players as well as other information such as which languages players use to name themselves and chat would be able to answer you better. Anyway, you will probably find this to be helpful.

looking at link:
yeah, I generally use UTF-8 for the most part as well.
for the vast majority of text IME, UTF-8 either makes it smaller or is break-even.


in my case, some things (notably my console buffer) use fixed 16-bit characters (and also another 16 bits to indicate character-cell colors and effects). otherwise, it is laid out similarly to a text-mode framebuffer.

eg, effects: EE.B.F, EE=effects flags (blink, underline, italic, bold, strikeout, ...), B=background color, F=foreground color.

most of these effects (apart from bold) are done as tricks with the character cells, IOW: italic by moving the character's vertices, underline and strikeout by drawing a hyphen/underscore over the top, blink by alternating whether or not the character is drawn. the colors use the usual 16-color CGA palette.
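
A minimal sketch of the 16-bit EE.B.F attribute word, assuming EE occupies the high 8 bits, B the next 4 and F the low 4. The individual bit assigned to each effect below is an assumption for illustration (the post only names the effects), and pack_attr / unpack_attr are hypothetical helpers.

```python
from enum import IntFlag

class Effect(IntFlag):
    # Bit positions are assumed; only the effect names come from the post.
    BLINK = 0x01
    UNDERLINE = 0x02
    ITALIC = 0x04
    BOLD = 0x08
    STRIKEOUT = 0x10

def pack_attr(effects, background, foreground):
    """Build the 16-bit EE.B.F word from effect flags and two CGA color indices."""
    if not (0 <= background <= 15 and 0 <= foreground <= 15):
        raise ValueError("colors are 4-bit CGA palette indices (0..15)")
    return ((int(effects) & 0xFF) << 8) | (background << 4) | foreground

def unpack_attr(word):
    """Split the 16-bit word back into (effects, background, foreground)."""
    return Effect((word >> 8) & 0xFF), (word >> 4) & 0xF, word & 0xF
```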

note that printing text to the console still happens as normal UTF-8 text, just using embedded ANSI codes for colors and effects and similar.


currently though, there are no internationalized/translated text strings, but alas...

#7 wintertime   Members   -  Reputation: 532


Posted 09 March 2013 - 03:05 PM

Seriously, if the game allows input of any text string and not just single keys as keyboard shortcuts, or shows text in any language other than American English, then ASCII does not cut it anymore. It would be as backwards as showing all graphics in CGA-colored black+white+magenta+cyan.

Even just Latin-1 would be like EGA graphics, i.e. the absolute tolerable minimum for European people.

But UTF-8 isn't that much more difficult to have, you just need a font. Though UTF-32 could be easier if you don't want to fiddle with variable-length encoding, and compared to all those huge graphics nowadays it's not that much more needed space. Then you have the whole world covered at once and can reuse your implementation forever.



#8 Sik_the_hedgehog   Members   -  Reputation: 957


Posted 09 March 2013 - 03:46 PM

Replying to the OP, as a general rule: no, you can't get away with just ASCII. You will most likely need Unicode if you intend to use anything other than English, and if the user can provide their own text (including not just stuff like their username and such but also filenames, this matters a lot), you will pretty much need Unicode no matter what. If keyboard input is allowed you'll also need to implement IME support (or whatever the equivalent is on the system).

 

There's one place where you may get away with ASCII: your own game's filenames (i.e. your own data). Since you have full control over those files you can do whatever you want with them. Also stuff like identifiers (in many places they're usually just alphanumeric characters and underscores, i.e. a small set of ASCII). Basically stuff that's internal to the game and the user doesn't deal with at all. Only bother with this if in the long term it's simpler, though.

 

EDIT: also UTF-32 isn't 100% fixed length. Yes, all codepoints are exactly 4 bytes long, but not all characters consist of a single codepoint (composition and such gets in the way). In that sense it doesn't really provide much of an advantage over UTF-8 (other than still being simpler to handle), and it needs more memory in comparison. May want to take this into account if you ever intend to handle Unicode characters in a non-linear way (i.e. access the middle of the string and such).
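
To make the composition point concrete, here is a small Python sketch using the standard unicodedata module. The variable names and the utf32_bytes helper are just for the example. It shows the same visible character stored either as one code point or as a base letter plus a combining mark, so even in UTF-32 "one character" is not always one 4-byte unit.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character, two code points
decomposed = "e\u0301"

# NFC normalization folds the pair into the single precomposed code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

def utf32_bytes(text):
    """Size of text in UTF-32 (little-endian, no BOM): 4 bytes per code point."""
    return len(text.encode("utf-32-le"))

print(len(decomposed), utf32_bytes(decomposed))
print(len(composed), utf32_bytes(composed))
```

Normalizing helps in this case, but not every combination has a precomposed form, so code that indexes into the middle of a string still has to cope with multi-codepoint characters.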


Edited by Sik_the_hedgehog, 09 March 2013 - 03:48 PM.


#9 cr88192   Members   -  Reputation: 636


Posted 09 March 2013 - 06:27 PM

Seriously, if the game allows input of any text string and not just single keys as keyboard shortcuts, or shows text in any language other than American English, then ASCII does not cut it anymore. It would be as backwards as showing all graphics in CGA-colored black+white+magenta+cyan.
Even just Latin-1 would be like EGA graphics, i.e. the absolute tolerable minimum for European people.
But UTF-8 isn't that much more difficult to have, you just need a font. Though UTF-32 could be easier if you don't want to fiddle with variable-length encoding, and compared to all those huge graphics nowadays it's not that much more needed space. Then you have the whole world covered at once and can reuse your implementation forever.

partial disagreement:
UTF-32 doesn't offer all that much above UTF-16, but on average takes 2x as much space. better probably in general to just use UTF-16 for these cases (and UTF-8 for general strings storage).


one saving point for UTF-32 is that, if one already has formatting information, they can stick the codepoint and formatting flags in a single value (vs needing a separate codepoint and formatting word).

say: low 21 bits=codepoint, high 11 bits=formatting.

for example:
21 bits: Codepoint
3 bits: Background
4 bits: Foreground
4 bits: Mode (*1)

*1: Uses a table of effect-modes, possibly omitting some uncommon combinations of formatting flags.
for example:
0=Normal, 1=Bold, 2=Italic, 3=Underline,
4=Strikeout, 5=Subscript, 6=Superscript, 7=Blink,
8=Bold+Italic, 9=Bold+Underline, 10=Bold+Strikeout, 11=Italic+Underline,
12=Italic+Strikeout, 13=Bold+Italic+Underline, 14=Bold+Italic+Strikeout,
15=Escape (steals 9 bits from codepoint, for more formatting flags)
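
The 21+3+4+4 layout above could be packed into a single 32-bit value roughly as follows. This is a sketch under assumptions: the post only says the codepoint is in the low 21 bits, so the order of the three formatting fields within the high 11 bits is chosen here for illustration, and pack_cell / unpack_cell are hypothetical names. The mode-15 escape is not handled.

```python
def pack_cell(codepoint, background, foreground, mode):
    """Pack one console cell into a 32-bit integer.

    Assumed bit layout: 0-20 codepoint, 21-23 background,
    24-27 foreground, 28-31 mode (index into the effect-mode table).
    """
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("codepoint outside the Unicode range")
    if not 0 <= background <= 7:
        raise ValueError("background is a 3-bit field")
    if not (0 <= foreground <= 15 and 0 <= mode <= 15):
        raise ValueError("foreground and mode are 4-bit fields")
    return codepoint | (background << 21) | (foreground << 24) | (mode << 28)

def unpack_cell(cell):
    """Return (codepoint, background, foreground, mode) from a packed cell."""
    return (cell & 0x1FFFFF,
            (cell >> 21) & 0x7,
            (cell >> 24) & 0xF,
            (cell >> 28) & 0xF)
```

Since 21 bits cover the whole range up to U+10FFFF, no character has to be dropped to make room for the formatting bits.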

Edited by cr88192, 11 March 2013 - 12:37 PM.


#10 wintertime   Members   -  Reputation: 532


Posted 10 March 2013 - 12:43 PM

I intentionally did not mention UTF-16 because it combines the biggest disadvantages of UTF-8 and UTF-32. It is a variable-length hack on top of UCS-2, it has the ambiguous endianness problem, and it is often handled buggily by people still assuming there is only the BMP, so IMHO it's the most difficult type of Unicode to support. That's why it would probably be best not to use it for new things. Nowadays I feel the trend is to optimize for less programming time, where I feel UTF-32 shines for programming new string manipulation algorithms, though UTF-8 has the advantage of reusing some ASCII routines.

UTF-16 also possibly uses more memory than UTF-8 (2 bytes instead of 1, 4 bytes instead of 3), which means that if that's your concern you would not choose UTF-16 (which does not always use just 2 bytes, as you assume) over UTF-32, but UTF-8. If you are really concerned about storage space you could just pipe it through zlib, and then all 3 are mostly equivalent. Did you see the linked page?
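
The zlib suggestion is easy to try out. The following Python sketch (the sizes helper and the sample text are made up for the example) encodes the same text in all three encodings and reports raw and compressed sizes. How close the compressed sizes end up depends entirely on the text, so no particular numbers are claimed here.

```python
import zlib

def sizes(text):
    """Return {encoding: (raw_bytes, zlib_compressed_bytes)} for one text."""
    result = {}
    for name in ("utf-8", "utf-16-le", "utf-32-le"):
        raw = text.encode(name)
        result[name] = (len(raw), len(zlib.compress(raw, 9)))
    return result

if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog. " * 200
    for name, (raw, packed) in sizes(sample).items():
        print(name, raw, packed)
```

Note this only addresses storage on disk or over the wire; strings still take their full uncompressed size in memory while the game uses them.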



#11 cr88192   Members   -  Reputation: 636


Posted 10 March 2013 - 03:37 PM

wintertime, on 10 Mar 2013 - 13:48, said:
I intentionally did not mention UTF-16 because it combines the biggest disadvantages of UTF-8 and UTF-32. It is a variable-length hack on top of UCS-2, it has the ambiguous endianness problem, and it is often handled buggily by people still assuming there is only the BMP, so IMHO it's the most difficult type of Unicode to support. That's why it would probably be best not to use it for new things. Nowadays I feel the trend is to optimize for less programming time, where I feel UTF-32 shines for programming new string manipulation algorithms, though UTF-8 has the advantage of reusing some ASCII routines.
UTF-16 also possibly uses more memory than UTF-8 (2 bytes instead of 1, 4 bytes instead of 3), which means that if that's your concern you would not choose UTF-16 (which does not always use just 2 bytes, as you assume) over UTF-32, but UTF-8. If you are really concerned about storage space you could just pipe it through zlib, and then all 3 are mostly equivalent. Did you see the linked page?

I would not choose UTF-16 over UTF-8 for most cases. UTF-8 is probably best for general string storage.

note previous posting: I mentioned using UTF-8 for general string storage. (EDIT: for some things I use UTF-16 though, and currently I almost never use UTF-32...).

but, I might choose UTF-16 over UTF-32 for many other cases though. if a 1:1 character/codepoint mapping is needed (such as in a console buffer), it may well make more sense to simply ignore non-BMP characters (replacing them with a filler character), than to have to use 2x as much memory. as-noted, the saving point is if one already needs formatting info (and would be using at-least 32 bits anyways, and just needs to give up a few bits of formatting data).

for string-storage, the cases where UTF-16 will need 4 bytes but UTF-8 will need 3: this can't actually happen (both UTF-16 and UTF-8 will require 4 bytes in this range).

actually, if using a CESU-8 or M-UTF-8 variant, UTF-8 will require 6 bytes in this range (the character will be encoded using surrogate pairs which are themselves encoded as UTF-8, so 3 bytes for each half of the pair). (a lot of my own code uses M-UTF-8).
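
The byte counts discussed above can be checked directly. This Python sketch (encoded_sizes and cesu8 are hypothetical helper names) measures one character in each encoding and hand-rolls the CESU-8 style encoding, in which a non-BMP character is first split into a UTF-16 surrogate pair and each half is then written as a 3-byte UTF-8 sequence. It does not implement the other Modified UTF-8 rule (encoding NUL as two bytes).

```python
def encoded_sizes(ch):
    """Bytes needed for ch in (UTF-8, UTF-16, UTF-32), without a BOM."""
    return (len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")),
            len(ch.encode("utf-32-le")))

def cesu8(text):
    """Encode text CESU-8 style: surrogate pairs, each half as 3-byte UTF-8."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
            for half in (high, low):
                # 'surrogatepass' lets Python emit the lone surrogate as 3 bytes
                out += chr(half).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)

for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
    print("U+%04X" % ord(ch), encoded_sizes(ch), len(cesu8(ch)))
```

The only range where UTF-16 is smaller than UTF-8 is U+0800..U+FFFF (2 bytes vs 3), and the only range where it is larger is ASCII (2 bytes vs 1).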


related to a prior post:
I have observed that at 64x64 pixels and stored as RGBA, the Unicode BMP is too large to effectively fit inside a 32-bit process (although DXT1 makes more sense here).

more reasonable would probably be only having the ASCII range or similar available at 64x64, and most everything else as 16x16 (as before), or maybe parts available at 32x32.
was mostly fiddling around with specialized algos for upsampling bitmap fonts (so, the 32x32 or 64x64 versions would be upsampled from the 16x16 versions). (note: bilinear and bicubic upsampling do a poor job with text). note: for the ASCII range there is also a dedicated 8x8 font. (note: my current font-code handles different sizes independently, so apart from drawing high-res text, no high-res characters will be rendered).

Edited by cr88192, 10 March 2013 - 03:42 PM.


#12 Ashaman73   Members   -  Reputation: 4606


Posted 11 March 2013 - 04:38 AM

So if I release a game in which the user only has access to ASCII characters will this be a problem?

 

I haven't played many newer multiplayer games so I'm not sure what charsets they support.

 

 

For example, do Chinese gamers who play "English-made" indie games play them in English, or are such games ignored in their market?

 

 

In my game the user will have to set keyboard shortcuts but so far this can only be done with ASCII characters.

Besides technical issues (save games etc.), here are some points:

1. Why don't you try it first with English only? Even ASCII?

When encapsulated correctly, I would suggest going with English/ASCII first until your game is done. After that, I would expand if necessary.

2. Have you done any market analysis?

Check whether your game can be sold in the specific country and whether it is possible to sell anything there at all (very high piracy rate).

 

3. You need to translate all your texts or even audio into the target language.

 

If considering ASCII/language is no problem for you, do it. But if it is a hassle at the moment, finish your game first and do not print an expensive manual for an unfinished game upfront.


Edited by Ashaman73, 11 March 2013 - 04:39 AM.


#13 Olof Hedman   Members   -  Reputation: 1229


Posted 11 March 2013 - 05:04 AM

Besides technical issues (save games etc.), here are some points:

1. Why don't you try it first with English only? Even ASCII?
When encapsulated correctly, I would suggest going with English/ASCII first until your game is done. After that, I would expand if necessary.

2. Have you done any market analysis?
Check whether your game can be sold in the specific country and whether it is possible to sell anything there at all (very high piracy rate).

3. You need to translate all your texts or even audio into the target language.

If considering ASCII/language is no problem for you, do it. But if it is a hassle at the moment, finish your game first and do not print an expensive manual for an unfinished game upfront.

 

It's not that much effort to support some form of Unicode, and it's better to do it right from the start.

If you want TrueType support, it's pretty easy to do a basic FreeType integration, for example.

Translation isn't really the main issue. Many markets do not mind playing games in English, and a lot of Swedes even dislike translations, but we get cranky if we can't use our funky characters to name stuff in the game and chat with our friends.

Not supporting a proper character encoding makes the game look really unprofessional in most European markets.


Edited by Olof Hedman, 11 March 2013 - 05:26 AM.


#14 wintertime   Members   -  Reputation: 532


Posted 11 March 2013 - 01:10 PM

related to a prior post:
I have observed that at 64x64 pixels and stored as RGBA, the Unicode BMP is too large to effectively fit inside a 32-bit process (although DXT1 makes more sense here).

Sorry, this point doesn't seem to add up to me.

On one side you feel the need to store high-quality glyphs as 32-bit RGBA (just for those near-invisible colored edges to make use of pixel ordering on LCDs?) when one could possibly get away with something like 4*2 bit, 8 bit or even 1 bit; on the other side you want to conserve (probably far fewer) bytes by cutting complete character ranges beyond 0xffff? I would guess Asian people would be happier with low-quality glyphs in a game than with only having "half" the characters they need.

Maybe you can just load more language ranges after first use? Also there are huge empty, reserved or private-use ranges in Unicode, so it should be much less than 0x10ffff anyway.



#15 cr88192   Members   -  Reputation: 636


Posted 11 March 2013 - 03:28 PM

wintertime, on 11 Mar 2013 - 14:15, said:

cr88192, on 10 Mar 2013 - 16:42, said:
related to a prior post:
I have observed that at 64x64 pixels and stored as RGBA, the Unicode BMP is too large to effectively fit inside a 32-bit process (although DXT1 makes more sense here).

Sorry, this point doesn't seem to add up to me.
On one side you feel the need to store high-quality glyphs as 32-bit RGBA (just for those near-invisible colored edges to make use of pixel ordering on LCDs?) when one could possibly get away with something like 4*2 bit, 8 bit or even 1 bit; on the other side you want to conserve (probably far fewer) bytes by cutting complete character ranges beyond 0xffff? I would guess Asian people would be happier with low-quality glyphs in a game than with only having "half" the characters they need.
Maybe you can just load more language ranges after first use? Also there are huge empty, reserved or private-use ranges in Unicode, so it should be much less than 0x10ffff anyway.


this observation was mostly made with my font-processing tools, which were crashing due to trying to malloc too much data, and failing.
these tools basically just naively malloc image buffers for the entire character space, for sake of processing. 32px was the effective upper limit for having everything fit in the process (and not crashing the tool).
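
The arithmetic behind that limit is easy to reproduce. A small Python sketch (atlas_bytes is a hypothetical helper) for an uncompressed buffer covering all 65536 BMP cells, assuming square cells and 4 bytes per pixel for RGBA, or 0.5 bytes per pixel for DXT1 (8 bytes per 4x4 block):

```python
BMP_GLYPHS = 65536
MIB = 1024 * 1024

def atlas_bytes(glyph_count, cell_px, bytes_per_pixel):
    """Total bytes for glyph_count square cells of cell_px x cell_px pixels."""
    return int(glyph_count * cell_px * cell_px * bytes_per_pixel)

for cell_px in (16, 32, 64):
    rgba = atlas_bytes(BMP_GLYPHS, cell_px, 4)
    dxt1 = atlas_bytes(BMP_GLYPHS, cell_px, 0.5)
    print(cell_px, rgba // MIB, "MiB RGBA,", dxt1 // MIB, "MiB DXT1")
```

At 64x64 RGBA this comes to 1 GiB for the image data alone, which is plausibly more than a 32-bit process can get from malloc once everything else is counted, while 32x32 needs a quarter of that.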

this wouldn't much affect the engine, apart from if using CJK characters and ending up pulling in a large part of the character space (only accessed parts of the font-space are converted). in-engine, RGBA exists as an intermediate stage, mostly prior to converting into DXT1 (for upload to the GPU).


as for strings, the issue is that mostly things like strings and similar take up a fair amount of heap-space in my engine (although, granted, not nearly as much as voxel terrain, vertex arrays, ..., which as-is are the majority of the memory use). (~ 1GB is typically used for voxels and VAs and similar).

as is, it would be a difference mostly of around ~ 600MB for UTF-32, vs ~ 150MB for UTF-8 (ASCII-range is by far dominant). (EDIT: most of this is internal text/data, relatively little end-user directed text). for most things, it makes sense to stick with UTF-8.

(my engine's MM is able to dump how much of what types of memory allocations are made).


the main thing which would be affected by UTF-32 (assuming nearly everything else remaining UTF-8) would be the console buffers, which would go from around 500kB to 1MB, but granted, this isn't really a huge issue (that or reworking how effects work). probably it would also affect the in-console text-editor, which is basically partly integrated with the console. (EDIT/ADD: consoles store a buffer for a 1024x768 window, which works out to 128x96 chars with an 8px char, using 2 words for each character, and with 10 consoles, or 491kB vs 983kB).
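
The console-buffer figures quoted here check out. A short Python sketch (console_bytes is a hypothetical helper), assuming the "2 words" are two 16-bit words per cell today (4 bytes) and would become two 32-bit words with UTF-32 (8 bytes):

```python
def console_bytes(width_px, height_px, char_px, bytes_per_cell, consoles):
    """Total buffer size for a number of fixed-size text consoles."""
    cols = width_px // char_px   # 1024 / 8 = 128 columns
    rows = height_px // char_px  # 768 / 8 = 96 rows
    return cols * rows * bytes_per_cell * consoles

print(console_bytes(1024, 768, 8, 4, 10))  # 16-bit char + 16-bit attributes
print(console_bytes(1024, 768, 8, 8, 10))  # 32-bit char + 32-bit attributes
```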

most of the code in-engine works directly between UTF-8 formatted strings, with a few edge-cases where UTF-16 is used.
most of the conversion code knows about surrogate pairs and other things, though M-UTF-8 is typically the "canonical" storage, partly due to JVM influence (and, like Java and ECMAScript, my scripting language uses UTF-16 as its canonical string format, though M-UTF-8 is often used internally). (actual heap usage due to UTF-16 strings is fairly insignificant, given how infrequently they are used at present).

(EDIT/ADD: a cheap/lazy solution found for console: an effect-flag now indicates that the background-color field encodes 4 more character bits (with the background color coming from prior character in this case), subscript/superscript now uses a single bit, which if set uses strikeout to indicate which it is... this allows for effectively 20 bit characters).

or such...

Edited by cr88192, 12 March 2013 - 12:36 AM.




