Jump to content
  • Advertisement
Sign in to follow this  
dabo

Sqlite strange characters

This topic is 3881 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi, I've built a sqlite database for my game but when I try to read strings containing characters like å, ä, ö, ü etc. they become messsed up. An example: Altmünster --> Altmünster Böheimkirchen --> Böheimkirchen After having a closer look at the database file (using notepad) I saw that the strange characters were in the database itself. So my question is, how can I get rid of the strange characters and get the correct once instead? The editor I'm using (SQLite Database Browser) displays the correct characters even though they are not present in the database file so I figure I can read them to my game the same way the editor does. I guess it has something to do with text encoding which I don't know much about...

Share this post


Link to post
Share on other sites
Advertisement
Every string you manipulate has a certain encoding. Typical encodings are latin-1 and utf-8. When a string is meant to be utf-8, but is used as if it were latin-1, then errors similar to the one you mention appear.

In short, you have to do one of three things:

  1. Set up your database to store strings as utf-8 instead of latin-1.

  2. Re-encode your string from utf-8 to latin-1 (but you'll risk losing some characters which don't exist in latin-1) using the encoding manipulation functions of your language.

  3. Escape utf-8 characters so that they can be represented as plain latin-1, then unescape them when retrieving them from the database.

Share this post


Link to post
Share on other sites
Thanks for the quick reply.

Quote:
1. Set up your database to store strings as utf-8 instead of latin-1.
How do I do this? The editor I'm using won't let me (as far I know) choose anything like that when I create the database.

Quote:
2. Re-encode your string from utf-8 to latin-1 (but you'll risk losing some characters which don't exist in latin-1) using the encoding manipulation functions of your language.
I'm using c++, what is the name of that function? and is it expensive doing it this way?

Quote:
3. Escape utf-8 characters so that they can be represented as plain latin-1, then unescape them when retrieving them from the database.
I don't understand what you mean.


Sorry for being a pain in the a** ;)

Btw, in the editor I'm using you can choose to export the database as a sql-file, checking it out with notepad you see that all the correct characters are present, but in the sqlite-file it is not. Does sqlite and sql use different text encodings?

Share this post


Link to post
Share on other sites
Firstly, make sure you are using the latest version of SQLite, which is version 3. Version 2 appears to require that you explicitly build the library in UTF-8 or iso8859 format, which makes life harder.

It appears that you are feeding in ISO-8859 strings, which are then being converted to UTF-8. UTF-8 is a Unicode encoding, which is basically a way of storing characters that can handle not just English, Swedish, and other Western European characters, but characters from all over the world. To do this however, it must make some sacrifices, and one of those is that some of your Swedish characters are now represented by 2 bytes rather than 1.

It may be possible to find a setting that forces your strings to be stored in ISO-8859 format - if so, that is the easiest way to get what you want. Otherwise, you have to look at encoding UTF-8 as ISO-8859, which involves being able to translate some characters and discard others, and which may require an external library like iconv.

Share this post


Link to post
Share on other sites
I'm gonna google and see if I can find something about forcing strings to be stored in the ISO-8859 format, I'm using sqlite3 btw.

I'm also gonna check out iconv, the only thing I'm worried about is how expensive it is to have to convert everything, there is plenty of text.

Share this post


Link to post
Share on other sites
Quote:
the only thing I'm worried about is how expensive it is to have to convert everything, there is plenty of text


How many queries per second do you perform? How many rows in the table? If under 1000,1 million respectively, performance shouldn't be a concern.

Also, such conversions often do not impact the actual storage and algorithms, they merely serve as a flag on how to display the data. Under one encoding, ASCII character 0x4e is displayed as u, and under other as ü.

Share this post


Link to post
Share on other sites
Well performance shouldn't be a problem then. Solving the ugly characters still is though. I found iconv but I don't know how to use it, I can't find any samples only the manual that doesn't help much.

Share this post


Link to post
Share on other sites
Dabo, forget iconv until you have seen if it's possible to have your database store in ISO-8859 format. If so, that is the correct thing to do. Since sqlite says the following, I think the problem might be in the database editor you use. "SQLite is not particular about the text it receives and is more than happy to process text strings that are not normalized or even well-formed UTF-8 or UTF-16. Thus, programmers who want to store IS08859 data can do so using the UTF-8 interfaces. As long as no attempts are made to use a UTF-16 collating sequence or SQL function, the byte sequence of the text will not be modified in any way."

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!