How to read a Unicode text file, and other questions
My system is Windows 2000. Notepad can save text files as UTF-8 or "Unicode", and I use InputStreamReader(InputStream is, String encoding) to read the file.
If I pass "UTF-8" for the encoding it works, but if I pass "Unicode" an UnsupportedEncodingException is thrown. So how do I read a Unicode text file?
Also, how do I randomly access the content of the text file?
UTF-8 is Unicode (i.e. it is a valid encoding for Unicode data). By "Unicode" you probably mean one of the other encodings; possibilities include:
US-ASCII: Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
ISO-8859-1: ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
UTF-8: Eight-bit UCS Transformation Format
UTF-16BE: Sixteen-bit UCS Transformation Format, big-endian byte order
UTF-16LE: Sixteen-bit UCS Transformation Format, little-endian byte order
UTF-16: Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
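So instead of the literal string "Unicode", pass one of the UTF-16 names from that list. A minimal sketch (the byte array stands in for the file; on J2ME you would wrap a resource stream instead):

```java
import java.io.*;

public class ReadUtf16 {
    // Decode a "Unicode" (UTF-16 with BOM) byte stream into a String.
    static String decode(byte[] data) throws IOException {
        // "UTF-16" inspects and consumes the byte-order mark automatically;
        // "UTF-16LE" would instead keep it as a leading U+FEFF character.
        Reader r = new InputStreamReader(new ByteArrayInputStream(data), "UTF-16");
        StringBuffer sb = new StringBuffer();
        int ch;
        while ((ch = r.read()) != -1)
            sb.append((char) ch);
        r.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a Notepad "Unicode" file: BOM (FF FE) + "hi" in UTF-16LE.
        byte[] data = { (byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0 };
        System.out.println(decode(data)); // prints: hi
    }
}
```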
AFAIK, to Notepad "Unicode" means UTF-16LE.
As for "randomly" reading, you can skip() ahead a certain distance in a stream, and depending on the stream type, you may be able to mark() the current point and reset() to go back to that point in the file. However, AFAIK most phone implementations load the whole file into memory when you open it, anyway. If you're really wondering about choosing random points to go to in the file (as opposed to just what is usually meant by "random access"), look up java.util.Random.
UTF-8 is probably the encoding I'd recommend for a J2ME app; J2ME apps need to be as small as possible, and by encoding (mostly ASCII) characters in UTF-16, you're going to make the file twice the size.
EDIT: but if your strings are all (or mostly) in Chinese, I don't know what difference it will make.
Mark
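For what it's worth, you can measure the difference directly. A quick sketch comparing encoded sizes (for ASCII, UTF-8 is half the size of UTF-16LE; for Chinese it's the other way round, 3 bytes per character vs 2):

```java
import java.io.*;

public class SizeDemo {
    // Number of bytes a string occupies in the given encoding.
    static int encodedLength(String s, String enc) throws UnsupportedEncodingException {
        return s.getBytes(enc).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String ascii = "hello world";
        String chinese = "\u4E2D\u6587"; // two Chinese characters
        System.out.println(encodedLength(ascii, "UTF-8"));      // 11 bytes
        System.out.println(encodedLength(ascii, "UTF-16LE"));   // 22 bytes, double
        System.out.println(encodedLength(chinese, "UTF-8"));    // 6 bytes (3 per char)
        System.out.println(encodedLength(chinese, "UTF-16LE")); // 4 bytes (2 per char)
    }
}
```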
I found a way to solve the problem:
I store Chinese text in .txt files saved as "Unicode" (UTF-16LE), so each character takes 2 bytes, right?
InputStream is = getClass().getResourceAsStream(chapterPath[chapter]);
byte[] words = new byte[is.available()]; // note: available() is only an estimate on some streams
is.read(words); // read() may return fewer bytes than requested; loop if you need to be safe
is.close();
char[] c = new char[words.length >> 1];
int i = 0;
for (int j = 0; j < words.length;) {
    int k = words[j++] & 0xFF; // low byte, taken as an unsigned value
    int l = words[j++] & 0xFF; // high byte (little-endian order)
    c[i++] = (char) (k | (l << 8));
}
After this, the char array c holds the content of the file (if Notepad wrote a byte-order mark, c[0] will be U+FEFF and you may want to skip it). Of course you can use a StringBuffer instead, but I don't know which costs more memory.
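For what it's worth, that manual loop does the same thing as letting the platform decode UTF-16LE for you, where the encoding is supported. A sketch comparing the two (desktop Java shown; some J2ME implementations only guarantee a few encodings):

```java
import java.io.*;

public class DecodeCompare {
    // Manual little-endian byte-pair decode, as in the loop above.
    static char[] manualDecode(byte[] words) {
        char[] c = new char[words.length >> 1];
        for (int i = 0, j = 0; j < words.length; i++) {
            int k = words[j++] & 0xFF; // low byte as unsigned
            int l = words[j++] & 0xFF; // high byte as unsigned
            c[i] = (char) (k | (l << 8));
        }
        return c;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] words = "\u4E2D\u6587abc".getBytes("UTF-16LE");
        String manual = new String(manualDecode(words));
        String library = new String(words, "UTF-16LE");
        System.out.println(manual.equals(library)); // prints: true
    }
}
```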