Parsing String sometimes doesn't work.

Started by
1 comment, last by Belgium 12 years, 1 month ago
[font=courier new,courier,monospace]Hello,

I am using the following to parse a String which has spaces as a delimiter.

List<String> elements = Arrays.asList(text.trim().split("\\s+"));

I've been using that line a lot for String parsing and have never had a problem until now.

It works as I desire for the String " DEVICE: CPM PROC UNIT: 3 PAIR: B ".
The result is "DEVICE:", "CPM", "PROC", etc.

It does not work for this String " DEVICE: SPAN UNIT: 2370 ".
The results are "DEVICE:", "", "SPAN". There's an empty string where I expect to see "SPAN".

This also does not work for this string, with similar results, " DEVICE: ALMCRD UNIT: 56"

I've also tried String temp[] = text.trim().split("[ ]+"). Same thing.

I'm lost. I don't know what I am missing here.[/font]

[font=courier new,courier,monospace]Edit: My original post ate a number of spaces in my examples between "DEVICE: " and the next element. There is more than one space between these elements. [/font][font=courier new,courier,monospace]Let's see if changing the font helps.[/font]

[font=courier new,courier,monospace]Edit2: Loading the source file for the text in a hex editor, I see there is a 0x00 character in the place of one of the spaces. So, the three "spaces" between "DEVICE:" and "SPAN" are actually 0x20 0x00 0x20. That 0x00 character doesn't show up in the Strings that parse correctly. Isn't 0x00 considered whitespace, and shouldn't "\\s+" include it? [/font]
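The null character (0x00) is not considered whitespace in Java: in a regex, \s matches only space, tab, newline, vertical tab, form feed and carriage return. The "empty" element you are seeing is really a one-character String containing the 0x00, which just doesn't display as anything. You'll either have to adjust your regex to "[\0\\s]+", or clean up your input data files (if you have that option).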
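
To make the difference concrete, here is a minimal sketch. The class name, method names and the SAMPLE string are made up for illustration; SAMPLE just rebuilds the 0x20 0x00 0x20 sequence described in the question.

```java
import java.util.Arrays;
import java.util.List;

class NulSplitDemo {
    // Hypothetical sample: a NUL (U+0000) sits between two spaces after "DEVICE:".
    static final String SAMPLE = " DEVICE: \0 SPAN UNIT: 2370 ";

    // Original approach: \s does not match NUL, so the NUL survives as its own token.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Adjusted approach: treat NUL as a delimiter alongside whitespace.
    static List<String> splitOnWhitespaceOrNul(String text) {
        return Arrays.asList(text.trim().split("[\0\\s]+"));
    }

    public static void main(String[] args) {
        List<String> before = splitOnWhitespace(SAMPLE);
        System.out.println(before.size());                  // prints 5
        System.out.println(before.get(1).length());         // prints 1 (the lone NUL, not an empty String)
        System.out.println(splitOnWhitespaceOrNul(SAMPLE)); // prints [DEVICE:, SPAN, UNIT:, 2370]
    }
}
```

Note that trim() already strips NUL at the very start or end of the String (it removes everything up to and including U+0020), so only a NUL in the middle of the text causes trouble.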


Thanks! I changed it to "[\\s\\x00]+" and that seems to work.
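
For anyone who hits the same symptom without a hex editor at hand, a rough way to confirm a hidden character is to list the control characters in the String before splitting it. This is only a sketch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HiddenCharDump {
    // Lists every ISO control character (U+0000..U+001F, U+007F..U+009F)
    // as "index:0xHH". Tabs and line breaks are reported too.
    static List<String> describeControlChars(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isISOControl(c)) {
                found.add(String.format("%d:0x%02X", i, (int) c));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input with a NUL hidden between two spaces.
        System.out.println(describeControlChars("DEVICE: \0 SPAN")); // prints [8:0x00]
    }
}
```

An empty list means the text contains only ordinary characters, and the problem lies elsewhere.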

This topic is closed to new replies.