java.lang.String split

3 comments, last by BitMaster 11 years ago

I recently found out that Java's String.split method behaves in ways that are (at least to me) unexpected:


"hello".split(",").length  // -> 1
",".split(",").length      // -> 0
",,".split(",").length     // -> 0
" ,,".split(",").length    // -> 1
"".split(",").length       // -> 1

Is it just me that thinks this is weird?

  • Did they forget about empty strings?
  • Isn't this a possible source of bugs? I'm thinking of CSV parsing, etc.
  • Isn't "split" violating SRP? It doesn't just split; it also drops empty strings from the result.
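The CSV concern is easy to reproduce. The sketch below uses a made-up row, "alice,alice@example.com,,", whose last two columns are empty; the class name CsvTrailingFields and the fields helper are hypothetical. It only illustrates split itself and is not a CSV parser (quoted fields that contain commas are ignored). Passing a negative limit as the second argument keeps the empty fields.

```java
public class CsvTrailingFields {
    // Hypothetical helper: split a line on commas, keeping trailing empty fields
    // by passing a negative limit as the second argument
    static String[] fields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String row = "alice,alice@example.com,,"; // four columns, the last two empty

        // One-argument overload: the two trailing empty fields are dropped
        System.out.println(row.split(",").length); // 2

        // Negative limit: every field is kept, including the empty ones
        System.out.println(fields(row).length);    // 4
    }
}
```

Code that reads columns by position (for example index 3) would throw ArrayIndexOutOfBoundsException on the first array but not on the second.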

Is there an explanation for why they designed it this way? I'm asking mostly out of curiosity.
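Printing the arrays instead of only their lengths makes the pattern easier to see. This is a small sketch; the class name SplitDemo and the show helper (which wraps each element in quotes so empty strings stay visible) are illustrative and not part of any API:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class SplitDemo {
    // Format an array with every element quoted, so "" and " " are distinguishable
    static String show(String[] parts) {
        return Arrays.stream(parts)
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // The same inputs as in the question, split with the one-argument overload
        String[] inputs = {"hello", ",", ",,", " ,,", ""};
        for (String s : inputs) {
            String[] parts = s.split(",");
            System.out.println("\"" + s + "\" -> " + show(parts)
                    + " (length " + parts.length + ")");
        }
    }
}
```

The only survivor of " ,," is the leading " ", and "" comes back as a one-element array holding the empty string, because no match was found at all.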

The core explanation in the documentation says:

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

Neither "hello" nor "" contains a match for the regular expression, so the result is a one-element array holding the input string itself. I haven't looked into it in depth, but at a cursory glance the output doesn't seem surprising.

There's an optional second parameter, the limit. Try this:

"xyz".split(",",-1)

Thanks, passing -1 makes it behave the way I expected. I guess my point was that the chosen design is error-prone. Apparently Ruby behaves the same way (not that Ruby is the almighty arbiter of Reasonable Behavior), while Python does not. Hmm.
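For reference, the String.split Javadoc describes three regimes for the limit: a positive value caps the number of splits at limit - 1, zero splits as often as possible and drops trailing empty strings, and a negative value splits as often as possible and keeps everything. A minimal sketch comparing them on the arbitrary input "a,b,," (the class name is illustrative):

```java
import java.util.Arrays;

public class SplitLimit {
    public static void main(String[] args) {
        String s = "a,b,,";

        // limit > 0: at most (limit - 1) splits; the last element holds the rest of the input
        System.out.println(Arrays.toString(s.split(",", 2)));  // [a, b,,]

        // limit == 0 (what the one-argument split uses): trailing empty strings are dropped
        System.out.println(Arrays.toString(s.split(",", 0)));  // [a, b]

        // limit < 0: split as often as possible and keep every empty string
        System.out.println(Arrays.toString(s.split(",", -1))); // [a, b, , ]

        // With no match at all, the input is returned as the only element
        System.out.println(Arrays.toString("xyz".split(",", -1))); // [xyz]
    }
}
```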


So in the case of ",".split(","), there is one matching substring, ",". And there are two substrings terminated either by a match or by the end of the string: "" and "". So is something wrong here, or am I misunderstanding it? Where does it say that a substring can't have a length of 0?
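The two empty substrings do exist. A quick check (a sketch; the class name is arbitrary) shows that a negative limit returns both, that the one-argument call gives the same result as an explicit limit of zero, and that only trailing empty strings are removed, since a leading one survives:

```java
import java.util.Arrays;

public class EmptySubstrings {
    public static void main(String[] args) {
        // Negative limit: both empty substrings around the single "," are returned
        String[] all = ",".split(",", -1);
        System.out.println(all.length); // 2

        // split(regex) behaves like split(regex, 0), so both calls give the same (empty) array
        System.out.println(Arrays.equals(",".split(","), ",".split(",", 0))); // true
        System.out.println(",".split(",").length); // 0

        // Only trailing empty strings are discarded: the leading "" of ",a" is kept
        System.out.println(Arrays.toString(",a".split(","))); // [, a]
    }
}
```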


Further along, the documentation states:

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

"," would be split into "" and "", but both are trailing empty strings, so they are discarded before the method returns. The one-argument split(regex) behaves as if split(regex, 0) had been called, which is why passing a negative limit such as -1 keeps them.

