patrrr

java.lang.String split


I just recently found out that Java's String split method behaves in ways that are unexpected (at least to me):

 

"hello".split(",").length  // -> 1
",".split(",").length      // -> 0
",,".split(",").length     // -> 0
" ,,".split(",").length    // -> 1
"".split(",").length       // -> 1

 

Is it just me that thinks this is weird?

  • Did they forget about empty strings?
  • Isn't this a possible source of bugs? I'm thinking of CSV parsing, etc.
  • Isn't "split" breaking SRP? It doesn't just split, it also checks whether the results are non-empty.

Is there an explanation for why they decided to do this? I'm asking mostly out of curiosity.
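To make the CSV worry concrete, here is a minimal sketch. The class name, the helper fieldCount, and the sample rows are made up for illustration; only String.split itself comes from the JDK. It shows two rows with the same number of commas reporting different field counts.

```java
import java.util.Arrays;

public class SplitCsvPitfall {
    // Hypothetical helper: number of fields the one-argument split reports.
    public static int fieldCount(String row) {
        return row.split(",").length;
    }

    public static void main(String[] args) {
        String full = "id,name,email,phone";
        String sparse = "42,Ann,,";   // the last two fields are empty

        System.out.println(fieldCount(full));                    // 4
        System.out.println(fieldCount(sparse));                  // 2
        System.out.println(Arrays.toString(sparse.split(",")));  // [42, Ann]
    }
}
```

Code that indexes the fourth column of the second row would fail with an ArrayIndexOutOfBoundsException, even though the row is well-formed.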

Edited by patrrr

The core explanation of the documentation says

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

Neither "hello" nor "" contains a match for the regular expression, so the returned array holds just the input string. I haven't invested much time in this, but from a cursory glance the output does not seem surprising.

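The no-match case quoted from the documentation can be checked with a small sketch. The class and the helper splitOnComma are hypothetical names for this example. Printing the length and the single element avoids the confusion of Arrays.toString, which renders an array holding one empty string as "[]".

```java
public class SplitNoMatch {
    // Hypothetical helper wrapping the one-argument split from the original post.
    public static String[] splitOnComma(String s) {
        return s.split(",");
    }

    public static void main(String[] args) {
        String[] a = splitOnComma("hello");
        String[] b = splitOnComma("");

        // No comma anywhere in the input, so each array holds the input itself.
        System.out.println(a.length + " " + a[0].equals("hello"));  // 1 true
        System.out.println(b.length + " " + b[0].isEmpty());        // 1 true
    }
}
```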

Thanks, passing -1 as the limit makes it do what I expected. I guess my point was that the chosen design for the method is error-prone, but apparently Ruby behaves the same way (not that Ruby is the almighty arbiter of Reasonable Behavior), while Python does not. Hmm.
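For reference, a rough sketch of the -1 variant applied to the five inputs from the first post. The class name and the helper countKeepingEmpty are invented for this example; the two-argument split(regex, limit) is the standard JDK overload.

```java
public class SplitKeepTrailing {
    // Hypothetical helper: element count when trailing empty strings are kept.
    public static int countKeepingEmpty(String s) {
        return s.split(",", -1).length;
    }

    public static void main(String[] args) {
        String[] inputs = {"hello", ",", ",,", " ,,", ""};
        for (String s : inputs) {
            System.out.println("\"" + s + "\": default=" + s.split(",").length
                    + ", limit -1=" + countKeepingEmpty(s));
        }
    }
}
```

The default lengths come out as 1, 0, 0, 1, 1, as listed in the first post; with a limit of -1 they are 1, 2, 3, 3, 1, that is, one more than the number of commas in each input.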

 

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

 

So in the case of ",".split(","), we have one matching substring, ",". And there are two substrings that are either terminated by a matching substring or end of the string, "" and "". So, something is false here? Or am I understanding it wrong? Where does it say that a substring can't have a length of 0?

Edited by patrrr


Further along in the documentation it states

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

"," would be split into "" and "", but both are trailing empty strings and are discarded before the method returns.

Edited by BitMaster

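The discard step can be imitated by hand to see where the two empty strings go. This is only a sketch: dropTrailingEmpty is a made-up helper, not the JDK's actual implementation, and it starts from the limit -1 result, which keeps every substring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitTrailingStrip {
    // Hypothetical helper (not part of the JDK): drops empty strings from the end only.
    public static List<String> dropTrailingEmpty(String[] parts) {
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return new ArrayList<>(Arrays.asList(parts).subList(0, end));
    }

    public static void main(String[] args) {
        String[] all = ",".split(",", -1);                   // two empty strings
        System.out.println(all.length);                      // 2
        System.out.println(dropTrailingEmpty(all).size());   // 0
        System.out.println(",".split(",").length);           // 0

        // Empty strings that are not at the end survive.
        System.out.println(dropTrailingEmpty(",a,".split(",", -1)));  // [, a]
    }
}
```

One caveat: the imitation does not cover "".split(","). There the pattern never matches, so the no-match rule applies and the one-element array holding the input is returned, whereas the helper above would strip it down to nothing.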
